Exclude suppressed records when analyzing the distribution of output data #365

Closed
srcds opened this issue Nov 26, 2021 · 5 comments

@srcds
Contributor

srcds commented Nov 26, 2021

In “Analyze utility” > “Distribution” it would be valuable to have an option to exclude suppressed records when analyzing the distribution of the output dataset.

@prasser prasser assigned srcds and unassigned prasser Nov 27, 2021
@prasser
Collaborator

prasser commented Nov 27, 2021

Assigned to @srcds as this is a good opportunity to get to know ARX better.

@srcds
Contributor Author

srcds commented Dec 7, 2021

Example:

  1. Start ARX and open the "example.deid" project provided as an example in the "data" folder (https://github.com/arx-deidentifier/arx/blob/master/data/example.deid)
  2. Anonymize the dataset (Edit -> Anonymize -> OK)
  3. Navigate to the Analyze utility view (View -> Analyze Utility)
  4. Click on the column of the "age" attribute
  5. Click on the Distribution tab in the lower part of the view

ARX should look as follows:
[Screenshot: Distribution_suppressed_records]

During the anonymization some of the records were removed from the dataset. In the Distribution they are shown in a group labeled as "*" (Area 'A' in the figure). It would be nice to have an option to exclude all removed records from the distribution view. In another example it becomes apparent why:

[Screenshot: second example]

Depending on whether the removed records are only removed from the distribution view or from the entire output data (open point for discussion), a button for toggling this new feature could be located in Area B or Area C.

@prasser prasser assigned srcds and unassigned srcds Dec 7, 2021
@idhamari
Contributor

idhamari commented Dec 15, 2021

The feature is added in this commit. Note: the variable names are modified for better readability.

The task has two parts: handling the GUI and handling the frequency distribution.

Adding a check box button:
The class LayoutUtility creates two instances of LayoutUtilityStatistics, one for the left panel (input data) and one for the right panel (output results):

    statisticsInputLayout = new LayoutUtilityStatistics(bottomLeft,
                                             controller,
                                             ModelPart.INPUT,
                                             null);
    statisticsOutputLayout = new LayoutUtilityStatistics(bottomRight,
                                              controller,
                                              ModelPart.OUTPUT,
                                              ModelPart.INPUT);

In LayoutUtilityStatistics, the following if statement can be used to customize a specific panel, left or right:

    if (target == ModelPart.INPUT) {
        // customize left panel
    }

To add a check button, we add an SWT ToolItem (org.eclipse.swt.widgets.ToolItem) for the right panel to the ComponentTitledFolderButtonBar:

// Field declaration in LayoutUtilityStatistics
private ToolItem chkbtnSuppressedRecords;

ComponentTitledFolderButtonBar toolbarVis = new ComponentTitledFolderButtonBar("id-50", helpids); //$NON-NLS-1$
// Only the output (right) panel gets the check button
if (target != ModelPart.INPUT) {
    toolbarVis.add(chkbtnSuppressedRecordsLabel, icnSRDisabled, true, new Runnable() {
        @Override
        public void run() {
            toggleChkbtnSuppressedRecords();
            toggleChkbtnSRIcon();
        }
    });
}

Then we add an action. The run method above calls two functions. The first, toggleChkbtnSuppressedRecords, updates the frequency by disabling the visualisation, calling update on the histogram distHist and the table distTbl, and then enabling the visualisation again (there is probably a better way to do this by updating the view directly).

/**
 * Toggle suppressed records 
 */
private void toggleChkbtnSuppressedRecords() {
    if (this.chkbtnVisualisation.getSelection()) {
        this.hideSuppressedRecords = this.chkbtnSuppressedRecords.getSelection();

        this.model.setVisualizationEnabled(false);
        this.controller.update(new ModelEvent(this, ModelPart.SELECTED_UTILITY_VISUALIZATION, false));
        
        this.distHist.update(new ModelEvent(this, ModelPart.SELECTED_UTILITY_VISUALIZATION, false),this.hideSuppressedRecords);
        this.distTbl.update(new ModelEvent(this, ModelPart.SELECTED_UTILITY_VISUALIZATION, false),this.hideSuppressedRecords);
            
        this.model.setVisualizationEnabled(true);
        this.controller.update(new ModelEvent(this, ModelPart.SELECTED_UTILITY_VISUALIZATION, true));
    }
}

The other function, toggleChkbtnSRIcon, changes the icon and the tooltip message:

/**
 * Toggle check button image.
 */
private void toggleChkbtnSRIcon(){
    if (!this.chkbtnSuppressedRecords.getSelection()) {
        this.chkbtnSuppressedRecords.setImage(icnSRDisabled);
        this.chkbtnSuppressedRecords.setToolTipText("Hide suppressed records!");
    } else {
        this.chkbtnSuppressedRecords.setImage(icnSREnabled);
        this.chkbtnSuppressedRecords.setToolTipText("View suppressed records!");
    }

}

Modifying the frequency to hide suppressed records

In the ViewStatisticsDistributionHistogram class, the function run calls getFrequencyDistribution, which in turn calls another getFrequencyDistribution that does the computation.

For now, the solution is to filter the values and the frequencies before passing them to the chart, e.g. in the function onFinish:

            // Requires java.util.ArrayList and java.util.List.
            // Collect values and frequencies in parallel so their indices stay aligned.
            List<String> keptValues = new ArrayList<String>();
            List<Double> keptFreqs  = new ArrayList<Double>();
            for (int i = 0; i < distribution.values.length; i++) {
                // "*" labels the suppressed records; compare with equals(), not !=
                if (hideSuppressedRecords && "*".equals(distribution.values[i])) {
                    continue;
                }
                keptValues.add(distribution.values[i]);
                keptFreqs.add(distribution.frequency[i]);
            }

            String[] newDistValues2 = keptValues.toArray(new String[0]);
            double[] newDistFreqs2  = new double[keptFreqs.size()];
            for (int i = 0; i < newDistFreqs2.length; i++) {
                newDistFreqs2[i] = keptFreqs.get(i);
            }

Then use the new values in the chart:

            series.setYSeries(newDistFreqs2 );
            xAxis.setCategorySeries(newDistValues2);

The variable hideSuppressedRecords and a new function update should be added to the parent class ViewStatistics and then overridden in ViewStatisticsDistributionHistogram, so that the view receives the action from the check button in LayoutUtilityStatistics:

/**
 * View/Hide suppressed records 
 */
@Override
public void update(ModelEvent event, Boolean hsr) {
    hideSuppressedRecords = hsr;
}
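
The parent/child arrangement described above might look roughly like the following. This is only a sketch under assumptions: the class names are hypothetical stand-ins for ViewStatistics and ViewStatisticsDistributionHistogram, a plain String replaces ModelEvent, and a primitive boolean is used instead of Boolean so that a null argument cannot cause an unboxing error.

```java
/** Hypothetical stand-in for the parent class ViewStatistics. */
abstract class StatisticsViewSketch {

    /** Default: views that do not show suppressed records ignore the flag. */
    public void update(String event, boolean hideSuppressedRecords) {
        // intentionally empty
    }
}

/** Hypothetical stand-in for ViewStatisticsDistributionHistogram. */
class HistogramViewSketch extends StatisticsViewSketch {

    private boolean hideSuppressedRecords = false;

    /** Remembers the state of the check button for the next redraw. */
    @Override
    public void update(String event, boolean hide) {
        this.hideSuppressedRecords = hide;
    }

    boolean isHidingSuppressedRecords() {
        return hideSuppressedRecords;
    }
}
```

Giving the parent an empty default means other statistics views do not have to implement the hook unless they need it.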

Update:

It seems that the class AnalysisContextDistribution handles the data, so a suggested solution is to implement a new function that writes the filtered distribution values and frequencies into two new AnalysisContextDistribution members, e.g.:

public void hideSuppressedData(StatisticsFrequencyDistribution distribution) throws InterruptedException

Then modify ViewStatisticsDistributionHistogram to use the new values, e.g.:

            if (hideSuppressedRecords) {
                try {
                    context.hideSuppressedData(distribution);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }

Then set the Y series:

            if (!hideSuppressedRecords) {
                series.setYSeries(this.distribution.frequency);
            } else {
                series.setYSeries(context.newDistFreqs);
            }

And finally set the category series:

            if (!hideSuppressedRecords) {
                xAxis.setCategorySeries(this.distribution.values);
            } else {
                xAxis.setCategorySeries(context.newDistValues);
            }

The code above is repetitive and not elegant, but it works.
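
One possible way to reduce the repetition is to decide once which pair of arrays to show and then pass that pair to the chart. The helper below is a hypothetical sketch, not ARX code; only the flag and the two pairs of arrays correspond to names used in this comment.

```java
/** Hypothetical holder for the two series the chart needs; not an ARX class. */
final class ChartSeriesSketch {
    final String[] categories;
    final double[] ySeries;

    private ChartSeriesSketch(String[] categories, double[] ySeries) {
        this.categories = categories;
        this.ySeries = ySeries;
    }

    /** Picks the filtered pair when hiding is enabled, otherwise the full pair. */
    static ChartSeriesSketch choose(boolean hideSuppressedRecords,
                                    String[] allValues, double[] allFreqs,
                                    String[] filteredValues, double[] filteredFreqs) {
        return hideSuppressedRecords
                ? new ChartSeriesSketch(filteredValues, filteredFreqs)
                : new ChartSeriesSketch(allValues, allFreqs);
    }
}
```

With a result named chosen, the two if/else blocks would collapse into series.setYSeries(chosen.ySeries) and xAxis.setCategorySeries(chosen.categories).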

@idhamari
Contributor

The suggested solution is in this commit.

@idhamari
Contributor

This issue is solved by this PR.

@prasser prasser closed this as completed Jun 19, 2022

3 participants