Enables org.deidentifier.arx.framework.data.DataMatrix to support (2^31-1)^2 cells #299
…ad of throwing an IllegalArgumentException, a multidimensional array is created. Now the upper bound of cells for a DataMatrix is (2^31-1)^2.
….multiplyExact in DataMatrix constructor. Now the exception is handled.
In the last two commits I brought back the try-catch block that handles the ArithmeticException (integer overflow). I also ran tests using a dataset with 56 million rows and 120 columns, and they ran without problems using multidimensional matrices.
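The overflow handling described in the commits above can be sketched as follows. This is a minimal illustration, not ARX's code: it assumes the cell count is computed with `Math.multiplyExact(int, int)`, which throws `ArithmeticException` when the product overflows `int`, and that the matrix falls back to a two-dimensional array in that case. The names `MatrixLayout`, `Layout` and `chooseLayout` are hypothetical.

```java
/** Hypothetical helper: decides whether rows x columns int cells fit into one int[]. */
public class MatrixLayout {

    /** The two storage layouts discussed in this PR. */
    public enum Layout { SINGLE_ARRAY, MULTIDIMENSIONAL_ARRAY }

    public static Layout chooseLayout(int rows, int columns) {
        try {
            // Throws ArithmeticException if rows * columns exceeds Integer.MAX_VALUE
            Math.multiplyExact(rows, columns);
            return Layout.SINGLE_ARRAY;
        } catch (ArithmeticException e) {
            // Too many cells for a single int[]: use int[rows][columns] instead
            return Layout.MULTIDIMENSIONAL_ARRAY;
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseLayout(1_000, 1_000));     // SINGLE_ARRAY
        System.out.println(chooseLayout(56_000_000, 120));  // MULTIDIMENSIONAL_ARRAY (6.72 billion cells)
        System.out.println(chooseLayout(2_200_000, 1_000)); // MULTIDIMENSIONAL_ARRAY (2.2 billion cells)
    }
}
```

Note that some JVMs cap array length a few elements below `Integer.MAX_VALUE`, so a production check might use a slightly smaller threshold than pure `int` overflow.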
Hi ramongonze, thank you for your interest in ARX and for this great contribution! This is a major change at the core of the software, so we will need some time to make sure that it works, doesn't break anything and can be merged. A few initial questions: (1) Have you tested this by exercising the various features of the software and making sure that it doesn't run into exceptions elsewhere? Other parts of the code may well need to be changed to handle such large datasets. Thanks again!
…aMatrix class to new classes: SingleArrayMatrix and MultidimensionalArrayMatrix; Created a new abstract class 'Matrix' to hold the information for either a single array or a multidimensional one; Ran JUnit tests with both the single array and the multidimensional array, and all tests passed.
I replaced the array variable in DataMatrix with a Matrix object and created three new classes in org.deidentifier.arx.framework.data: Matrix, SingleArrayMatrix and MultidimensionalArrayMatrix.
The Matrix class contains all DataMatrix methods that depended on the array variable. Why didn't I turn DataMatrix into an abstract class? Because I wanted to avoid changing classes in other packages, so that all changes stay within the DataMatrix class. That's why I created the Matrix class. In summary: DataMatrix holds a Matrix, and Matrix is an abstract class with two implementations, SingleArrayMatrix and MultidimensionalArrayMatrix.
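A minimal sketch of the delegation design described above: an abstract `Matrix` with a single-array and a multidimensional implementation, wrapped by a DataMatrix-like class whose public methods stay the same. Only the class names `Matrix`, `SingleArrayMatrix` and `MultidimensionalArrayMatrix` come from this PR; the wrapper `DataMatrixSketch` and all method names are hypothetical, and this is not ARX's actual implementation.

```java
/** Abstract int matrix; method names are illustrative, not ARX's API. */
abstract class Matrix {
    protected final int rows;
    protected final int columns;

    Matrix(int rows, int columns) {
        this.rows = rows;
        this.columns = columns;
    }

    abstract int get(int row, int column);

    abstract void set(int row, int column, int value);
}

/** Row-major storage in one int[]; only valid if rows * columns fits in an int. */
class SingleArrayMatrix extends Matrix {
    private final int[] array;

    SingleArrayMatrix(int rows, int columns) {
        super(rows, columns);
        // Throws ArithmeticException when rows * columns overflows int
        this.array = new int[Math.multiplyExact(rows, columns)];
    }

    @Override
    int get(int row, int column) { return array[row * columns + column]; }

    @Override
    void set(int row, int column, int value) { array[row * columns + column] = value; }
}

/** One int[] per row: rows and columns must each fit in an int, their product need not. */
class MultidimensionalArrayMatrix extends Matrix {
    private final int[][] array;

    MultidimensionalArrayMatrix(int rows, int columns) {
        super(rows, columns);
        this.array = new int[rows][columns];
    }

    @Override
    int get(int row, int column) { return array[row][column]; }

    @Override
    void set(int row, int column, int value) { array[row][column] = value; }
}

/** Hypothetical stand-in for DataMatrix: callers keep its API, storage is delegated. */
public class DataMatrixSketch {
    private final Matrix matrix;

    public DataMatrixSketch(int rows, int columns) {
        Matrix m;
        try {
            m = new SingleArrayMatrix(rows, columns);
        } catch (ArithmeticException e) {
            m = new MultidimensionalArrayMatrix(rows, columns);
        }
        this.matrix = m;
    }

    public int get(int row, int column) { return matrix.get(row, column); }

    public void set(int row, int column, int value) { matrix.set(row, column, value); }

    public int getNumRows() { return matrix.rows; }

    public int getNumColumns() { return matrix.columns; }

    public boolean usesMultidimensionalArray() { return matrix instanceof MultidimensionalArrayMatrix; }

    public static void main(String[] args) {
        DataMatrixSketch data = new DataMatrixSketch(3, 4);
        data.set(2, 3, 42);
        System.out.println(data.get(2, 3));                   // 42
        System.out.println(data.usesMultidimensionalArray()); // false
    }
}
```

Because the wrapper keeps its public methods and only its internal field changes, classes in other packages need no changes. The cost is an extra level of indirection on every cell access, which the JIT may or may not optimize away.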
I ran all JUnit tests with both SingleArrayMatrix and MultidimensionalArrayMatrix, and all of them passed (except those that depend on datasets not available in this repository). What other parts of the code should I change to handle large datasets?
Thanks a lot! Another test you should always perform is to load, save and reload the example projects in the GUI. Does that work? Thanks, Fabian
Just an update: before testing the GUI and saving/loading projects, I would like to fix the problem from issue #302, because it might affect the test.
Conflicts: build.xml
We will update all GUI-related dependencies as part of fixing issue #315. This will likely resolve this problem as well.
I ran a test to check the new matrix implementation. I created a dataset with 2,200,000 rows and 1,000 columns, i.e. 2.2 billion cells, more than 2^31-1 (2,147,483,647). Test steps:
It ran without errors or problems. @prasser is there a specific test at
Sounds great, thanks! Please perform the following additional test: Does this work?
@prasser Test steps:
When reopening ARX (step 11), all the previously set configuration was intact. Note:
Ok, great. I created a branch "morecells" and changed the base branch of this PR, so that I can take a look. Thanks!
I will now check the branch and report any issues here. |
@ramongonze Unfortunately, your branch doesn't work. Your experiments worked because you did not generate an anonymized output dataset within your project. Try the following steps: (1) With ARX's master: open the example project file or create a new project in which an anonymization has been performed, resulting in an output dataset. The reason is that DataMatrix is a serializable class in ARX, and you have changed it in a way that prevents already serialized instances from being loaded completely. I see at least two options: (1) Extend DataMatrix with the new functionality, but don't remove any of the existing fields. You will then need to implement some logic to decide whether an old or a new instance of the class has been loaded. If you have any questions, please don't hesitate to ask.
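Option (1) above could look roughly like this sketch. It assumes the class declares an explicit `serialVersionUID` that is kept unchanged; under that condition, adding a field is a compatible change in Java serialization, and a field missing from an old stream is initialized to its default value (`null` here). `CompatibleMatrix`, its field names and the value `1L` are hypothetical stand-ins, not ARX's actual DataMatrix fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for DataMatrix that stays compatible with old serialized data. */
public class CompatibleMatrix implements Serializable {

    // Hypothetical value: must equal the one used by the old class version
    private static final long serialVersionUID = 1L;

    private final int columns;   // existing field (hypothetical name)
    private int[] array;         // existing field: single-array storage, kept as-is
    private int[][] multiArray;  // new field: absent from old streams, so it loads as null

    public CompatibleMatrix(int rows, int columns) {
        this.columns = columns;
        try {
            this.array = new int[Math.multiplyExact(rows, columns)];
        } catch (ArithmeticException e) {
            this.multiArray = new int[rows][columns];
        }
    }

    /** Old instances (and small new ones) have multiArray == null and use the single array. */
    public boolean isMultidimensional() {
        return multiArray != null;
    }

    public int get(int row, int column) {
        return isMultidimensional() ? multiArray[row][column] : array[row * columns + column];
    }

    public void set(int row, int column, int value) {
        if (isMultidimensional()) {
            multiArray[row][column] = value;
        } else {
            array[row * columns + column] = value;
        }
    }

    /** Validates the loaded state; an old stream simply leaves multiArray at null. */
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (array == null && multiArray == null) {
            throw new InvalidObjectException("matrix has no storage");
        }
    }

    /** Round-trips an instance through Java serialization (for demonstration only). */
    public static CompatibleMatrix copyViaSerialization(CompatibleMatrix m) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(m);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CompatibleMatrix) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CompatibleMatrix m = new CompatibleMatrix(2, 3);
        m.set(1, 2, 7);
        CompatibleMatrix copy = copyViaSerialization(m);
        System.out.println(copy.get(1, 2) + " " + copy.isMultidimensional()); // 7 false
    }
}
```

The check "is the new field null?" is the logic that decides whether an old or a new instance was loaded. If the original class never declared a `serialVersionUID`, the computed one changes when a field is added, so the old value would have to be determined (e.g. with the `serialver` tool) and declared explicitly.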
As suggested in #243 (comment), this is a simple implementation of an int 2D matrix.
It allows to create matrices with at most (2^31-1)^2 cells.
It keeps the original single-array implementation when there are at most 2^31-1 cells.
If the number of cells exceeds 2^31-1, it creates an int 2D matrix.
Basically, it changes all methods so that they compute and return values according to how the DataMatrix is stored (multidimensional or not).
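To illustrate what "changing all methods according to the storage type" can mean, here is a sketch of one row operation, swapping two rows, implemented for both layouts. The class `RowSwapDemo` and the `swapRows` methods are hypothetical examples, not methods taken from ARX's DataMatrix.

```java
import java.util.Arrays;

/** Hypothetical illustration: one row operation implemented for both storage layouts. */
public class RowSwapDemo {

    /** Single array, row-major: swap the cells of two rows one by one (O(columns)). */
    public static void swapRows(int[] array, int columns, int row1, int row2) {
        int offset1 = row1 * columns;
        int offset2 = row2 * columns;
        for (int c = 0; c < columns; c++) {
            int tmp = array[offset1 + c];
            array[offset1 + c] = array[offset2 + c];
            array[offset2 + c] = tmp;
        }
    }

    /** 2D matrix: each row is its own int[], so swapping the references is enough (O(1)). */
    public static void swapRows(int[][] matrix, int row1, int row2) {
        int[] tmp = matrix[row1];
        matrix[row1] = matrix[row2];
        matrix[row2] = tmp;
    }

    public static void main(String[] args) {
        int[] single = {1, 2, 3, 4, 5, 6}; // 3 rows x 2 columns, row-major
        swapRows(single, 2, 0, 2);
        System.out.println(Arrays.toString(single)); // [5, 6, 3, 4, 1, 2]

        int[][] multi = {{1, 2}, {3, 4}, {5, 6}};
        swapRows(multi, 0, 2);
        System.out.println(Arrays.deepToString(multi)); // [[5, 6], [3, 4], [1, 2]]
    }
}
```

The single-array version relies on `row * columns` fitting in an `int`, which holds whenever the single-array layout is chosen; the 2D version never computes a global cell index at all.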