java.lang.NegativeArraySizeException when checking re-identification risk for large file #243
Thanks for reporting this issue! Can you please post the complete stacktrace? Thanks, Fabian
Thanks for the quick reply. Below is the stack trace from when it failed:
I have applied a partial fix to master for this issue by enhancing the error message. The dataset that you are trying to load is too large for the current implementation of matrices in ARX. You have two options: (1) split your dataset into smaller subsets and run the analysis on those, or (2) enhance org.deidentifier.arx.framework.data.DataMatrix to switch to a data structure that can hold more than 2^31-1 entries when the number of cells becomes too large. It will not be sufficient to switch to long arrays (they also cannot hold more than 2^31-1 entries). You could use multi-dimensional arrays (e.g. a special implementation of an ArrayList that is backed by a list of arrays, see http://fastutil.di.unimi.it/docs/it/unimi/dsi/fastutil/BigArrays.html) or switch to off-heap memory (https://github.com/xerial/larray is a well-known implementation; however, LArray does not seem to be serializable, which is required by ARX). Best
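To make option (2) more concrete, here is the general idea of a matrix backed by several arrays. This is only a rough sketch under made-up naming (BigIntMatrix and its chunking scheme are hypothetical, not ARX's DataMatrix API); a real change would also have to preserve serializability and the existing DataMatrix interface:

```java
// Minimal sketch only (BigIntMatrix is a made-up name, not part of ARX):
// a row-major int matrix backed by several arrays, so the total number of
// cells is limited by heap size rather than by the 2^31-1 length limit of
// a single Java array.
public final class BigIntMatrix {

    private static final int CHUNK_BITS = 27;               // 2^27 entries per backing array
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final long CHUNK_MASK = CHUNK_SIZE - 1;

    private final int[][] chunks;
    private final long columns;

    public BigIntMatrix(long rows, long columns) {
        this.columns = columns;
        long cells = rows * columns;                         // computed in long, no int overflow
        int numChunks = (int) ((cells + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        this.chunks = new int[numChunks][];
        for (int i = 0; i < numChunks; i++) {
            long remaining = cells - ((long) i << CHUNK_BITS);
            chunks[i] = new int[(int) Math.min(CHUNK_SIZE, remaining)];
        }
    }

    public int get(long row, long column) {
        long index = row * columns + column;
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    public void set(long row, long column, int value) {
        long index = row * columns + column;
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }
}
```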
Thanks, Fabian, for the suggestions. What is the maximum number of columns I can use and still get meaningful estimates? For now I will choose option 1: split the file and run the re-identification risk analysis on the subsets. Thanks
You can calculate this yourself - the following condition must hold: rows * columns <= 2^31-1. So, if you have 5M rows, this gives you: max_columns = floor((2^31-1) / 5M) = 429
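Just to illustrate that calculation (assuming one int cell per value in a single backing array, as in the current DataMatrix):

```java
long rows = 5_000_000L;
long maxCells = Integer.MAX_VALUE;     // 2^31 - 1 = 2,147,483,647
long maxColumns = maxCells / rows;     // integer division acts as floor here
System.out.println(maxColumns);        // prints 429
```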
I will close this issue now. Thanks again for reporting the exception. Should you want to extend ARX as described above in the future, we can reopen the issue. Thanks for using ARX! Fabian
Hi,
We are using ARX risk analysis and our data files are very large. One of the files has 5 million rows and 800 columns.
When Data.getHandle() is called, it fails because it creates an integer array of size rows*columns, which exceeds the integer range. Could we perhaps use a long array instead? Is that a fair assumption, and can I submit a pull request?
Thanks
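For context, the failure mode described above can be reproduced in isolation (illustrative only; the actual allocation happens inside ARX's data handling code, and the variable names here are made up):

```java
int rows = 5_000_000;
int columns = 800;
int cells = rows * columns;        // overflows 32-bit int: wraps to -294,967,296
int[] data = new int[cells];       // throws java.lang.NegativeArraySizeException
```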