Commit

Updates README.md file and adds The Unlicense license.
nunesgh committed May 9, 2022
1 parent 60780b1 commit fcb280d
Showing 2 changed files with 42 additions and 1 deletion.
25 changes: 25 additions & 0 deletions LICENSE.md
@@ -0,0 +1,25 @@
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <https://unlicense.org>

18 changes: 17 additions & 1 deletion README.md
@@ -1,7 +1,23 @@
# inep-anonymization
# INEP [^inep] (syntactic) Anonymization

Code and attribute hierarchies used to anonymize INEP datasets with the [ARX Deidentifier](https://github.com/arx-deidentifier/arx) tool.

DOI: [10.5281/zenodo.6533684](https://doi.org/10.5281/zenodo.6533684).

The resulting datasets were used for vulnerability assessment using the [BVM library](https://github.com/nunesgh/bvm-library) ([10.5281/zenodo.6533704](https://doi.org/10.5281/zenodo.6533704)). The assessment results were published in: Mário S. Alvim, Natasha Fernandes, Annabelle McIver, Carroll Morgan, Gabriel H. Nunes - _Flexible and scalable privacy assessment for very large datasets, with an application to official governmental microdata_ (2022, [10.48550/arXiv.2204.13734](https://doi.org/10.48550/arXiv.2204.13734)).

In each dataset, we randomly selected only one record for each student, where a student is identified by a unique pseudonymization code (`ID_ALUNO`). The enrollment code (`ID_MATRICULA`) for each selected record is available in [10.5281/zenodo.6533675](https://doi.org/10.5281/zenodo.6533675) ([gitlab.com/nunesgh/inep-enrollment-codes](https://gitlab.com/nunesgh/inep-enrollment-codes)).
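
The selection step described above can be illustrated with a minimal sketch. This is not the code used to produce the published enrollment codes: the function name, the seed parameter, and the toy rows are hypothetical, and only the column names `ID_ALUNO` and `ID_MATRICULA` come from the datasets.

```python
import random
from collections import defaultdict


def select_one_record_per_student(records, seed=None):
    """Return a dict mapping each ID_ALUNO to one randomly chosen record."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    by_student = defaultdict(list)
    for record in records:
        by_student[record["ID_ALUNO"]].append(record)
    # Pick exactly one record from each student's group.
    return {student: rng.choice(group) for student, group in by_student.items()}


# Toy rows (hypothetical values): student "A" has two enrollments, "B" has one.
records = [
    {"ID_ALUNO": "A", "ID_MATRICULA": 101},
    {"ID_ALUNO": "A", "ID_MATRICULA": 102},
    {"ID_ALUNO": "B", "ID_MATRICULA": 201},
]

selected = select_one_record_per_student(records, seed=0)
kept_enrollment_codes = sorted(r["ID_MATRICULA"] for r in selected.values())
print(kept_enrollment_codes)
```

Keeping the list of selected `ID_MATRICULA` values, as in `kept_enrollment_codes`, is what allows the same subset of records to be recovered later.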

## ARX version

The `jar` files in `arx/jars/` were compiled from the [ARX fork](https://github.com/ramongonze/arx) made by [@ramongonze](https://github.com/ramongonze), based on commit [8a936d3](https://github.com/ramongonze/arx/commit/8a936d3d5607b8f10957c16c1e2781d94b9f2904) and using the command `ant -buildfile build.xml`.

This fork allows for the creation of matrices with up to (2^31-1)^2 cells, instead of the original limit of up to 2^31-1 cells. Because this change causes some GUI errors, ARX must be run from the command line. For more information, see [this pull request](https://github.com/arx-deidentifier/arx/pull/299).
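
To give a sense of scale for the two limits, the short sketch below evaluates them. It assumes, which the text above does not state, that 2^31-1 corresponds to the maximum value of a 32-bit signed integer (Java's `int`) and that the larger limit would be held in a 64-bit signed integer (Java's `long`). The square-matrix case is purely hypothetical and serves only as an illustration.

```python
import math

INT_MAX = 2**31 - 1   # largest 32-bit signed integer
LONG_MAX = 2**63 - 1  # largest 64-bit signed integer

original_limit = INT_MAX  # cells allowed before the fork
fork_limit = INT_MAX**2   # cells allowed by the fork

print(original_limit)          # 2147483647
print(fork_limit)              # 4611686014132420609
print(fork_limit <= LONG_MAX)  # True: the new limit fits in 64 bits

# Hypothetical square matrix: the largest side length under each limit.
print(math.isqrt(original_limit))  # 46340
print(math.isqrt(fork_limit))      # 2147483647
```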

## License

[The Unlicense](https://choosealicense.com/licenses/unlicense/).

[^inep]: The [Anísio Teixeira National Institute of Educational Studies and Research](https://www.gov.br/INEP).
