From fcb280d5455b625cd93fce208ec3712e6cc6a2fb Mon Sep 17 00:00:00 2001
From: nunesgh
Date: Mon, 9 May 2022 18:24:46 -0300
Subject: [PATCH] Updates README.md file and adds The Unlicense license.

---
 LICENSE.md | 25 +++++++++++++++++++++++++
 README.md  | 18 +++++++++++++++++-
 2 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 LICENSE.md

diff --git a/LICENSE.md b/LICENSE.md
new file mode 100644
index 0000000..1e9858b
--- /dev/null
+++ b/LICENSE.md
@@ -0,0 +1,25 @@
+This is free and unencumbered software released into the public domain.
+
+Anyone is free to copy, modify, publish, use, compile, sell, or
+distribute this software, either in source code form or as a compiled
+binary, for any purpose, commercial or non-commercial, and by any
+means.
+
+In jurisdictions that recognize copyright laws, the author or authors
+of this software dedicate any and all copyright interest in the
+software to the public domain. We make this dedication for the benefit
+of the public at large and to the detriment of our heirs and
+successors. We intend this dedication to be an overt act of
+relinquishment in perpetuity of all present and future rights to this
+software under copyright law.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
+For more information, please refer to
+
diff --git a/README.md b/README.md
index 3b36d4d..8dec606 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,23 @@
-# inep-anonymization
+# INEP [^inep] (syntactic) Anonymization
+
+Code and attribute hierarchies used to anonymize INEP datasets with the [ARX Deidentifier](https://github.com/arx-deidentifier/arx) tool.
+
+DOI: [10.5281/zenodo.6533684](https://doi.org/10.5281/zenodo.6533684).
+
+The resulting datasets were used in a vulnerability assessment performed with the [BVM library](https://github.com/nunesgh/bvm-library) ([10.5281/zenodo.6533704](https://doi.org/10.5281/zenodo.6533704)). The assessment results were published in: Mário S. Alvim, Natasha Fernandes, Annabelle McIver, Carroll Morgan, Gabriel H. Nunes - _Flexible and scalable privacy assessment for very large datasets, with an application to official governmental microdata_ (2022, [10.48550/arXiv.2204.13734](https://doi.org/10.48550/arXiv.2204.13734)).
+
+In each dataset, we randomly selected only one record per student, i.e., one record per unique pseudonymization code (`ID_ALUNO`). The enrollment code (`ID_MATRICULA`) of each selected record is available in [10.5281/zenodo.6533675](https://doi.org/10.5281/zenodo.6533675) ([gitlab.com/nunesgh/inep-enrollment-codes](https://gitlab.com/nunesgh/inep-enrollment-codes)).
 
 ## ARX version
 
 The `jar` files in `arx/jars/` were compiled from the [ARX fork](https://github.com/ramongonze/arx) made by [@ramongonze](https://github.com/ramongonze), based on commit [8a936d3](https://github.com/ramongonze/arx/commit/8a936d3d5607b8f10957c16c1e2781d94b9f2904) and using the command `ant -buildfile build.xml`. This fork allows for the creation of matrices with up to (2^31-1)^2 cells, instead of the original limit of up to 2^31-1 cells. Due to some GUI errors caused by the new feature, it is necessary to run ARX via CLI. For more information, see [this issue](https://github.com/arx-deidentifier/arx/pull/299).
+
+## License
+
+[The Unlicense](https://choosealicense.com/licenses/unlicense/).
+
+[^inep]:
+    The [Anísio Teixeira National Institute of Educational Studies and Research](https://www.gov.br/INEP).
+
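The per-student selection described in the README (one record per `ID_ALUNO` in each dataset) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the script actually used for the INEP datasets: it assumes each dataset is loaded as a list of dicts keyed by column name, and the helper name `select_one_record_per_student` and its `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def select_one_record_per_student(records, key="ID_ALUNO", seed=None):
    """Return one randomly chosen record per distinct value of `key`.

    Hypothetical helper (not from the original project): `records` is
    assumed to be a list of dicts, one per enrollment row, each holding
    the student's pseudonymization code under `key`.
    """
    rng = random.Random(seed)  # dedicated RNG so a fixed seed is reproducible
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)  # bucket rows by student code
    # Draw exactly one record uniformly at random from each bucket.
    return [rng.choice(rows) for rows in groups.values()]


if __name__ == "__main__":
    rows = [
        {"ID_ALUNO": "a1", "ID_MATRICULA": 1},
        {"ID_ALUNO": "a1", "ID_MATRICULA": 2},
        {"ID_ALUNO": "b7", "ID_MATRICULA": 3},
    ]
    sample = select_one_record_per_student(rows, seed=42)
    print(sorted(r["ID_ALUNO"] for r in sample))  # ['a1', 'b7']
```

Using a dedicated, seeded `random.Random` instance keeps the draw reproducible without touching Python's global random state; in this sketch, the `ID_MATRICULA` values of the returned records play the role of the published list of selected enrollment codes.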