This API is part of the document pseudonymization effort led by Etalab's Lab IA. Other Lab IA projects can be found at the Lab IA.
The purpose of this repo is to provide an API endpoint to pseudonymize documents. The API should make it easy for developers to automate document pseudonymization with their own models.
The larger goal of the pseudonymization project is to help the French Council of State open its court decisions to the general public, as required by law. More info about pseudonymization and this project can be found in our French pseudonymization guide here. Our API uses a Named Entity Recognition (NER) model to find and replace first names, last names, and addresses in court decisions (specifically those of the Council of State).
You need to train a NER model with the Flair library. Unfortunately, we cannot currently share our model or the data it was trained on, as the data contains non-public information.
- Natural Language Processing: Information Extraction : Named Entity Recognition
- Python
- Flair
- Flask, gunicorn, nginx
- SQLite
- Pandas
- Docker
The API has two endpoints:
Analyzes a given string
URL : /
Method : POST
Data example : All fields must be sent.
{
"text": "M. Pierre Sailly demeurant au 14 rue de la Felicité, 75007 Vienne."
}
Condition : If everything is OK and the model inference was performed correctly
Code : 200 OK
Content example
{
"success": true,
"pseudo": "M. BK... demeurant au 14 JZ..., 75007 JV...."
}
The same response, parsed into a Python dict:
{'pseudo': 'M. BK... demeurant au 14 JZ..., 75007 JV....', 'success': True}
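As an illustration of calling this endpoint from code, here is a minimal Python sketch using only the standard library. It assumes the API is reachable at http://localhost/ and accepts a JSON body shaped like the data example above; if the server expects form-encoded data instead, the body encoding and header would need to change. The function names build_request and pseudonymize are hypothetical and are not part of this repo.

```python
import json
import urllib.request


def build_request(text, base_url="http://localhost/"):
    """Build (but do not send) a POST request carrying the text to pseudonymize."""
    # Assumption: the endpoint reads a JSON body with a single "text" field.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pseudonymize(text, base_url="http://localhost/"):
    """Send the request and return the parsed JSON response as a dict."""
    with urllib.request.urlopen(build_request(text, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the containers running, pseudonymize("M. Pierre Sailly demeurant au 14 rue de la Felicité, 75007 Vienne.") should return a dict with the "success" and "pseudo" keys described above.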
Analyzes a given string and returns both an XML-like string with tags and the pseudonymized text
URL : /tags/
Method : POST
Data example : All fields must be sent.
{
"text": "M. Pierre Sailly demeurant au 14 rue de la Felicité, 75007 Vienne."
}
Condition : If everything is OK and the model inference was performed correctly
Code : 200 OK
Content example
{
"success": true,
"pseudo": "M. BK... demeurant au 14 JZ..., 75007 JV....",
"tags": "<text><sentence><a>M. </a><PER>Pierre Sailly</PER><a> demeurant au 14 </a><LOC>rue de la Felicité</LOC><a>, 75007 </a><LOC>Vienne</LOC><a>.</a></sentence></text>"
}
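The "tags" string can be post-processed to list the entities the model detected. The sketch below is one possible approach and rests on two assumptions: the string is well-formed XML, as in the example above (text containing characters such as "&" or "<" may not parse), and every element other than text, sentence, and a marks an entity. The helper name extract_entities is hypothetical.

```python
import xml.etree.ElementTree as ET

# Tags that wrap structure or untouched text in the example response above.
NON_ENTITY_TAGS = {"text", "sentence", "a"}


def extract_entities(tags):
    """Return (label, text) pairs for every entity element in the tags string."""
    root = ET.fromstring(tags)
    return [
        (element.tag, element.text or "")
        for element in root.iter()
        if element.tag not in NON_ENTITY_TAGS
    ]


example = (
    "<text><sentence><a>M. </a><PER>Pierre Sailly</PER>"
    "<a> demeurant au 14 </a><LOC>rue de la Felicité</LOC>"
    "<a>, 75007 </a><LOC>Vienne</LOC><a>.</a></sentence></text>"
)
print(extract_entities(example))
# [('PER', 'Pierre Sailly'), ('LOC', 'rue de la Felicité'), ('LOC', 'Vienne')]
```

Comparing this list with the "pseudo" field is a quick way to check which spans were replaced.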
The easiest way to test this application is by using Docker and Docker Compose.
- Clone this repo (for help see this tutorial).
- Set the environment variable PSEUDO_MODEL_PATH in the .env file.
- Launch the wrapper bash file run_docker.sh. This file will clean and rebuild the required Docker containers by calling docker-compose.yml.
- Access the API at localhost/ and localhost/api_stats.
- Feel free to contact @pedevineau or @psorianom or other Lab IA team members with any questions or if you are interested in contributing!