Pseudo API

This API is part of the document pseudonymization effort led by Etalab's Lab IA. Other Lab IA projects can be found at the Lab IA.

Project Status: [Active]

Intro/Objectives

The purpose of this repo is to provide an API endpoint to pseudonymize documents. The API should make it easy for developers to automate document pseudonymization with their own models.

The larger goal of the pseudonymization project is to help the French Council of State open its court decisions to the general public, as required by law. More info about pseudonymization and this project can be found in our French pseudonymization guide here. Our API uses a Named Entity Recognition (NER) model to find and replace first names, last names, and addresses in court decisions (specifically those of the Council of State).
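The find-and-replace step described above can be illustrated with a minimal sketch. This is not the project's actual code: the function names, the span format (character offsets), and the pseudonym scheme (two random uppercase letters followed by an ellipsis, modelled on the response examples in this README) are all assumptions made for illustration. In the real API the spans would come from the NER model; here they are located by hand.

```python
import random
import string


def make_pseudonym(rng):
    """Return two random uppercase letters followed by an ellipsis (hypothetical scheme)."""
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(2)) + "..."


def replace_spans(text, spans, seed=None):
    """Replace each (start, end) character span of `text` with a pseudonym.

    Spans are assumed not to overlap. Identical surface strings receive
    the same pseudonym within one call.
    """
    rng = random.Random(seed)
    mapping = {}   # surface string -> pseudonym
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        surface = text[start:end]
        if surface not in mapping:
            mapping[surface] = make_pseudonym(rng)
        pieces.append(text[cursor:start])   # untouched text before the entity
        pieces.append(mapping[surface])     # pseudonym in place of the entity
        cursor = end
    pieces.append(text[cursor:])            # remaining text after the last entity
    return "".join(pieces)


text = "M. Pierre Sailly demeurant au 14 rue de la Felicité, 75007 Vienne."
# Stand-in for NER output: entities located by hand for this example.
entities = ["Pierre Sailly", "rue de la Felicité", "Vienne"]
spans = [(text.index(e), text.index(e) + len(e)) for e in entities]
result = replace_spans(text, spans, seed=0)
print(result)
```

The printed string has the same shape as the `pseudo` field in the response examples, with different random letters.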

To use the API, you need to train your own NER model with the Flair library. Unfortunately, we cannot currently share our model or the data it was trained on, as they contain non-public information.

Methods Used

Natural Language Processing: Information Extraction: Named Entity Recognition

Technologies

Python
Flair
Flask, gunicorn, nginx
SQLite
Pandas
Docker

API Description

The API has two endpoints:

1. Pseudonymization

Analyzes a given string

URL : /

Method : POST

Data example (all fields must be sent):

{
    "text": "M. Pierre Sailly demeurant au 14 rue de la Felicité, 75007 Vienne."
}

Success Response

Condition : If everything is OK and the model inference was performed correctly

Code : 200 OK

Content example

{
    "success": true,
    "pseudo": "M. BK... demeurant au 14 JZ..., 75007 JV...."
}

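A minimal client sketch for this endpoint, using only the Python standard library. It assumes the API is served at `http://localhost` (as in Getting Started) and that the request body is JSON-encoded, as the data example suggests; if the server expects form-encoded data instead, the payload and header would need to change. The function names `build_request` and `pseudonymize` are hypothetical.

```python
import json
import urllib.request


def build_request(text, base_url="http://localhost", path="/"):
    """Build a POST request carrying {"text": ...} as a JSON body (assumed encoding)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pseudonymize(text, base_url="http://localhost"):
    """Send `text` to the pseudonymization endpoint and return the `pseudo` field."""
    with urllib.request.urlopen(build_request(text, base_url)) as response:
        result = json.loads(response.read().decode("utf-8"))
    if not result.get("success"):
        raise RuntimeError("the API reported a failure")
    return result["pseudo"]


if __name__ == "__main__":
    # Requires a running instance of the API.
    print(pseudonymize("M. Pierre Sailly demeurant au 14 rue de la Felicité, 75007 Vienne."))
```

Passing `path="/tags/"` to `build_request` would target the second endpoint in the same way.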

2. Tag and pseudonymize

Analyzes a given string and returns both an XML-like string with tags and the pseudonymized text

URL : /tags/

Method : POST

Data example (all fields must be sent):

{
    "text": "M. Pierre Sailly demeurant au 14 rue de la Felicité, 75007 Vienne."
}

Success Response

Condition : If everything is OK and the model inference was performed correctly

Code : 200 OK

Content example

{
    "success": true,
    "pseudo": "M. BK... demeurant au 14 JZ..., 75007 JV....",
    "tags": "<text><sentence><a>M. </a><PER>Pierre Sailly</PER><a> demeurant au 14 </a><LOC>rue de la Felicité</LOC><a>, 75007 </a><LOC>Vienne</LOC><a>.</a></sentence></text>"
}
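The `tags` string can be post-processed to list the detected entities. The sketch below assumes the string is well-formed XML, which holds for the example above but is not guaranteed by the "XML-like" description (text containing `&` or `<` might not parse). It also assumes that `text`, `sentence`, and `a` are the only non-entity tags; `extract_entities` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Tags assumed to carry structure or untagged text rather than entities.
NON_ENTITY_TAGS = {"text", "sentence", "a"}


def extract_entities(tags):
    """Return (label, surface text) pairs for every entity element, in document order."""
    root = ET.fromstring(tags)
    return [
        (element.tag, element.text or "")
        for element in root.iter()
        if element.tag not in NON_ENTITY_TAGS
    ]


tags = (
    "<text><sentence><a>M. </a><PER>Pierre Sailly</PER><a> demeurant au 14 </a>"
    "<LOC>rue de la Felicité</LOC><a>, 75007 </a><LOC>Vienne</LOC><a>.</a>"
    "</sentence></text>"
)
entities = extract_entities(tags)
print(entities)
# [('PER', 'Pierre Sailly'), ('LOC', 'rue de la Felicité'), ('LOC', 'Vienne')]
```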

Getting Started

The easiest way to test this application is by using Docker and Docker Compose.

1. Clone this repo (for help see this tutorial).
2. Set the environment variable PSEUDO_MODEL_PATH in the .env file.
3. Run the wrapper script run_docker.sh. It cleans and rebuilds the required Docker containers defined in docker-compose.yml.
4. Access the API at localhost/ and localhost/api_stats.
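Before launching the containers, it can help to confirm that PSEUDO_MODEL_PATH is actually set in the .env file. The following is a small pre-flight sketch, not part of the repo: `read_env_value` is a hypothetical helper, it assumes simple `KEY=value` lines, and the model path in the sample is a placeholder.

```python
def read_env_value(env_text, key):
    """Return the value assigned to `key` in .env-style text, or None if absent."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes.
            return value.strip().strip('"').strip("'")
    return None


# Placeholder content; in practice, read the text of the repo's .env file.
sample = "# model used by the API\nPSEUDO_MODEL_PATH=/path/to/model.pt\n"
model_path = read_env_value(sample, "PSEUDO_MODEL_PATH")
if model_path is None:
    raise SystemExit("PSEUDO_MODEL_PATH is not set in .env")
print(model_path)
```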

Project Deliverables

Contact

Feel free to contact @pedevineau or @psorianom or other Lab IA team members with any questions or if you are interested in contributing!

