Developed by Nelson Buainain & Maura Regina
INPA's genetic resources collection holds over 75,000 genetic samples of birds 🦉, fishes 🐡 and herps 🐸 🐍 (reptiles and amphibians). It is one of the largest collections of genetic material from Amazonian vertebrates in South America, and an invaluable heritage for the Brazilian and Amazonian peoples.
The database is currently managed in three Excel files, one for each of the three major animal groups. Given the importance of the material, it is time to create a proper, safe, easy-to-manage and publicly accessible database.
- Process the spreadsheets' data: review data entries, correct errors, etc.
- Develop the conceptual, logical, and physical models for the data, unifying all the material in a single database.
- Migrate the data to the new database.
- Make the database publicly accessible to everyone, including the scientific and non-scientific communities.
- brModelo to create the conceptual and logical models.
- PostgreSQL to create the physical model.
- Python to develop an ETL program that extracts the data from the spreadsheets, processes it, and loads it into the new SQL database.
- Psycopg adapter to connect the Python program to the PostgreSQL database.
pip install psycopg2
pip install pandas
pip install numpy
pip install openpyxl xlsxwriter xlrd
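The ETL step listed above (extract from the spreadsheets, process, load into PostgreSQL) could be sketched roughly as follows. This is only an illustration, not the project's actual pipeline: the column names, the `sample` table name, and the cleaning rules are all hypothetical, and the real spreadsheets and schema will differ.

```python
import re

# Hypothetical column names; the real spreadsheets use their own headers.
COLUMNS = ("sample_code", "species", "locality")


def clean_record(raw):
    """Normalize one spreadsheet row (a dict) before loading.

    Collapses repeated whitespace, trims the ends, and turns empty
    cells (None, NaN, or blank strings) into None so they load as NULL.
    """
    cleaned = {}
    for column in COLUMNS:
        value = raw.get(column)
        # pandas represents empty cells as float NaN, and NaN != NaN.
        if value is None or (isinstance(value, float) and value != value):
            cleaned[column] = None
            continue
        text = re.sub(r"\s+", " ", str(value)).strip()
        cleaned[column] = text or None
    return cleaned


def build_insert(table, columns):
    """Build a parameterized INSERT statement for psycopg2.

    The table and column names are assumed to be trusted constants;
    only the values are passed as query parameters.
    """
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"


def extract(path):
    """Read one Excel file into a list of row dicts (requires pandas)."""
    import pandas as pd  # imported here so the helpers above work without it

    frame = pd.read_excel(path, dtype=object)
    return frame.to_dict(orient="records")


def load(records, dsn, table="sample"):
    """Insert cleaned records into PostgreSQL (requires psycopg2)."""
    import psycopg2

    rows = [tuple(record[column] for column in COLUMNS) for record in records]
    conn = psycopg2.connect(dsn)
    try:
        # "with conn" commits on success and rolls back on error;
        # it does not close the connection, hence the finally block.
        with conn:
            with conn.cursor() as cur:
                cur.executemany(build_insert(table, COLUMNS), rows)
    finally:
        conn.close()
```

With a spreadsheet and a running database at hand, the pieces would be chained as `load([clean_record(r) for r in extract("birds.xlsx")], "dbname=...")`, where both the file name and the connection string are placeholders. Keeping the cleaning step as a plain function on dictionaries makes it easy to test without Excel files or a database connection.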
We are currently at the third stage, developing the ETL pipeline to migrate the data to the new database.
- The bird and herpetology databases are already available for consultation in SQL format! The 'Solicita' table, which records sample loans, is not available yet.
- This is what our new database currently looks like: