
Project to model and create a SQL database for INPA's genetic resources collection, and use ETL in Python to migrate the data from spreadsheets


nnbuainain/database_inpa_crg


Creating a database for INPA's Genetic Resources Collection

Developed by Nelson Buainain & Maura Regina

bird_pic

INPA's genetic resources collection holds over 75,000 genetic samples of birds 🦉, fishes 🐡 and herps 🐸 🐍 (Reptiles and Amphibians). It's one of the largest collections of genetic material of Amazonian vertebrates in South America, and constitutes an invaluable heritage for the Brazilian and Amazonian peoples.

The database is currently managed in three Excel files, one for each of the three major animal groups. Given the importance of the material, it is time to create a proper, secure, easy-to-manage, and publicly accessible database.

Goals

  • Process the spreadsheet data: review the entries and correct errors.

  • Develop the conceptual, logical, and physical models for the data, unifying all the material in a single database.

  • Migrate the data to the new database.

  • Make the database publicly accessible to everyone, including both the scientific and non-scientific communities.
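As an illustration of the first goal, here is a minimal sketch of a data-review pass with pandas. The column names (`sample_id`, `species`, `collection_date`) and the sample rows are hypothetical and are not taken from the actual spreadsheets; the idea is only to show trimming text, parsing dates, removing duplicate rows, and flagging entries that need manual review.

```python
import pandas as pd

# Hypothetical raw rows; the real spreadsheets may use different columns.
raw = pd.DataFrame(
    {
        "sample_id": ["A001", "A002", "A002", "A003"],
        "species": [
            "  Ramphastos tucanus",
            "Pipra  fascicauda ",
            "Pipra  fascicauda ",
            None,
        ],
        "collection_date": ["2015-03-02", "2016-11-20", "2016-11-20", "not recorded"],
    }
)


def clean_samples(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: tidy text, parsed dates, no duplicate rows."""
    out = df.copy()
    # Collapse repeated whitespace and trim both ends of the species names.
    out["species"] = out["species"].str.replace(r"\s+", " ", regex=True).str.strip()
    # Dates that do not match the expected format become NaT for later review.
    out["collection_date"] = pd.to_datetime(
        out["collection_date"], format="%Y-%m-%d", errors="coerce"
    )
    # Drop rows that are exact duplicates after cleaning.
    return out.drop_duplicates().reset_index(drop=True)


cleaned = clean_samples(raw)
# Rows with a missing species or an unparseable date are set aside for review.
needs_review = cleaned[cleaned["species"].isna() | cleaned["collection_date"].isna()]
print(cleaned)
print(needs_review)
```

Coercing bad dates to `NaT` instead of raising keeps the pass from stopping at the first malformed entry, so all problem rows can be reviewed together.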

Tools

  • Brmodelo to create the conceptual and logical models.

  • PostgreSQL to create the physical model.

  • Python to develop an ETL program that extracts, processes, and loads the data from the spreadsheets into the new SQL database.

  • Psycopg adapter to connect to the PostgreSQL database from the Python program.

pip install psycopg2
pip install pandas
pip install numpy
pip install openpyxl xlsxwriter xlrd
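To show how these tools can fit together, here is a minimal sketch of an extract, transform, load flow. It is not the project's actual pipeline: the table name `sample`, the columns `sample_id` and `species`, and the function names are assumptions made for illustration. Only standard pandas and psycopg2 calls are used (`pandas.read_excel`, `psycopg2.connect`, `cursor.executemany`).

```python
import pandas as pd

# Hypothetical target table and columns; the real schema may differ.
INSERT_SQL = "INSERT INTO sample (sample_id, species) VALUES (%s, %s)"


def extract(path: str) -> pd.DataFrame:
    """Read one spreadsheet into a DataFrame (.xlsx files need openpyxl)."""
    return pd.read_excel(path)


def transform(df: pd.DataFrame) -> list[tuple]:
    """Keep the columns to load, skip rows without an ID, map NaN to None."""
    out = df.dropna(subset=["sample_id"])[["sample_id", "species"]]
    # psycopg2 writes Python None as SQL NULL, so missing values become None.
    return [
        tuple(None if pd.isna(value) else value for value in record)
        for record in out.itertuples(index=False, name=None)
    ]


def load(rows: list[tuple], dsn: str) -> None:
    """Insert all rows in one transaction; roll back if any insert fails."""
    import psycopg2  # imported here so transform() can be tested without a database

    conn = psycopg2.connect(dsn)
    try:
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            with conn.cursor() as cur:
                cur.executemany(INSERT_SQL, rows)
    finally:
        conn.close()
```

A run would look like `load(transform(extract("birds.xlsx")), "dbname=inpa_crg user=etl")`, where both the file name and the connection string are placeholders. Keeping `transform` free of database calls makes it easy to test on small in-memory DataFrames before touching PostgreSQL.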

Current stage

We are currently at the third stage, developing the ETL pipeline to migrate the data to the new database.

  • The bird and herpetology databases are already available for consultation in SQL format! The 'Solicita' table, which records sample loans, is not available yet.

Database Model

  • This is what our new database currently looks like:

model
