Thank you for reviewing this project!!
The goal is to analyze the moviliens dataset by designing an effective graph structure and data pipeline.
- An effective graph structure built from the dataset. The graph was built using the Arrows tool for Neo4j graph databases.
- The design of a data pipeline to ingest the data into the graph database.
- An API to retrieve individual nodes in the graph, as well as functionality to search the graph and retrieve the results.
Bonus:
- A unit test to validate that a dataset of movies was loaded completely.
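One way such a completeness check could look is sketched below: it compares the number of rows in a CSV file with the number of nodes in the graph. This is a minimal sketch and not the repository's actual test. The `Movie` label, the `data/movies.csv` path, and the helper names `count_csv_rows` and `count_movie_nodes` are assumptions made for illustration, and `session` is assumed to behave like a Neo4j Python driver session.

```python
import csv
import unittest


def count_csv_rows(path):
    """Count data rows in a CSV file that has a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.DictReader(f))


def count_movie_nodes(session):
    """Count nodes with the (assumed) Movie label via a Cypher query."""
    record = session.run("MATCH (m:Movie) RETURN count(m) AS n").single()
    return record["n"]


class TestMoviesLoaded(unittest.TestCase):
    # Both values are hypothetical; set them before running the test.
    csv_path = "data/movies.csv"
    session = None

    def test_all_movies_loaded(self):
        if self.session is None:
            self.skipTest("no database session configured")
        # The load is complete when every CSV row has a node in the graph.
        self.assertEqual(
            count_csv_rows(self.csv_path),
            count_movie_nodes(self.session),
        )
```

Comparing counts only catches missing rows. A stricter check could also compare the set of identifiers on both sides.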
Configuration values are defined as parameters so that they can be changed without modifying the code.
- The configuration file can be found here
- The code to read the configuration can be found in the function get_config()
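For illustration, a function like get_config() could be sketched as follows. The file format, the file name `config.ini`, and the section and key names are all assumptions, since the actual configuration file is not reproduced here. The repository's real implementation may differ.

```python
import configparser


def get_config(path="config.ini"):
    """Read connection and dataset settings from an INI file.

    The [neo4j] and [dataset] sections and their keys are hypothetical.
    """
    parser = configparser.ConfigParser()
    # ConfigParser.read() silently skips missing files, so check explicitly.
    if not parser.read(path):
        raise FileNotFoundError(f"configuration file not found: {path}")
    return {
        "uri": parser.get("neo4j", "uri"),
        "user": parser.get("neo4j", "user"),
        "password": parser.get("neo4j", "password"),
        "data_dir": parser.get("dataset", "data_dir", fallback="data"),
    }
```

Keeping these values outside the code means that a different database or dataset location only requires editing the configuration file.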
The data pipeline is built in the class Moviliens_Consumer, which contains 3 functions:
- __init__() to initialize the class components
- create_constraints() to build the constraints required before loading data into the graph database
- create_from_dataset() to consume the dataset from the CSV files
- The main function to consume the data can be found here
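The three functions listed above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the repository's code: the `Movie` label, the `movieId` and `title` columns, the `movies.csv` file name, and the constructor arguments are hypothetical, and `session` is assumed to expose `run(query, **parameters)` like a Neo4j Python driver session. The constraint statement uses the `REQUIRE` syntax of recent Neo4j versions.

```python
import csv
import os


class Moviliens_Consumer:
    """Illustrative sketch only; the repository's real class may differ."""

    def __init__(self, session, data_dir):
        # `session` is assumed to behave like a neo4j driver session.
        self.session = session
        self.data_dir = data_dir

    def create_constraints(self):
        # A uniqueness constraint prevents duplicate movies and backs
        # MERGE lookups with an index. Label and property are assumptions.
        self.session.run(
            "CREATE CONSTRAINT movie_id IF NOT EXISTS "
            "FOR (m:Movie) REQUIRE m.movieId IS UNIQUE"
        )

    def create_from_dataset(self, filename="movies.csv"):
        # Assumes a header row with movieId and title columns.
        loaded = 0
        path = os.path.join(self.data_dir, filename)
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                # MERGE makes the load idempotent: re-running it updates
                # existing nodes instead of creating duplicates.
                self.session.run(
                    "MERGE (m:Movie {movieId: $movieId}) SET m.title = $title",
                    movieId=int(row["movieId"]),
                    title=row["title"],
                )
                loaded += 1
        return loaded
```

Creating the constraints before loading matters because `MERGE` on an unconstrained property has to scan all nodes with that label, which slows down as the graph grows.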
- The graph database is implemented in Neo4j
- The code to consume the data is written in Python 3
- The repository is hosted on GitHub
- The container is built using Docker
author: Lucía Vargas