Chad Dubiel, David Martinez, Katy Fuentes
Correlation between cryptocurrency prices and COVID-19 case counts.
https://github.com/cdubiel08/ETL-Project-Group-9
- API Key - SFOX https://www.sfox.com/developers/?python#market-data or Gemini https://docs.gemini.com/websocket-api/#market-data
- Covid - https://www.kaggle.com/imdevskp/corona-virus-report
- Cryptocurrency Historical Chart - https://www.kaggle.com/mczielinski/bitcoin-historical-data
- What useful investigation could be done with the final database? Compare the output against stock markets, commodities, or the US dollar.
- Whether final database will be relational or non-relational. Why? Relational, because the records from both sources are linked by date.
Dates alone are not a good join key, so each table needs a unique ID as its primary key.
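The point about dates and primary keys can be sketched with a small relational example. This is only an illustration, not the project's actual schema: the table names (crypto_prices, covid_cases), the column names, and the sample rows are all hypothetical, and it assumes the COVID data has one row per country per day, which is why a date cannot uniquely identify a row. SQLite from the Python standard library stands in for whichever relational database is chosen.

```python
import sqlite3

# Hypothetical schema: each table gets its own surrogate integer ID as the
# primary key, and obs_date is an ordinary column used only for joining.
SCHEMA = """
CREATE TABLE crypto_prices (
    crypto_id INTEGER PRIMARY KEY AUTOINCREMENT,
    obs_date  TEXT NOT NULL,   -- ISO date, e.g. 2020-03-01
    close_usd REAL
);
CREATE TABLE covid_cases (
    covid_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    obs_date  TEXT NOT NULL,
    country   TEXT NOT NULL,
    confirmed INTEGER
);
"""

def joined_rows():
    """Load made-up sample rows and join the two tables on date."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Illustrative values only, not real market or case data.
    conn.executemany(
        "INSERT INTO crypto_prices (obs_date, close_usd) VALUES (?, ?)",
        [("2020-03-01", 100.0), ("2020-03-02", 110.0)],
    )
    # Two countries share the same date, so obs_date repeats here.
    conn.executemany(
        "INSERT INTO covid_cases (obs_date, country, confirmed) VALUES (?, ?, ?)",
        [("2020-03-01", "A", 5), ("2020-03-01", "B", 7), ("2020-03-02", "A", 9)],
    )
    # Sum the per-country counts so each price row pairs with one daily total.
    rows = conn.execute(
        """
        SELECT p.obs_date, p.close_usd, SUM(c.confirmed) AS total_confirmed
        FROM crypto_prices AS p
        JOIN covid_cases AS c ON c.obs_date = p.obs_date
        GROUP BY p.crypto_id, p.obs_date, p.close_usd
        ORDER BY p.obs_date
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in joined_rows():
        print(row)
```

Keeping the date as a plain column and the ID as the key means repeated dates never violate a uniqueness constraint, while the join still lines the two sources up by day.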
- Pandas - for data formatting, date cleaning, and reducing columns
- Mongo - handles null values better, since a document can simply omit a missing field instead of dropping the whole column; any COVID/crypto overlaps are still captured
- At least 2 (or more) sources
- If possible, try to incorporate a web API as one of your data sources.
- Within Jupyter, build out the ETL process to extract your data from their sources, apply some level of transformation, and load the resulting data to a database (relational or non-relational)
- Build a Flask application that has a route that will execute a query to your database and return the results in JSON format.
- Write up a short report that details your 3 ETL steps.
- Store all of your project files in a well-organized project repository
- Each member of your team will submit a link to your project repo to BCS by the end of class Tuesday
- Which data sources you chose and why
- A detailed description of the extraction, transformation, and loading steps
- Why you performed the types of transformation you did
- Why you chose the type of final database
- Schema of the tables/collections in the final database
- Hypothetical use case(s) for your database
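
The Flask requirement listed above (a route that executes a database query and returns the results as JSON) could look roughly like the sketch below. It is a minimal illustration under stated assumptions rather than the project's implementation: the route path /api/daily, the table daily_summary, its columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the final database so the example runs without a separate server.

```python
import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical stand-in for the final database. check_same_thread=False lets
# Flask's worker threads share this single demo connection.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.row_factory = sqlite3.Row  # rows can then be converted to dicts
conn.execute(
    """
    CREATE TABLE daily_summary (
        id              INTEGER PRIMARY KEY AUTOINCREMENT,
        obs_date        TEXT NOT NULL,
        btc_close_usd   REAL,
        covid_confirmed INTEGER
    )
    """
)
# Illustrative values only, not real market or case data.
conn.executemany(
    "INSERT INTO daily_summary (obs_date, btc_close_usd, covid_confirmed) "
    "VALUES (?, ?, ?)",
    [("2020-03-01", 100.0, 12), ("2020-03-02", 110.0, 9)],
)
conn.commit()


@app.route("/api/daily")
def daily():
    """Run a query against the database and return every row as JSON."""
    rows = conn.execute(
        "SELECT obs_date, btc_close_usd, covid_confirmed "
        "FROM daily_summary ORDER BY obs_date"
    ).fetchall()
    return jsonify([dict(row) for row in rows])
```

Once saved to a file, the app can be served with the flask run command, and a GET request to /api/daily returns a JSON array with one object per day. With MongoDB as the final database, the query inside the route would change but the route and the jsonify call would stay the same shape.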