In this project, we will combine and practice implementing what we have learned throughout this course, including:
- Extracting data from a SQL database
- Cleaning, transforming and analyzing data
- Loading data into a database
- Developing and implementing a QA process to validate transformed data against raw data
README.md file
- This file should be an overview of your project, covering roughly the same points as your presentation
cleaning_data.md file
- Fill out this file with a description of the issues that will be addressed by cleaning the data
- Include the queries used to clean the data
startingwithquestions.md file
- Provide the answers to the 5 questions and the queries used to answer each question
startingwithdata.md file
- Provide the 3 - 5 new questions you decided could be answered with the data
- Include the answer to each question and the accompanying queries used to obtain the answer
QA.md file
- Identify and describe your risk areas
- Develop and execute a QA process to address the risk areas identified, providing the SQL queries used to implement it
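For example, a minimal QA sketch along these lines, assuming a raw import named `all_sessions` and a transformed copy named `cleaned_all_sessions` (both names are placeholders, not the project's actual tables):

```sql
-- Risk area: rows dropped or duplicated during transformation.
-- Compare row counts between the raw import and the cleaned table.
SELECT
    (SELECT COUNT(*) FROM all_sessions)         AS raw_rows,
    (SELECT COUNT(*) FROM cleaned_all_sessions) AS cleaned_rows;

-- Risk area: duplicate identifiers after the transform.
-- Any row returned here points to a duplicate worth investigating
-- (fullvisitorid is an assumed key column).
SELECT fullvisitorid, COUNT(*) AS occurrences
FROM cleaned_all_sessions
GROUP BY fullvisitorid
HAVING COUNT(*) > 1;
```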
schema.png file
- This file should contain the ERD of the database
When loading CSVs into pgAdmin4:
- Right-click your public schema, then click Create > Table
- In the General tab, name your new table the same as your CSV (e.g. all_sessions)
- In the Columns tab, create each column from your CSV file in the exact order it appears
- Set each column's data type to an acceptable type (remember to adjust Length/Scale to either 255 or 550)
- Click Save
- Navigate to Tables in the left-hand menu
- Right-click your new table and click Import/Export Data
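As an alternative to the GUI steps above, the same table creation and import can be done in SQL. A minimal sketch, where the column names and file path are placeholders rather than the project's actual schema:

```sql
-- Create the target table first; columns must match the CSV's order.
CREATE TABLE all_sessions (
    fullvisitorid VARCHAR(255),
    country       VARCHAR(255),
    city          VARCHAR(255),
    productsku    VARCHAR(255)
);

-- Server-side import (the file must be readable by the PostgreSQL server);
-- from psql, \copy has the same syntax but reads the file from the client machine.
COPY all_sessions
FROM '/path/to/all_sessions.csv'
DELIMITER ','
CSV HEADER;
```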
After data is loaded:
- Run SQL queries on the relevant tables and add screenshots of the results to the project
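For instance, a quick post-load sanity check before moving on to the analysis queries (table name carried over from the import sketch above):

```sql
-- Confirm the load succeeded and eyeball a few rows.
SELECT COUNT(*) FROM all_sessions;
SELECT * FROM all_sessions LIMIT 5;
```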
References:
- https://www.postgresqltutorial.com/postgresql-tutorial/import-csv-file-into-posgresql-table/
- https://www.youtube.com/watch?v=6Jf7eTkIaR4
- Dealing with NULL values by removing them
- Handling missing data by imputing missing values
- Addressing outliers by removing them
- Data normalization (changing data types)
- Removing duplicate data
- Other data transformations
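A few illustrative cleaning queries along these lines; the table and column names are assumptions for the sketch, not the exact queries used in the project:

```sql
-- Nulls / missing data: replace NULL countries with a sentinel instead of dropping rows.
UPDATE all_sessions
SET country = 'Unknown'
WHERE country IS NULL;

-- Normalization: a numeric column imported as text is cast to a numeric type.
ALTER TABLE all_sessions
ALTER COLUMN totaltransactionrevenue TYPE NUMERIC
USING NULLIF(totaltransactionrevenue, '')::NUMERIC;

-- Duplicates: find rows repeated on an assumed key before deciding how to remove them.
SELECT fullvisitorid, visitid, COUNT(*) AS copies
FROM all_sessions
GROUP BY fullvisitorid, visitid
HAVING COUNT(*) > 1;
```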
- This data could tell me all about customer behaviour and which products are popular in which countries, so I used the data to build queries around these questions (a sketch of one such query follows this list)
- The data helped me to answer various questions about inventories and tracking
- questions file
- data exploration
- QA file
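As an illustration of the "popular products by country" angle, a sketch of the kind of query involved (column names such as v2productname are assumptions about the dataset, not the project's actual queries):

```sql
-- Count how often each product appears in sessions, per country.
SELECT country,
       v2productname,
       COUNT(*) AS times_viewed
FROM all_sessions
GROUP BY country, v2productname
ORDER BY country, times_viewed DESC;
```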
- had a lot of challenges with importing the data and writing a query to import it effectively at first
- doing joins and connecting queries together
- casting values as integers (a sketch combining a join and a cast appears after this list)
- more data cleaning and joining with other tables
- making a legend for the dataset and table of values
- maybe doing predictive analytics down the line
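A sketch of the join-plus-cast pattern those challenges refer to; all table and column names here are hypothetical:

```sql
-- Join session data to a products table and cast a text column to an integer
-- so it can be summed.
SELECT p.name,
       SUM(NULLIF(s.units_sold, '')::INTEGER) AS total_units
FROM all_sessions AS s
JOIN products AS p
  ON p.sku = s.productsku
GROUP BY p.name
ORDER BY total_units DESC;
```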