This project is an attempt to improve the data available about Friends. Here you'll find a way to explore all of that data, using either Pandas or plain SQL.
You can execute docker-compose up app
and then access JupyterLab through the link shown in your command-line interface. With this approach, you can use Pandas however you like.
If you only want to execute SQL, just run docker-compose up builder
and wait until it finishes. Then you can open your favorite SQL browser and connect to the PostgreSQL database with the following details:
- URL: jdbc:postgresql://localhost:5432/postgres
- User: postgres
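If you prefer Pandas to a SQL browser, the same connection details can be used from Python. The following is a minimal sketch, not the project's own code. It assumes SQLAlchemy and a PostgreSQL driver such as psycopg2 are installed. The helper names `build_postgres_url` and `read_query` are hypothetical, and the password is left as a parameter because it is not listed above.

```python
from urllib.parse import quote_plus


def build_postgres_url(user="postgres", password="", host="localhost",
                       port=5432, database="postgres"):
    """Build a SQLAlchemy-style URL from the connection details above.

    Hypothetical helper: the defaults mirror the JDBC URL and user listed
    in this README; the password must be supplied by the caller.
    """
    # Percent-encode the password so characters like '@' do not break the URL.
    auth = f"{user}:{quote_plus(password)}" if password else user
    return f"postgresql://{auth}@{host}:{port}/{database}"


def read_query(sql, url):
    """Run a SQL query and return the result as a Pandas DataFrame."""
    # Imported here so the URL helper works even without these packages.
    import pandas as pd
    from sqlalchemy import create_engine, text

    engine = create_engine(url)
    with engine.connect() as conn:
        return pd.read_sql_query(text(sql), conn)


# Example usage (needs the database started by `docker-compose up builder`):
# url = build_postgres_url(password="<your password>")
# df = read_query("SELECT 1 AS ok", url)
```

Keeping the URL construction separate from the query helper makes it easy to check the connection string before any database is running.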
About the entities:
I'm following the Delta Architecture design pattern but I changed it a bit to fit this small project. So here you'll find the following layers:
- Raw layer: As the name suggests, this holds the raw data without any processing. In real projects, it may also include profiling of all attributes, scoring of the data by how well it adheres to the business domain and its expected types, governance (such as a data catalog), and security.
- Integration layer: The data is organized and follows a clear pattern. In other words, you can query the data through well-organized tables. The tables may have relationships that reflect the real world, but no KPIs are derived from them. This means the data is queryable and ready for insights but carries no business rules. As always, governance and security play a role here too.
- Business layer: The KPIs live here, and it's a layer where a user without query expertise can understand the data easily. Again, governance and security are involved to guarantee many aspects of each domain.
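The three layers above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for the project's PostgreSQL database, and the table names, view names, columns, and sample rows are all hypothetical rather than taken from the real dataset.

```python
import sqlite3

# Hypothetical sample rows, stored as text the way raw ingested data often is.
RAW_ROWS = [
    ("1", "1", "Episode A", "8.0"),
    ("1", "2", "Episode B", "9.0"),
    ("2", "1", "Episode C", "7.0"),
]


def build_layers(conn):
    """Create a raw table plus integration and business views on top of it."""
    # Raw layer: data exactly as received, with no typing or cleaning.
    conn.execute(
        "CREATE TABLE raw_episodes "
        "(season TEXT, episode TEXT, title TEXT, rating TEXT)"
    )
    conn.executemany("INSERT INTO raw_episodes VALUES (?, ?, ?, ?)", RAW_ROWS)

    # Integration layer: typed and queryable, but no business rules yet.
    conn.execute(
        "CREATE VIEW int_episodes AS "
        "SELECT CAST(season AS INTEGER) AS season, "
        "       CAST(episode AS INTEGER) AS episode, "
        "       title, "
        "       CAST(rating AS REAL) AS rating "
        "FROM raw_episodes"
    )

    # Business layer: a KPI (average rating per season) ready for end users.
    conn.execute(
        "CREATE VIEW biz_season_rating AS "
        "SELECT season, AVG(rating) AS avg_rating, COUNT(*) AS episodes "
        "FROM int_episodes GROUP BY season"
    )


def season_kpis(conn):
    """Read the business-layer KPI, ordered by season."""
    query = (
        "SELECT season, avg_rating, episodes "
        "FROM biz_season_rating ORDER BY season"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
build_layers(conn)
kpis = season_kpis(conn)
print(kpis)
```

Using views for the upper layers keeps each one derived from the layer beneath it, so a change in the raw data flows through without any copying.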
- jvns/pandas-cookbook
- 10 minutes to pandas
- Cookbook
- Select using sub-query in Pandas
- The Pandas DataFrame: Make Working With Data Delightful
- NEVER grow a DataFrame!
- 15 ways to create a Pandas DataFrame
- Python and Parquet Performance
Querying and plotting: