This project explores and analyzes a Netflix dataset to uncover insights about the platform's content library. The analysis identifies patterns, trends, and relationships in the data and presents the findings through visualizations.
- Data Cleaning: Managed missing values, outliers, and inconsistencies to ensure a high-quality dataset.
- Exploratory Data Analysis (EDA): Investigated distributions, relationships, and trends within the data.
- Analytical Questions: Answered 7 key questions about Netflix content, such as:
  - Which genres are most popular?
  - How has Netflix's content evolved over time?
  - What is the distribution of movie versus TV show content?
- Visualizations: Created charts and graphs using Matplotlib and Seaborn to present insights clearly.
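The cleaning step and the first two questions can be sketched with Pandas as follows. This is a minimal illustration, not the project's actual notebook code: it assumes a DataFrame with lowercase columns `genre`, `runtime`, `imdb_score`, and `year`, and the function names (`clean`, `genre_counts`, `titles_per_year`) and toy rows are hypothetical, invented only to show the technique.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning sketch; assumes columns genre, runtime, imdb_score, year."""
    out = df.copy()
    # Fix inconsistent genre labels, e.g. " documentary" -> "Documentary".
    out["genre"] = out["genre"].str.strip().str.title()
    # Drop rows missing fields the analysis depends on, then exact duplicates.
    out = (
        out.dropna(subset=["genre", "runtime", "imdb_score"])
        .drop_duplicates()
        .reset_index(drop=True)
    )
    # Flag (rather than delete) runtime outliers using the 1.5 * IQR rule.
    q1, q3 = out["runtime"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["runtime_outlier"] = ~out["runtime"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out


def genre_counts(df: pd.DataFrame, top: int = 3) -> pd.Series:
    """Which genres appear most often?"""
    return df["genre"].value_counts().head(top)


def titles_per_year(df: pd.DataFrame) -> pd.Series:
    """How has the number of titles changed over time?"""
    return df.groupby("year").size().sort_index()


# Toy rows invented for illustration; they are NOT taken from the real dataset.
toy = pd.DataFrame({
    "title": ["A", "A", "B", "C", "D", "E"],
    "genre": [" documentary", "Documentary", "Drama", "Documentary", None, "drama"],
    "runtime": [90, 90, 100, 95, 80, 300],
    "imdb_score": [6.5, 6.5, 5.8, 7.0, 6.0, 5.0],
    "year": [2019, 2019, 2020, 2020, 2021, 2021],
})

cleaned = clean(toy)
print(genre_counts(cleaned))
print(titles_per_year(cleaned))
```

Normalizing the genre text before calling `drop_duplicates` matters: the first two toy rows only become exact duplicates once `" documentary"` and `"Documentary"` are mapped to the same label. Flagging outliers in a separate column keeps them available for inspection instead of silently discarding them.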
- Source: https://drive.google.com/file/d/172ZQ3dgpvRDqegAVWjV5d6lR2lIBruSC/view?usp=sharing
- Description: The dataset includes each title's genre, premiere, runtime, imdb_score, and year.
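A loading step that produces these columns might look like the sketch below. It assumes the raw CSV uses headers such as `IMDB Score` (normalized here to `imdb_score`) and that `premiere` holds date strings; the file name, header spelling, and the `load_netflix` helper are all hypothetical, and the two sample rows are invented stand-ins for the real file.

```python
import io

import pandas as pd


def load_netflix(source) -> pd.DataFrame:
    """Hypothetical loader: read the CSV and normalize headers (raw names are assumed)."""
    df = pd.read_csv(source)
    # Normalize headers such as "IMDB Score" to snake_case ("imdb_score").
    df.columns = df.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    # Parse premiere dates; values that cannot be parsed become NaT instead of raising.
    df["premiere"] = pd.to_datetime(df["premiere"], errors="coerce")
    # Derive the year from the premiere date when the file has no year column.
    if "year" not in df.columns:
        df["year"] = df["premiere"].dt.year
    return df


# Two invented rows standing in for the real file.
sample = io.StringIO(
    "Title,Genre,Premiere,Runtime,IMDB Score\n"
    'Example A,Documentary,"August 5, 2019",58,6.3\n'
    "Example B,Drama,unknown,112,5.9\n"
)
df = load_netflix(sample)
print(df.columns.tolist())
```

In practice `source` would be the path to the downloaded CSV; an `io.StringIO` buffer is used here only so the example runs on its own.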
- Programming Language: Python
- Libraries:
  - Pandas for data manipulation
  - NumPy for numerical operations
  - Matplotlib and Seaborn for visualizations
- Tools: Jupyter Notebook
- Key findings:
  - Content Trends: The number of Netflix titles has grown significantly, especially in recent years, consistent with the platform's global growth strategy.
  - Genre Popularity: Documentary is the most common genre, followed by Drama and Comedy, reflecting the breadth of Netflix's catalog.
  - Runtime Trends: Genres such as Anthology/Dark Comedy and Heist Films have the longest average runtimes, possibly because they favor more complex narratives.
  - IMDb Scores: Average IMDb scores have declined steadily since 2015 and reached their lowest point in 2021, which may point to challenges in maintaining content quality.
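The runtime and IMDb findings come down to two group-by aggregations. The sketch below assumes a cleaned DataFrame with columns `genre`, `runtime`, `imdb_score`, and `year`; the function names are hypothetical and the toy rows are invented, so the printed values say nothing about the real dataset.

```python
import pandas as pd


def mean_runtime_by_genre(df: pd.DataFrame, top: int = 5) -> pd.Series:
    """Average runtime per genre, longest first."""
    return df.groupby("genre")["runtime"].mean().sort_values(ascending=False).head(top)


def mean_score_by_year(df: pd.DataFrame) -> pd.Series:
    """Average IMDb score per year, in chronological order."""
    return df.groupby("year")["imdb_score"].mean().sort_index()


# Invented toy rows, used only to exercise the functions.
toy = pd.DataFrame({
    "genre": ["Heist film", "Documentary", "Documentary", "Drama"],
    "runtime": [120, 80, 90, 100],
    "imdb_score": [6.0, 7.0, 6.0, 5.0],
    "year": [2015, 2015, 2020, 2021],
})

runtimes = mean_runtime_by_genre(toy)
scores = mean_score_by_year(toy)
print(runtimes)
print("Lowest-scoring year:", scores.idxmin())
```

Both results are ordinary Pandas Series, so they can be charted directly with Matplotlib, for example `scores.plot(marker="o")` for the yearly trend. Note that a genre represented by a single long title can top the runtime ranking, so checking group sizes alongside the means is worthwhile.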