This project predicts the strength of passwords using machine learning techniques. It is divided into multiple notebooks, each covering one stage of the work. Below is a summary of each notebook:
- Notebook 01: Fetch data
- Notebook 02: Data Exploration
- Notebook 03: Select Base Model
- Notebook 04: Make Pipeline
In this notebook, we will fetch and preprocess the "rockyou.txt" password dataset. This dataset will serve as the foundation for our "Passwordometer" project and its password strength analysis.
In this notebook, we will explore the dataset using visualizations and statistical analysis to identify patterns and trends in the password data.
In this notebook, we will clean the password dataset obtained in the previous notebook. We will remove invalid passwords and perform basic data cleaning steps to ensure the quality and integrity of the data.
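The cleaning step might look roughly like the sketch below. This is only an illustration: the function name `clean_passwords`, the length limits, and the specific validity rules (printable characters only, no duplicates) are assumptions and not necessarily the rules used in the notebook.

```python
def clean_passwords(raw_lines, min_length=4, max_length=64):
    """Return unique, valid passwords in first-seen order.

    The validity rules here (length bounds, printable characters only)
    are illustrative assumptions, not the notebook's actual rules.
    """
    seen = set()
    cleaned = []
    for line in raw_lines:
        password = line.rstrip("\r\n")  # strip line endings only, keep inner spaces
        if not (min_length <= len(password) <= max_length):
            continue  # too short (including empty lines) or too long
        if not password.isprintable():
            continue  # contains control characters such as NUL bytes
        if password in seen:
            continue  # duplicate entry
        seen.add(password)
        cleaned.append(password)
    return cleaned


sample = ["123456\n", "password\n", "\n", "abc\n", "123456\n", "bad\x00pass\n"]
print(clean_passwords(sample))
```

Only the first two entries survive here: the empty line and `abc` fail the length check, the second `123456` is a duplicate, and the last entry contains a control character.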
In this notebook, we will create a stratified sample of the clean password dataset obtained in the previous notebook. Stratifying ensures that every password strength level is represented in the sample in proportion to its share of the full dataset, which supports more reliable analysis and modeling.
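As a minimal sketch of stratified sampling, the example below groups records by a strength label and draws the same fraction from each group, using only the standard library. The labels `weak`, `medium`, and `strong`, the helper `stratified_sample`, and the toy records are hypothetical; the notebook may use different strength levels and tooling.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every group defined by key(record)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)

    sample = []
    for label in sorted(groups):
        members = groups[label]
        # Keep at least one record per group so rare levels are not lost.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Toy data with made-up strength labels: 80 weak, 15 medium, 5 strong.
records = (
    [("pw%d" % i, "weak") for i in range(80)]
    + [("PW%d!" % i, "medium") for i in range(15)]
    + [("Str0ng#%d" % i, "strong") for i in range(5)]
)
sample = stratified_sample(records, key=lambda r: r[1], fraction=0.2)
print(len(sample))
```

With a fraction of 0.2, the sample keeps the 80/15/5 proportions of the original data instead of risking a draw with no strong passwords at all.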
In this notebook, we will perform feature engineering on the stratified sample data obtained in the previous notebook. Feature engineering involves creating new meaningful features from the existing data that can improve the performance of our password strength prediction model.
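To make the idea concrete, here is a small sketch that derives numeric features from a raw password string. The feature names and the choice of features (length, character class counts, unique characters, character-level Shannon entropy) are assumptions for illustration; the notebook may engineer a different set.

```python
import math
import string
from collections import Counter


def password_features(password):
    """Derive simple numeric features from a password (illustrative set)."""
    counts = Counter(password)
    length = len(password)
    # Shannon entropy of the character distribution, in bits per character.
    entropy = sum(
        (n / length) * math.log2(length / n) for n in counts.values()
    ) if length else 0.0
    return {
        "length": length,
        "n_lower": sum(c.islower() for c in password),
        "n_upper": sum(c.isupper() for c in password),
        "n_digit": sum(c.isdigit() for c in password),
        "n_symbol": sum(c in string.punctuation for c in password),
        "n_unique": len(counts),
        "char_entropy": entropy,
    }


features = password_features("Passw0rd!")
print(features)
```

Features like these turn each password into a fixed-length numeric row that a regression model can consume.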
In this notebook, we will perform descriptive analysis on the transformed sample data obtained in the previous notebook. Descriptive analysis involves exploring and summarizing the data to gain insights into the distribution, relationships, and patterns of the variables.
In this notebook, we will select a base model for password strength prediction. We will evaluate various regression models using performance metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R-squared (R2), and the training time (TT). The model with the best performance will be chosen as the base model for further improvement.
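For reference, the metrics named above can be computed as in the following sketch. It uses plain Python so the formulas are visible; the helper names `regression_metrics` and `timed` and the toy values are hypothetical, and the notebook itself may rely on a library to compute these.

```python
import math
import time


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R2 for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot  # undefined if all true values are identical
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds), e.g. for TT."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


y_true = [1.0, 2.0, 3.0, 4.0]  # toy strength scores
y_pred = [1.0, 2.0, 2.0, 5.0]  # toy model output
metrics, seconds = timed(regression_metrics, y_true, y_pred)
print(metrics)
```

Lower MAE, MSE, and RMSE indicate smaller errors, while R2 closer to 1 indicates that the model explains more of the variance in the strength scores.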
In this notebook, we will create a machine learning pipeline using scikit-learn. The pipeline will include data preprocessing steps and a decision tree regressor as the base model. The pipeline will be trained and evaluated on the password strength prediction task.
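A pipeline of this kind could be sketched as follows, assuming scikit-learn and NumPy are installed. The feature extractor `extract_features`, the tree depth, and the toy strength scores are made up for illustration and are not the notebook's actual preprocessing steps or data.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.tree import DecisionTreeRegressor


def extract_features(passwords):
    """Turn raw password strings into a numeric matrix (illustrative features)."""
    return np.array(
        [
            [
                len(p),
                sum(c.isdigit() for c in p),
                sum(c.isupper() for c in p),
                sum(not c.isalnum() for c in p),
            ]
            for p in passwords
        ],
        dtype=float,
    )


pipeline = Pipeline(
    [
        ("features", FunctionTransformer(extract_features)),  # preprocessing step
        ("model", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ]
)

# Toy training data; the strength scores are invented for this example.
passwords = ["123456", "password", "Passw0rd", "Tr0ub4dor&3", "qwerty", "C0rrect-H0rse!"]
strength = [0.0, 0.1, 0.5, 0.9, 0.0, 1.0]

pipeline.fit(passwords, strength)
predictions = pipeline.predict(["letmein", "S3cure!Pass"])
print(predictions)
```

Bundling preprocessing and the model in one `Pipeline` object means the same transformations are applied at training and prediction time, which avoids mismatches between the two.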
Note: The notebooks are organized sequentially to provide a logical flow through the project. It is recommended to follow them in the given order to understand each step of password strength prediction and to reproduce the results.
For more details and the code implementation, please refer to the respective notebooks.