📊 Datasets for Machine Learning Projects

This directory contains various datasets used for machine learning, data analysis, and visualization tasks. Each dataset serves a specific purpose and covers different domains, providing a range of options for exploratory data analysis, regression, classification, and more.

"I haven't uploaded all of the datasets I've used yet :)"

📁 Dataset Descriptions

Advertising.csv • Description: Contains data on advertising expenditures across different channels (TV, Radio, Newspaper) and their impact on sales. • Usage: Great for regression analysis and predictive modeling to analyze the impact of advertising on sales.
SVM_Data.csv • Description: A dataset designed for Support Vector Machine (SVM) classification tasks. • Usage: Suitable for experimenting with SVM algorithms, tuning hyperparameters, and evaluating model accuracy.
Salary.csv • Description: Contains information on individuals' salary levels based on factors like education, experience, and position. • Usage: Ideal for regression modeling, predicting salary levels based on features, and understanding salary trends.
adult.csv • Description: UCI Adult dataset, used to predict income levels based on demographic features. • Usage: Commonly used for classification tasks, especially binary classification on predicting income levels (e.g., above or below 50K).
car_data.csv • Description: Information on various car attributes and evaluations. • Usage: Useful for classification tasks, such as evaluating car acceptability based on features like buying price, maintenance cost, and safety.
insurance.csv • Description: Insurance data, including demographic and medical history factors, along with insurance charges. • Usage: Great for regression analysis, predicting insurance costs, and understanding factors influencing charges.
kc_house_data.csv • Description: Contains house prices and associated features from the King County housing dataset. • Usage: Suitable for regression modeling, predicting house prices, and feature importance analysis.
titanic.csv • Description: Famous Titanic dataset containing details about passengers and their survival status. • Usage: Ideal for binary classification, survival analysis, and feature engineering experiments.
IBM Dataset.csv • Description: IBM HR Analytics dataset, used to predict employee attrition and understand HR factors. • Usage: Useful for classification tasks, especially for predicting employee turnover.
Iris.csv • Description: Classic Iris dataset containing measurements of different iris flower species. • Usage: Commonly used for classification tasks and exploring clustering algorithms.
SampleSuperstore.csv • Description: Data from a sample superstore, including sales, profits, and product categories. • Usage: Useful for analyzing sales performance, regional trends, and market segmentation.
data.csv • Description: General dataset for testing and exploratory analysis. • Usage: Used for various data analysis tasks.
diabetes.csv • Description: Diabetes dataset containing health-related features used to predict diabetes onset. • Usage: Suitable for binary classification and medical analysis tasks.
fifa.csv • Description: FIFA dataset with player statistics. • Usage: Ideal for data exploration, player comparison, and sports analytics.
pendigits-test.csv & pendigits-train.csv • Description: Handwritten digit recognition dataset with training and testing splits. • Usage: Commonly used for classification tasks and testing digit recognition algorithms.
regression_project_data.csv • Description: Dataset for regression analysis. • Usage: Used for regression modeling and feature analysis.
tips.csv • Description: Dataset containing restaurant tips, including features like total bill, tip amount, and day. • Usage: Great for exploring regression, correlation, and understanding factors affecting tips.
titanic.csv (duplicate) • Description: Duplicate of the Titanic dataset, used for different experimental setups. • Usage: Same as above, for binary classification and survival analysis.

📚 How to Use

Load the Data Each dataset is in CSV format and can be loaded using pandas in Python:

python Copy code import pandas as pd

data = pd.read_csv('path/to/dataset.csv') Exploratory Data Analysis Perform initial analysis to understand the features, data types, and any necessary preprocessing steps:

python Copy code data.info() data.describe() Machine Learning These datasets are suitable for a variety of machine learning tasks such as regression, classification, clustering, and feature engineering.

💡 Suggested Applications

Regression Models: Advertising.csv, insurance.csv, kc_house_data.csv Classification Models: adult.csv, car_data.csv, titanic.csv, Iris.csv, diabetes.csv Exploratory Data Analysis: SampleSuperstore.csv, fifa.csv Time-Series & Sequential Data: pendigits-train.csv and pendigits-test.csv

📄 License

These datasets are publicly available for educational and research purposes. Please ensure proper citation when using these datasets for publications or sharing results.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📊 Datasets for Machine Learning Projects

"I haven't uploaded all of the datasets I've used yet :)"

📁 Dataset Descriptions

📚 How to Use

💡 Suggested Applications

📄 License

About

Releases

Packages

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
Advertising.csv		Advertising.csv
Copy of SVM_Data.csv		Copy of SVM_Data.csv
Copy of Salary.csv		Copy of Salary.csv
Copy of adult.csv		Copy of adult.csv
Copy of car_data.csv		Copy of car_data.csv
Copy of insurance.csv		Copy of insurance.csv
Copy of kc_house_data.csv		Copy of kc_house_data.csv
Copy of titanic.csv		Copy of titanic.csv
IBM Dataset.csv		IBM Dataset.csv
Iris.csv		Iris.csv
README.md		README.md
SampleSuperstore.csv		SampleSuperstore.csv
data.csv		data.csv
diabetes.csv		diabetes.csv
fifa.csv		fifa.csv
pendigits-test.csv		pendigits-test.csv
pendigits-train.csv		pendigits-train.csv
regression_project_data.csv		regression_project_data.csv
tips.csv		tips.csv
titanic.csv		titanic.csv

imnotamr/Datasets-Used

Folders and files

Latest commit

History

Repository files navigation

📊 Datasets for Machine Learning Projects

"I haven't uploaded all of the datasets I've used yet :)"

📁 Dataset Descriptions

📚 How to Use

💡 Suggested Applications

📄 License

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages