
A comprehensive collection of datasets for machine learning and data science projects, covering topics from advertising and sales to health and sports analytics


imnotamr/Datasets-Used


📊 Datasets for Machine Learning Projects

This directory contains various datasets used for machine learning, data analysis, and visualization tasks. Each dataset serves a specific purpose and covers different domains, providing a range of options for exploratory data analysis, regression, classification, and more.

"I haven't uploaded all of the datasets I've used yet :)"

📁 Dataset Descriptions

  1. Advertising.csv
     • Description: Contains data on advertising expenditures across different channels (TV, Radio, Newspaper) and their impact on sales.
     • Usage: Great for regression analysis and predictive modeling to analyze the impact of advertising on sales.

  2. SVM_Data.csv
     • Description: A dataset designed for Support Vector Machine (SVM) classification tasks.
     • Usage: Suitable for experimenting with SVM algorithms, tuning hyperparameters, and evaluating model accuracy.

  3. Salary.csv
     • Description: Contains information on individuals' salary levels based on factors like education, experience, and position.
     • Usage: Ideal for regression modeling, predicting salary levels based on features, and understanding salary trends.

  4. adult.csv
     • Description: UCI Adult dataset, used to predict income levels based on demographic features.
     • Usage: Commonly used for classification tasks, especially binary classification on predicting income levels (e.g., above or below 50K).

  5. car_data.csv
     • Description: Information on various car attributes and evaluations.
     • Usage: Useful for classification tasks, such as evaluating car acceptability based on features like buying price, maintenance cost, and safety.

  6. insurance.csv
     • Description: Insurance data, including demographic and medical history factors, along with insurance charges.
     • Usage: Great for regression analysis, predicting insurance costs, and understanding factors influencing charges.

  7. kc_house_data.csv
     • Description: Contains house prices and associated features from the King County housing dataset.
     • Usage: Suitable for regression modeling, predicting house prices, and feature importance analysis.

  8. titanic.csv
     • Description: Famous Titanic dataset containing details about passengers and their survival status.
     • Usage: Ideal for binary classification, survival analysis, and feature engineering experiments.

  9. IBM Dataset.csv
     • Description: IBM HR Analytics dataset, used to predict employee attrition and understand HR factors.
     • Usage: Useful for classification tasks, especially for predicting employee turnover.

  10. Iris.csv
     • Description: Classic Iris dataset containing measurements of different iris flower species.
     • Usage: Commonly used for classification tasks and exploring clustering algorithms.

  11. SampleSuperstore.csv
     • Description: Data from a sample superstore, including sales, profits, and product categories.
     • Usage: Useful for analyzing sales performance, regional trends, and market segmentation.

  12. data.csv
     • Description: General dataset for testing and exploratory analysis.
     • Usage: Used for various data analysis tasks.

  13. diabetes.csv
     • Description: Diabetes dataset containing health-related features used to predict diabetes onset.
     • Usage: Suitable for binary classification and medical analysis tasks.

  14. fifa.csv
     • Description: FIFA dataset with player statistics.
     • Usage: Ideal for data exploration, player comparison, and sports analytics.

  15. pendigits-test.csv & pendigits-train.csv
     • Description: Handwritten digit recognition dataset with training and testing splits.
     • Usage: Commonly used for classification tasks and testing digit recognition algorithms.

  16. regression_project_data.csv
     • Description: Dataset for regression analysis.
     • Usage: Used for regression modeling and feature analysis.

  17. tips.csv
     • Description: Dataset containing restaurant tips, including features like total bill, tip amount, and day.
     • Usage: Great for exploring regression, correlation, and understanding factors affecting tips.

  18. titanic.csv (duplicate)
     • Description: Duplicate of the Titanic dataset, used for different experimental setups.
     • Usage: Same as above, for binary classification and survival analysis.
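
For datasets that ship as separate training and testing files, such as the pendigits pair in item 15, the split is already made, so the model is fit on one file and scored on the other. The sketch below illustrates that workflow with a simple nearest-centroid classifier. It runs on synthetic stand-in data, and it assumes that the label is the last column and that both files share the same columns; neither assumption is confirmed for the files in this repository. The helper names `split_xy`, `nearest_centroid_accuracy`, and `make_split` are hypothetical.

```python
import numpy as np
import pandas as pd


def split_xy(df: pd.DataFrame):
    """Assume the label is the last column and every other column is a feature."""
    return df.iloc[:, :-1].to_numpy(dtype=float), df.iloc[:, -1].to_numpy()


def nearest_centroid_accuracy(train: pd.DataFrame, test: pd.DataFrame) -> float:
    """Fit one centroid per class on train, then score predictions on test."""
    # Both splits must describe the same features in the same order.
    assert list(train.columns) == list(test.columns)
    X_train, y_train = split_xy(train)
    X_test, y_test = split_xy(test)
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Distance from every test row to every centroid: shape (n_test, n_classes).
    distances = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    predictions = classes[distances.argmin(axis=1)]
    return float((predictions == y_test).mean())


def make_split(rng, n_per_class: int) -> pd.DataFrame:
    """Synthetic stand-in for one file: two well-separated classes, 4 features."""
    frames = []
    for label, center in [(0, 0.0), (1, 10.0)]:
        points = rng.normal(center, 1.0, size=(n_per_class, 4))
        frame = pd.DataFrame(points, columns=[f"f{i}" for i in range(4)])
        frame["label"] = label
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


rng = np.random.default_rng(0)
train = make_split(rng, 100)
test = make_split(rng, 40)
accuracy = nearest_centroid_accuracy(train, test)
print(f"test accuracy: {accuracy:.3f}")
```

To try this on the real files, replace the two `make_split` calls with `pd.read_csv` on pendigits-train.csv and pendigits-test.csv, after checking whether the files have a header row and where the label column sits.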

📚 How to Use

Load the Data
Each dataset is in CSV format and can be loaded using pandas in Python:

```python
import pandas as pd

# Replace the placeholder path with the location of the CSV file.
data = pd.read_csv('path/to/dataset.csv')
```

Exploratory Data Analysis
Perform initial analysis to understand the features, data types, and any necessary preprocessing steps:

```python
data.info()             # column names, dtypes, and non-null counts
print(data.describe())  # summary statistics for the numeric columns
```

Machine Learning
These datasets are suitable for a variety of machine learning tasks such as regression, classification, clustering, and feature engineering.
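
The loading and inspection steps above can be combined into a small reusable summary. The sketch below is one possible way to do it: the `summarize` helper is hypothetical, and the inline CSV with its column names is invented so that the example runs without any file from this repository.

```python
import io

import pandas as pd

# Invented stand-in for a file in this repository; the columns are assumptions.
SAMPLE_CSV = """age,city,income
34,Cairo,52000
,Giza,48000
29,Cairo,
41,,61000
"""


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing-value count, distinct count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "unique": df.nunique(),
        }
    )


data = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = summarize(data)
print(summary)
```

Columns with a high `missing` count are candidates for imputation or removal, and non-numeric columns usually need encoding before modeling.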

💡 Suggested Applications

• Regression Models: Advertising.csv, insurance.csv, kc_house_data.csv
• Classification Models: adult.csv, car_data.csv, titanic.csv, Iris.csv, diabetes.csv
• Exploratory Data Analysis: SampleSuperstore.csv, fifa.csv
• Handwritten Digit Classification: pendigits-train.csv and pendigits-test.csv
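
As an illustration of the regression use case, here is a minimal sketch of an ordinary least squares fit on advertising-style data. It uses synthetic data rather than Advertising.csv itself: the column names (`TV`, `Radio`, `Newspaper`, `Sales`) and the coefficients used to generate `Sales` are assumptions, and `fit_linear` and `r_squared` are hypothetical helpers built on `numpy.linalg.lstsq`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for Advertising.csv. Column names and the generating
# coefficients are assumptions made for illustration only.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame(
    {
        "TV": rng.uniform(0, 300, n),
        "Radio": rng.uniform(0, 50, n),
        "Newspaper": rng.uniform(0, 100, n),
    }
)
data["Sales"] = 3.0 + 0.05 * data["TV"] + 0.2 * data["Radio"] + rng.normal(0, 1, n)


def design_matrix(df, features):
    """Feature matrix with a leading column of ones for the intercept."""
    return np.column_stack([np.ones(len(df)), df[features].to_numpy(dtype=float)])


def fit_linear(df, features, target):
    """Ordinary least squares; returns the coefficients as a labeled Series."""
    y = df[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(design_matrix(df, features), y, rcond=None)
    return pd.Series(coef, index=["intercept", *features])


def r_squared(df, features, target, coef):
    """Coefficient of determination of the fitted model on df."""
    y = df[target].to_numpy(dtype=float)
    residual = y - design_matrix(df, features) @ coef.to_numpy()
    return 1.0 - np.sum(residual**2) / np.sum((y - y.mean()) ** 2)


features = ["TV", "Radio", "Newspaper"]
train = data.sample(frac=0.8, random_state=0)  # 80% of rows for fitting
test = data.drop(train.index)                  # remaining 20% held out

coef = fit_linear(train, features, "Sales")
r2 = r_squared(test, features, "Sales", coef)
print(coef.round(3))
print(f"held-out R^2: {r2:.3f}")
```

Because `Newspaper` plays no part in generating the synthetic `Sales` values, its fitted coefficient should come out close to zero. On the real file, the same comparison of coefficients is one way to examine which channels are associated with sales.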

📄 License

These datasets are publicly available for educational and research purposes. Please ensure proper citation when using these datasets for publications or sharing results.
