This directory contains various datasets used for machine learning, data analysis, and visualization tasks. Each dataset serves a specific purpose and covers different domains, providing a range of options for exploratory data analysis, regression, classification, and more.
-
Advertising.csv • Description: Contains data on advertising expenditures across different channels (TV, Radio, Newspaper) and their impact on sales. • Usage: Great for regression analysis and predictive modeling to analyze the impact of advertising on sales.
-
SVM_Data.csv • Description: A dataset designed for Support Vector Machine (SVM) classification tasks. • Usage: Suitable for experimenting with SVM algorithms, tuning hyperparameters, and evaluating model accuracy.
-
Salary.csv • Description: Contains information on individuals' salary levels based on factors like education, experience, and position. • Usage: Ideal for regression modeling, predicting salary levels based on features, and understanding salary trends.
-
adult.csv • Description: UCI Adult dataset, used to predict income levels based on demographic features. • Usage: Commonly used for classification tasks, especially binary classification on predicting income levels (e.g., above or below 50K).
-
car_data.csv • Description: Information on various car attributes and evaluations. • Usage: Useful for classification tasks, such as evaluating car acceptability based on features like buying price, maintenance cost, and safety.
-
insurance.csv • Description: Insurance data, including demographic and medical history factors, along with insurance charges. • Usage: Great for regression analysis, predicting insurance costs, and understanding factors influencing charges.
-
kc_house_data.csv • Description: Contains house prices and associated features from the King County housing dataset. • Usage: Suitable for regression modeling, predicting house prices, and feature importance analysis.
-
titanic.csv • Description: Famous Titanic dataset containing details about passengers and their survival status. • Usage: Ideal for binary classification, survival analysis, and feature engineering experiments.
-
IBM Dataset.csv • Description: IBM HR Analytics dataset, used to predict employee attrition and understand HR factors. • Usage: Useful for classification tasks, especially for predicting employee turnover.
-
Iris.csv • Description: Classic Iris dataset containing measurements of different iris flower species. • Usage: Commonly used for classification tasks and exploring clustering algorithms.
-
SampleSuperstore.csv • Description: Data from a sample superstore, including sales, profits, and product categories. • Usage: Useful for analyzing sales performance, regional trends, and market segmentation.
-
data.csv • Description: General dataset for testing and exploratory analysis. • Usage: Used for various data analysis tasks.
-
diabetes.csv • Description: Diabetes dataset containing health-related features used to predict diabetes onset. • Usage: Suitable for binary classification and medical analysis tasks.
-
fifa.csv • Description: FIFA dataset with player statistics. • Usage: Ideal for data exploration, player comparison, and sports analytics.
-
pendigits-test.csv & pendigits-train.csv • Description: Handwritten digit recognition dataset with training and testing splits. • Usage: Commonly used for classification tasks and testing digit recognition algorithms.
-
regression_project_data.csv • Description: Dataset for regression analysis. • Usage: Used for regression modeling and feature analysis.
-
tips.csv • Description: Dataset containing restaurant tips, including features like total bill, tip amount, and day. • Usage: Great for exploring regression, correlation, and understanding factors affecting tips.
-
titanic.csv (duplicate) • Description: Duplicate of the Titanic dataset, used for different experimental setups. • Usage: Same as above, for binary classification and survival analysis.
Load the Data Each dataset is in CSV format and can be loaded using pandas in Python:
python Copy code import pandas as pd
data = pd.read_csv('path/to/dataset.csv') Exploratory Data Analysis Perform initial analysis to understand the features, data types, and any necessary preprocessing steps:
python Copy code data.info() data.describe() Machine Learning These datasets are suitable for a variety of machine learning tasks such as regression, classification, clustering, and feature engineering.
Regression Models: Advertising.csv, insurance.csv, kc_house_data.csv Classification Models: adult.csv, car_data.csv, titanic.csv, Iris.csv, diabetes.csv Exploratory Data Analysis: SampleSuperstore.csv, fifa.csv Time-Series & Sequential Data: pendigits-train.csv and pendigits-test.csv
These datasets are publicly available for educational and research purposes. Please ensure proper citation when using these datasets for publications or sharing results.