This project classifies individuals' obesity levels from health and lifestyle attributes such as age, gender, height, weight, and BMI (Body Mass Index). Using machine learning algorithms, it predicts obesity levels to help assess the potential health risks associated with obesity.
- Dataset
- Data Preprocessing
- Exploratory Data Analysis (EDA)
- Machine Learning Algorithms
- Model Evaluation
- Usage
- Project Structure
- Dependencies
- Contributing
- License
The dataset used in this project comes from the Obesity Classification Dataset on Kaggle. It includes attributes related to personal characteristics and health status; a short loading sketch follows the attribute list.
- ID: Unique identifier for each individual
- Age: Age in years
- Gender: Male or Female
- Height: Height in centimeters
- Weight: Weight in kilograms
- BMI: Body Mass Index
- Label: Obesity classification (e.g., Underweight, Normal Weight, Overweight, Obese)
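A minimal loading sketch with Pandas, assuming the CSV lives at data/obesity_classification.csv (the actual file name under data/ may differ):

```python
import pandas as pd

# Load the Kaggle CSV; the path below is an assumption, not the guaranteed file name.
df = pd.read_csv("data/obesity_classification.csv")

# Quick sanity checks: shape, column types, and class balance of the target.
print(df.shape)
print(df.dtypes)
print(df["Label"].value_counts())
```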
Data preprocessing steps include the following (a code sketch follows the list):
- Handling Missing Values: Ensuring no missing entries in the dataset.
- Encoding Categorical Variables: Converting categorical features into numerical formats.
- Normalizing Numerical Features: Standardizing numerical values to improve model performance.
- Splitting the Dataset: Dividing data into training and testing sets for model validation.
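A minimal sketch of these steps with scikit-learn, continuing from the loading sketch above and assuming the column names listed in the dataset section; the notebook's exact choices (encoder, scaler, split ratio) may differ:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Drop any rows with missing values (the dataset is expected to contain none).
df = df.dropna()

# Encode the categorical Gender column and the Label target as integers.
df["Gender"] = LabelEncoder().fit_transform(df["Gender"])
y = LabelEncoder().fit_transform(df["Label"])

# Split before scaling so the scaler is fit on training data only.
features = ["Age", "Gender", "Height", "Weight", "BMI"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], y, test_size=0.2, random_state=42, stratify=y
)

# Standardize numerical features to zero mean and unit variance.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```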
EDA was performed to understand feature distributions and their relationships with the target label (a plotting sketch follows the list):
- Visualizations: Histograms, box plots, and correlation matrices were used to explore the data.
- Summary Statistics: Mean, median, and distribution checks were conducted.
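A brief sketch of the kinds of plots and summaries used, with Matplotlib and Seaborn (the exact figures in the notebook and images/ may differ):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Histograms of the numerical features.
df[["Age", "Height", "Weight", "BMI"]].hist(bins=20, figsize=(10, 6))
plt.tight_layout()
plt.show()

# Box plot of BMI per obesity class to spot separation and outliers.
sns.boxplot(data=df, x="Label", y="BMI")
plt.show()

# Correlation matrix of the numerical features.
sns.heatmap(df[["Age", "Height", "Weight", "BMI"]].corr(), annot=True, cmap="coolwarm")
plt.show()

# Summary statistics: mean, median, quartiles, and spread.
print(df.describe())
```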
The following models were evaluated for obesity classification (a comparison sketch follows the list):
- Linear Support Vector Classifier (SVC): Efficient for linearly separable classes, providing quick results for binary or multiclass classification.
- K-Nearest Neighbors (KNN): A simple instance-based classifier, suitable for smaller datasets and capturing local data patterns.
- Random Forest Classifier: An ensemble approach that reduces overfitting, effectively handling complex relationships.
- HistGradientBoosting Classifier: A sequential boosting model that refines errors, often outperforming simpler classifiers in complex cases.
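A minimal comparison loop over these four models with default hyperparameters, using the train/test split from the preprocessing sketch above (the notebook's settings may differ):

```python
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier

models = {
    "Linear SVC": LinearSVC(),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "HistGradientBoosting": HistGradientBoostingClassifier(random_state=42),
}

# Fit each model on the training split and report its accuracy on the test split.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```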
The Random Forest and HistGradientBoosting classifiers were further optimized with hyperparameter tuning to achieve the best results on this dataset.
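A sketch of how such tuning can be done with GridSearchCV for the Random Forest; the parameter grid is an illustrative assumption, not the search space used in the notebook:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid; the notebook's actual search space may differ.
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                 # 5-fold cross-validation on the training split
    scoring="accuracy",
    n_jobs=-1,
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
best_model = search.best_estimator_
```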
Evaluation metrics include the following (a short sketch follows the list):
- Accuracy: The primary metric for overall correctness.
- Precision, Recall, and F1-Score: Used to understand model performance on each class.
- Confusion Matrix: Provides a detailed view of classification performance across obesity classes.
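A short sketch of computing these metrics with scikit-learn, assuming best_model (or any fitted classifier) and the held-out test split from the earlier sketches:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred = best_model.predict(X_test)

# Overall correctness.
print("Accuracy:", accuracy_score(y_test, y_pred))

# Per-class precision, recall, and F1-score.
print(classification_report(y_test, y_pred))

# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y_test, y_pred))
```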
To replicate the analysis and model training:
- Clone the repository: `git clone https://github.com/otuemre/obesity-classification.git`
- Navigate to the project directory: `cd obesity-classification`
- Install the required dependencies: `pip install -r requirements.txt`
- Run the Jupyter Notebook: `jupyter notebook notebooks/obesity-classification.ipynb`
- data/: Folder containing the dataset.
- notebooks/: Contains Jupyter Notebook(s) for data analysis, feature engineering, and model training.
- images/: Folder containing visualization graphs generated during data analysis.
- models/: Folder where final models are saved as .pkl and .joblib files (see the save/load sketch after this list).
- README.md: Project documentation.
- LICENSE.md: License information.
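A short sketch of persisting a trained model with either serialization library; the file names under models/ are assumptions, and best_model comes from the tuning sketch above:

```python
import pickle

import joblib

# Save the tuned model in both formats used by this project.
joblib.dump(best_model, "models/random_forest.joblib")
with open("models/random_forest.pkl", "wb") as f:
    pickle.dump(best_model, f)

# Reload either artifact later for inference.
loaded_model = joblib.load("models/random_forest.joblib")
```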
This project relies on the following Python libraries:
- NumPy: For numerical operations and array handling.
- Pandas: For data manipulation and analysis.
- Matplotlib and Seaborn: For creating visualizations and plots.
- Scikit-Learn: For implementing machine learning algorithms.
- Joblib: For saving and loading trained models as .joblib files.
- Pickle: For serializing trained models to .pkl files.
Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.
This project is licensed under the MIT License. See the LICENSE.md file for details.
Note: This project uses the Obesity Classification Dataset from Kaggle. Ensure compliance with the dataset's license and terms of use.