Diabetes Classification with XGBoost

This project implements a diabetes classifier that uses the XGBoost library to train the model.
After training, the model predicts whether or not a person has diabetes.

Diabetes Dataset

The dataset contains more than 70,000 records collected from patients.
It has 22 columns (Diabetes_binary is the target label; the other 21 are features):
Diabetes_binary, HighBP, High Cholesterol, Cholesterol Check, BMI, Smoker, Stroke, HeartDiseaseorAttack, Physical Activity, Fruits, Veggies, Heavy Alcohol Consumption, Any Health Care, No Doctor because of Cost, General Health, Mental Health, Physical Health, Difficulty Walking, Sex, Age, Education, Income.
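The sketch below shows one minimal way to separate the label from the features and check for missing values with pandas. It assumes the data is already in a DataFrame whose label column is named Diabetes_binary, and it assumes that column is encoded as 0 = no diabetes, 1 = diabetes. The helper names and the CSV file name in the usage comment are hypothetical.

```python
import pandas as pd

# Assumed label column, taken from the README's column list.
# Assumed encoding: 0 = no diabetes, 1 = diabetes.
TARGET = "Diabetes_binary"


def split_features_target(df: pd.DataFrame):
    """Return (X, y): every column except the target as features, the target as integer labels."""
    if TARGET not in df.columns:
        raise KeyError(f"expected a {TARGET!r} column")
    X = df.drop(columns=[TARGET])
    y = df[TARGET].astype(int)
    return X, y


def missing_counts(df: pd.DataFrame) -> pd.Series:
    """Number of missing (NaN/None) values per column, only for columns that have any."""
    counts = df.isnull().sum()
    return counts[counts > 0]


# Usage (hypothetical file name):
# df = pd.read_csv("diabetes.csv")
# print(df.shape)              # the README states 22 columns
# print(missing_counts(df))
# X, y = split_features_target(df)
```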

XGBoost

XGBoost is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform other algorithms and frameworks. For small-to-medium structured (tabular) data, however, decision-tree-based algorithms such as XGBoost are currently widely regarded as best-in-class.
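To illustrate what "gradient boosting" means here, the following is a simplified from-scratch sketch in Python with NumPy. Each round fits a depth-1 tree (a stump) to the first and second derivatives of the logistic loss and adds its shrunken output to a running log-odds score. The Newton-style leaf weights and the L2 penalty `lam` mirror the idea behind XGBoost's objective, but this is not XGBoost's implementation: there are no deeper trees, no gamma pruning, and no column subsampling or sparsity handling. The function names and default values are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_stump(X, g, h, lam):
    """Pick the single split that maximizes the second-order (Newton) gain."""
    G, H = g.sum(), h.sum()
    best = None  # (gain, feature, threshold, left weight, right weight)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2.0:  # midpoints between distinct values
            left = X[:, j] <= t
            GL, HL = g[left].sum(), h[left].sum()
            GR, HR = G - GL, H - HL
            gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)
            if best is None or gain > best[0]:
                best = (gain, j, t, -GL / (HL + lam), -GR / (HR + lam))
    if best is None:  # no feature has two distinct values: use a single leaf
        w = -G / (H + lam)
        return 0, np.inf, w, w
    _, j, t, wl, wr = best
    return j, t, wl, wr


def boost(X, y, n_rounds=50, eta=0.3, lam=1.0):
    """Newton-style gradient boosting of stumps for binary log-loss."""
    y = np.asarray(y, dtype=float)
    p0 = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    base = np.log(p0 / (1 - p0))  # start from the log-odds of the class prior
    F = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        p = sigmoid(F)
        g = p - y          # first derivative of log-loss w.r.t. the log-odds F
        h = p * (1 - p)    # second derivative
        j, t, wl, wr = fit_stump(X, g, h, lam)
        F += eta * np.where(X[:, j] <= t, wl, wr)  # shrink each tree by the learning rate
        stumps.append((j, t, wl, wr))
    return base, eta, stumps


def predict_proba(X, model):
    """Sum the base score and every stump's shrunken output, then map to a probability."""
    base, eta, stumps = model
    F = np.full(X.shape[0], base)
    for j, t, wl, wr in stumps:
        F += eta * np.where(X[:, j] <= t, wl, wr)
    return sigmoid(F)
```

For example, on `X = [[0], [1], [2], [3]]` with labels `[0, 0, 1, 1]`, every round picks the split at 1.5, and the predicted probabilities end up below 0.5 for the first two rows and above 0.5 for the last two.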

The project has 6 steps:

1. Import libraries.
2. Get the data.
3. Preprocessing: load the dataset, rename the columns, handle null values, convert categorical features to numerical features with one-hot encoding, and normalize features with Min-Max scaling.
4. Build the XGBoost classifier: create an XGBClassifier, train the model, print the accuracy, and plot the confusion matrix and the precision-recall curve.
5. Tune the hyperparameters with GridSearchCV.
6. Visualization.

Check the full description (in Persian)

Contact

If you have any questions, feel free to ask me:
📩 arminzolfagharid@gmail.com
