- Handling missing values: Imputation techniques were used to replace missing values in key features.
- Encoding categorical variables: One-Hot Encoding was applied to categorical variables.
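- Scaling and dimensionality reduction: PCA (Principal Component Analysis) was used for dimensionality reduction. PCA does not scale features by itself and is sensitive to feature scale, so inputs are typically standardized before it is applied.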
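
The preprocessing steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual code: the column names, the toy data, the median imputation strategy, and the use of `StandardScaler` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "temperature": [70.1, np.nan, 68.4, 75.0],
    "vibration": [0.02, 0.05, np.nan, 0.04],
    "machine_type": ["A", "B", "B", "A"],
})

numeric_cols = ["temperature", "vibration"]
categorical_cols = ["machine_type"]

# Numeric features: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        # Categorical features: one binary column per category.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X.shape)  # 2 scaled numeric columns + 2 one-hot columns
```

Wrapping the steps in a `ColumnTransformer` means the same imputation and encoding learned on the training data can be reapplied to new data with `preprocess.transform(...)`.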
2. Model Selection Using Multiple Classifiers (e.g., Logistic Regression, Random Forest, Gradient Boosting, XGBoost)
- Logistic Regression: A linear model used for classifying machine status.
- Random Forest Classifier: An ensemble of decision trees that handles complex, non-linear feature relationships.
- Gradient Boosting Classifier: Builds trees sequentially, each one correcting the errors of the previous ones, to improve accuracy.
- XGBoost Classifier: A gradient boosting implementation optimized for speed and efficiency.
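
One way to compare the classifiers listed above is cross-validation on a common dataset. The sketch below is an assumption about how such a comparison could look: the data is synthetic (`make_classification` stands in for the machine-status data), and the hyperparameters, the 5-fold split, and the F1 scoring choice are illustrative rather than taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Candidate models; hyperparameters are illustrative, not tuned.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Mean F1 score over 5 cross-validation folds for each model.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="f1").mean()

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: mean F1 = {score:.3f}")
print("Best model:", best)
```

XGBoost is left out of the sketch to keep it free of optional dependencies. If the `xgboost` package is installed, its `XGBClassifier` follows the scikit-learn estimator interface and can be added to the `models` dictionary in the same way.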
PCA was used to reduce feature dimensionality and improve model performance.
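
As a rough illustration of the PCA step, the sketch below standardizes the features and keeps enough components to explain 95% of the variance. The synthetic data, the 95% threshold, and the standardization step are assumptions for the example; the source does not state how many components were retained.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 6 features, where the last 3 features
# are noisy near-copies of the first 3 (so the data is redundant).
base = rng.normal(size=(200, 3))
X = np.hstack([base, base + 0.05 * rng.normal(size=(200, 3))])

# Standardize first, then keep the smallest number of components whose
# cumulative explained variance reaches 95%.
reducer = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
)
X_reduced = reducer.fit_transform(X)

pca = reducer.named_steps["pca"]
print("Reduced shape:", X_reduced.shape)
print("Explained variance kept:", pca.explained_variance_ratio_.sum())
```

Whether the reduced features actually improve a model is an empirical question, so it is worth comparing the evaluation metrics with and without the PCA step.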
- Precision: The proportion of predicted positive cases that are actually positive, TP / (TP + FP).
- Recall: The proportion of actual positive cases that are correctly predicted, TP / (TP + FN).
- F1 Score: The harmonic mean of precision and recall.
- Accuracy: The proportion of correct predictions overall.
- Confusion Matrix: A table of true positives, false positives, true negatives, and false negatives, used to see which kinds of errors the model makes.
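
The metrics above all derive from the four confusion-matrix counts. The following plain-Python sketch shows the arithmetic for a binary problem; the labels are made up for illustration, and the helper names `confusion_counts` and `classification_metrics` are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1, and accuracy from the counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

print(confusion_counts(y_true, y_pred))  # (3, 2, 1, 2)
print(classification_metrics(y_true, y_pred))
```

In practice, `sklearn.metrics` provides `precision_score`, `recall_score`, `f1_score`, `accuracy_score`, and `confusion_matrix`, which compute the same quantities.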