Skip to content

Algorithms for my course of machine learning applied to physics

Notifications You must be signed in to change notification settings

yakovchenk0/machinelearning_grad

Repository files navigation

Whitepaper: Exploratory Analysis and Modeling of Formation Energy of Materials Using Python

  1. Introduction:

In this document, we detail an approach to explore and model the formation energy of materials based on their respective data. We employ the Python programming language alongside several popular libraries to conduct this analysis.

  1. Data Preparation:

Initially, we import relevant packages such as pandas, numpy, and sklearn. To ensure the replicability of our results, we set a random seed.

We load the dataset AB2_formation_energy_materials_spacegroup.csv and undertake a few cleaning steps, such as removing unnamed columns and handling missing values.

  1. Feature Processing:

Given that we have categorical features like 'Material' and 'Space Group', we need to convert them into numerical representations. To do this, we opt to use TF-IDF vectorization, a widely used technique to transform text into numerical values. For this purpose, we define a vectorization function which tokenizes and transforms the lists of materials and space groups.

  1. Modeling:

For modeling, we choose two linear regression models: Lasso and Ridge. These are regularization methods that help prevent overfitting. Using a standard training/testing split of 80/20, we train and test each model.

  1. Performance Evaluation:

The performance of each model is assessed using two popular metrics: Mean Squared Error (MSE) and the coefficient of determination R². These metrics give us an insight into how well our model fits the data and how well it can generalize to unseen data.

Furthermore, we visualize the results of our models using plots, where actual and predicted values are plotted side-by-side. This provides a clear visual perception of our model's predictions in comparison to the actual values.

  1. Conclusion:

Through this analysis, we manage to transform categorical features into numerical ones and train linear regression models to predict the formation energy of materials. The evaluation metrics help us understand the effectiveness of our models. This kind of modeling and analysis is pivotal for understanding material properties and can be used in many practical applications in material science.

Note: This is a summarized and simplified whitepaper. For a more in-depth study, a deeper dive into features, the usage of other models and preprocessing techniques, and a more comprehensive statistical analysis of results is recommended.

About

Algorithms for my course of machine learning applied to physics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published