Low accuracy eigenvalue prediction with `e3baseline_0` descriptor #179

alikhamze · 2024-05-23T21:15:29Z


Hello,

I have been testing DeePTB v2 with the e3baseline_0 descriptor (I have found this descriptor works better than the se2 descriptor) on a large dataset of defective BN supercells. When I use the model for inference on the test set (the same structures, but at k-points not included in the training data), the mean absolute error (MAE) is too high for practical prediction of properties. Here is a 2D histogram of the error in the eigenvalues vs. the actual eigenvalues:
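A minimal NumPy sketch of this kind of diagnostic, assuming predicted and DFT eigenvalues are stored as arrays of shape `(n_kpoints, n_bands)` with matching band order. The helper `eigenvalue_error_stats` and the synthetic data are illustrative only and are not part of DeePTB:

```python
import numpy as np

def eigenvalue_error_stats(e_pred, e_dft, bins=50):
    """MAE and a 2D histogram of (reference eigenvalue, error) pairs.

    e_pred, e_dft: arrays of shape (n_kpoints, n_bands) with the same band order.
    """
    e_pred = np.asarray(e_pred, dtype=float)
    e_dft = np.asarray(e_dft, dtype=float)
    if e_pred.shape != e_dft.shape:
        raise ValueError("prediction and reference must have the same shape")
    err = e_pred - e_dft
    mae = float(np.mean(np.abs(err)))
    # Count how many (E_DFT, error) pairs fall into each 2D bin
    counts, e_edges, err_edges = np.histogram2d(e_dft.ravel(), err.ravel(), bins=bins)
    return mae, counts, e_edges, err_edges

# Synthetic data for illustration only: 10 k-points x 8 bands
rng = np.random.default_rng(0)
e_dft = np.sort(rng.uniform(-10.0, 5.0, size=(10, 8)), axis=1)
e_pred = e_dft + rng.normal(0.0, 0.1, size=e_dft.shape)
mae, counts, _, _ = eigenvalue_error_stats(e_pred, e_dft, bins=20)
print(f"MAE = {mae:.3f}")
```

The returned `counts` array can be passed to a plotting routine (for example an image plot over the returned bin edges) to reproduce an error-vs-eigenvalue map.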

I am using `["2s", "2p", "d*"]` as the basis for both B and N. My model options are:

"model_options": {
        "embedding":{
            "method": "e3baseline_0",
            "lmax" : 4,
            "r_max": 7.0,
            "latent_kwargs" : {
                    "mlp_latent_dimensions": [128, 128, 256],
                    "mlp_nonlinearity": "silu",
                    "mlp_initialization": "uniform"
            }
        },
        "prediction":{
            "method": "e3tb"
        }
    }

The atomic data options in my training data files are all:

"AtomicData_options": {
        "er_max": 5.0,
        "oer_max": 2.5,
        "pbc": true,
        "r_max": 5.0
}

The final training loss was 0.046430 and the final validation loss was 0.101003.

Do y'all have any suggestions as to how to improve the accuracy for systems with many defects?

Thanks!

floatingCatty · 2024-05-24T02:25:49Z · Maintainer

Hello!

Thanks for using our package. First, I want to make sure: you are using eigenvalues from DFT as the training target, right?

A quick answer for your case: the "e3baseline" parameterization is mainly developed for directly fitting the DFT Hamiltonian matrix in an LCAO basis, not the eigenvalues. Please use "se2" for fitting eigenvalues.
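For context on the difference between the two targets: a Hamiltonian fit targets the matrix elements of H(k) (and, in a non-orthogonal LCAO basis, the overlap S(k)), and band energies only follow afterwards by solving the generalized eigenproblem H(k) c = E S(k) c. A NumPy sketch of that last step, using a Cholesky reduction to a standard Hermitian problem, on a hypothetical two-orbital toy model (`t`, `s`, and `band_energies` are illustrative, not DeePTB names):

```python
import numpy as np

def band_energies(h_k, s_k):
    """Eigenvalues of H(k) c = E S(k) c for one k-point, in ascending order."""
    # Cholesky: S = L L^H, so L^-1 H L^-H is an ordinary Hermitian matrix
    # with the same eigenvalues as the generalized problem
    l_inv = np.linalg.inv(np.linalg.cholesky(s_k))
    h_ortho = l_inv @ h_k @ l_inv.conj().T
    return np.linalg.eigvalsh(h_ortho)

# Hypothetical two-orbital toy model: hopping t, overlap s
t, s = -1.0, 0.2
h_k = np.array([[0.0, t], [t, 0.0]])
s_k = np.array([[1.0, s], [s, 1.0]])
print(band_energies(h_k, s_k))  # analytic values: t/(1+s) and -t/(1-s)
```

Because the eigenvalues are a nonlinear function of all matrix elements at once, a model trained on eigenvalues and one trained on matrix elements see quite different loss landscapes.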

The poor performance you saw with se2 is mainly caused by the training procedure. To train an accurate and transferable DeePTB model from eigenvalues, there is a general procedure; please see our hBN/silicon examples for details.

Here is more explanation on "e3baseline":

The e3baseline_0 descriptor, along with the others prefixed with "e3baseline", is an equivariant graph-neural-network-based descriptor that parameterizes the Hamiltonian in the tensor-product space of the E3 group. Unlike the two-center approximation used in the SKTB format, it places no constraint on the Hamiltonian elements other than that they must be equivariant under E3 group operations. This feature has been tested and will be released very soon.
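To make "equivariance under E3 operations" concrete, here is a NumPy sketch (not DeePTB code) for a p-p hopping block with orbitals in Cartesian order (p_x, p_y, p_z), where the l = 1 rotation representation is just the 3x3 rotation matrix R: an equivariant block must satisfy H(R r) = R H(r) R^T for every rotation R. The two-center Slater-Koster form, with arbitrary V_ppσ and V_ppπ values, is one block that meets this; e3baseline-type models only need the constraint itself, not the two-center form. The helper names are hypothetical.

```python
import numpy as np

def sk_pp_block(r_vec, v_sigma=1.0, v_pi=-0.3):
    """Two-center Slater-Koster p-p hopping block in (px, py, pz) order."""
    n = r_vec / np.linalg.norm(r_vec)   # bond direction cosines
    nn = np.outer(n, n)
    return v_sigma * nn + v_pi * (np.eye(3) - nn)

def rotation_matrix(axis, angle):
    """Rotation about `axis` by `angle` via Rodrigues' formula."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * kx + (1.0 - np.cos(angle)) * (kx @ kx)

def is_equivariant(block_fn, r_vec, rot):
    # For l = 1 orbitals in Cartesian order, the rotation acts on the block as R H R^T
    return np.allclose(block_fn(rot @ r_vec), rot @ block_fn(r_vec) @ rot.T)

r = np.array([1.2, -0.4, 0.7])
R = rotation_matrix([1.0, 2.0, -0.5], 0.9)
print(is_equivariant(sk_pp_block, r, R))  # True
```

A block that ignores how the orbitals rotate, for example one built as `np.diag(r_vec)`, fails the same check.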

Please feel free to reach out to us if you have more questions.

Cheers!

Zhanghao


alikhamze · 2024-05-28T02:07:47Z · Author

Hi there,

Thank you for the speedy reply! It's good to know the e3baseline descriptors are designed for Hamiltonian training; I have a dataset of Hamiltonian data, so I will try that as well. I will try the se2 descriptor this week and follow the procedure shown in the documentation. Just to clarify, I should train in this order (following the Si example), right?

- NNSK model
  - Perfect crystal model only
    - First with only the first nearest neighbor, no onsite correction, and a valence-only basis
    - Then with the onsite correction and orbitals for the conduction bands
    - Then push the cutoff radius out
    - Then push the w to a smaller value
  - Next, train bond-length-dependent parameters using MD data
    - Start with a low-T dataset, and then increase to higher T
- Environmentally dependent model using the se2 descriptor

I have a couple of questions about the process:

- When pushing r_cut out or reducing w, does the training output the best value? For example, I don't have a good intuition for what omega should be, so during the push procedure, how will I know if I have reduced it too much? Or, for r_cut, if I pushed it too far?
- The bond-length-dependent training still needs the perfect crystal in the training set so it does not forget those parameters, correct?
- Since I am interested in structures with defects, where is the best place in this procedure to introduce them?

Thank you!



floatingCatty May 28, 2024
Maintainer

Hello!

Yes, you should follow the procedure in the silicon example. The procedure you describe is correct; please proceed!

For the questions:

"When pushing r_cut out or reducing w, does the training output the best value? For example, I don't have a good intuition for what omega should be, so during the push procedure, how will I know if I have reduced it too much? Or, for r_cut, if I pushed it too far?"

re: For omega, here are some intuitive settings: we use 0.7 for structures under large positional perturbation (omega needs to be large to smooth the hopping curve), 0.3 for conventional MD structures (not melts), and 0.1 or even 0.01 for structures with slight perturbation. You can also choose the value based on the training itself: when pushing "rs" and "w", the software outputs the checkpoint with the best test loss (or the training loss if no test set is set) each time "rs" and "w" are altered. You can pick the best checkpoint and check the corresponding "rs" and "w" values.
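To build intuition for how w smooths the hopping curve, here is a NumPy sketch that assumes a Fermi-like smoothing factor f(r) = 1 / (1 + exp((r - rs) / w)) multiplying each hopping. This functional form is an assumption for illustration; check the DeePTB source for the exact expression it uses. The value rs = 4.75 and the three w values come from this thread, and `fermi_cutoff` is a hypothetical helper.

```python
import numpy as np

def fermi_cutoff(r, rs, w):
    """Assumed Fermi-like smooth cutoff: ~1 well inside rs, 0.5 at r = rs, ~0 far beyond rs."""
    return 1.0 / (1.0 + np.exp((np.asarray(r, dtype=float) - rs) / w))

rs = 4.75                       # hopping cutoff used later in this thread
r = np.array([4.0, 4.75, 5.5])  # distances below, at, and above rs
for w in (0.7, 0.3, 0.1):
    print(w, np.round(fermi_cutoff(r, rs, w), 3))
```

With w = 0.7 the factor already deviates noticeably from 1 well inside rs and is still non-negligible beyond it, giving a smooth decay that tolerates large positional perturbations; with w = 0.1 it behaves almost like a hard cutoff at rs.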

"Since I am interested in structures with defects, where is the best place in this procedure to introduce them?"

re: I suggest introducing them from the start. That means you should add at least one defect structure and one perfect lattice to the training set, and proceed through the following steps with the combined dataset.

Hope you get a good fit! If you have any other questions, feel free to reach out to us.

Best,

Zhanghao

alikhamze · 2024-05-31T21:43:56Z · Author

Thanks for your guidance, Zhanghao!

I am currently training a model as you suggested and I'm excited to see the outcome.

I have a couple more questions as I follow the Si example:

I noticed for the bond-length dependence steps, you have different $\omega$ for the onsite and hopping terms. When pushing $\omega$, which one is adjusted? For highly defective data like I have, is it best to keep $\omega = 0.7$ for both?
I also noticed you introduce reference data (in place of validation data) for the bond-length dependence step and also include it in the environmental training step. Can you explain to me what the reference data is in these contexts? How does it affect the training? I don't see losses for it output in the log.

Thanks again!
-Ali


floatingCatty Jun 1, 2024
Maintainer

Hello!

Here are the answers to your questions:

When pushing $\omega$, it is the hopping's $\omega$ that is altered after a set number of training updates. The onsite $\omega$ is only used in the "strain" onsite mode, which includes the strain-effect correction, and it is not affected by the push function.
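The push behaviour can be pictured as a simple schedule. The sketch below is hypothetical (the class and field names are not DeePTB configuration keys): every `period` updates it shifts the hopping `rs` and `w` by fixed steps, never touches any onsite ω, and then picks the update with the lowest recorded loss together with the `rs`/`w` in effect at that point, in the spirit of the earlier advice to pick the best checkpoint.

```python
from dataclasses import dataclass

@dataclass
class PushSchedule:
    """Hypothetical push schedule for the hopping cutoff parameters only.

    Every `period` updates, rs moves by rs_step and w by w_step.
    Onsite parameters are not part of this schedule.
    """
    rs0: float
    w0: float
    rs_step: float = 0.0   # > 0 pushes the hopping cutoff outward
    w_step: float = 0.0    # < 0 shrinks the smoothing width
    period: int = 100      # number of updates between pushes
    w_min: float = 1e-3    # keep w strictly positive

    def values_at(self, update: int) -> tuple[float, float]:
        n_pushes = update // self.period
        rs = self.rs0 + n_pushes * self.rs_step
        w = max(self.w0 + n_pushes * self.w_step, self.w_min)
        return rs, w


def best_setting(schedule: PushSchedule, losses: list[float]):
    """Return (update, rs, w, loss) for the update with the lowest recorded loss."""
    best_update = min(range(len(losses)), key=losses.__getitem__)
    rs, w = schedule.values_at(best_update)
    return best_update, rs, w, losses[best_update]


# Usage with made-up per-update validation losses (one push per update)
schedule = PushSchedule(rs0=3.0, w0=0.7, rs_step=0.25, period=1)
print(best_setting(schedule, [0.5, 0.4, 0.35, 0.3, 0.32, 0.36]))  # (3, 3.75, 0.7, 0.3)
```

If the loss rises again after some push, as in the made-up losses above, the best update marks roughly where pushing further stopped helping.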
The reference data is included to improve training stability. For example, when training the model on molecular dynamics trajectories, the first few training iterations are less stable, since the bond-length and environmental dependence are not yet physical at the beginning. This can break the band structure, so that the predicted bands no longer correspond to the right target bands. The reference dataset helps here because, in each iteration, the gradient computed from the training dataset is mixed with the gradient from the reference dataset. Restricting the reference dataset to a perfect lattice therefore acts as a regularization, or penalty, on unphysical bond-length dependence. You can set the reference dataset to a perfect lattice, or to some basic defect structure that can act as such a regularizer.
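A toy analogue of mixing training and reference gradients, with simple quadratic losses standing in for the real band-structure losses. The weighting factor `ref_weight` and both loss functions are illustrative assumptions; the thread does not say how DeePTB weights the reference term. The training loss alone prefers θ = a, the reference loss prefers θ = b, and gradient descent on the mixed gradient settles at the weighted compromise (a + λb) / (1 + λ), where λ is `ref_weight`.

```python
# Toy 1-parameter model: the training set alone prefers theta = a,
# the reference set (e.g. a perfect lattice) prefers theta = b.
a, b = 3.0, 1.0

def grad_train(theta):
    return 2.0 * (theta - a)   # derivative of (theta - a)**2

def grad_ref(theta):
    return 2.0 * (theta - b)   # derivative of (theta - b)**2

def train(ref_weight, lr=0.1, steps=500, theta=0.0):
    for _ in range(steps):
        # Mix the training and reference gradients in every iteration
        theta -= lr * (grad_train(theta) + ref_weight * grad_ref(theta))
    return theta

print(train(0.0))  # converges to a
print(train(1.0))  # pulled toward b: converges to (a + b) / 2
```

The larger the reference weight, the more strongly the parameters are held near values that reproduce the reference structures, which is the regularizing effect described above.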

Hope this information helps.

Warm Regards,

Zhanghao

alikhamze · 2024-06-03T21:03:08Z · Author

Hello,

Thank you for the explanation about $\omega$ and the reference data!
I have started training the environmental correction on the dataset, but the loss does not appear to be decreasing. Do you have any idea why this might be? Here is the loss vs. epoch:

Here is a summary of the training procedure:
For all models, $\omega$ has been set to 0.7, matching the guidance you gave above.
This model was trained on an initial set of perfect crystals, crystals with defects, strained crystals, and strained crystals with defects. (For the bond-length and environmental dependence training described below, this dataset is used as the reference.)
The first step was 350 epochs with 1NN only, after which I trained with the strain onsite correction and 1NN for another 500 epochs. Then I used the push procedure for the cutoff and increased it to include up to the 4th or 5th NN. The final loss at this stage was ~0.2.

Then I added additional data to learn the bond-length dependence. The first step used single snapshots of each structure in the initial dataset, taken from 300 K and 1000 K MD runs, trained for 500 epochs together with the initial dataset. The next step was another 500 epochs with snapshots from 1700 K and 2400 K MD runs, in addition to the lower-temperature MD structures and the initial data.
At the end of this stage, the loss was ~0.6, which I thought was okay since the NNSK model should struggle with the environments added by MD.

Now I am training the environmental dependence. The parameters are the same as in the Si example, except that 1) the cutoffs are set for my system, 2) the LR is set to 0.01 (10x the Si example), which I can do because 3) I use a batch size of 4, and 4) I have a validation set (eigenvalues at k-points not in the training set) for the training data, which I use in addition to the reference data.

Do you have any thoughts on why the loss seems flat?

Thanks!


floatingCatty Jun 6, 2024
Maintainer

Hello!

A quick answer: the previous training steps were not conducted correctly. A loss of ~0.2 is too large for a well-trained 4th/5th-NN model, so it is not a good starting point.

You can verify this by plotting the band structure of the model output at each step against the DFT band structure.

A criterion for a successful training step is that the model's band structure has the correct shape, meaning each predicted band corresponds to the right target band. If this correspondence is broken, adding environmental dependence will not improve the accuracy.

Before proceeding to the next step, make sure each training step improves on the accuracy of the previous one and, most importantly, does not break the band-structure correspondence.

To make sure the correspondence is correct, train each step while plotting the band structure of the output checkpoint. The training should be quite stable in the first few steps. Once the band correspondence has been learned, you can add more neighbours/orbitals or the onsite correction to increase the accuracy.
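Plotting remains the primary check, but a numerical companion can help flag suspicious bands across many structures. This NumPy sketch is a simple heuristic, not a DeePTB feature, and `band_correspondence_report` is a hypothetical name. It assumes predicted and DFT eigenvalues of shape `(n_kpoints, n_bands)` with the same band indexing, and reports, for each predicted band, which DFT band it tracks most closely; a best match that differs from the band's own index suggests broken correspondence.

```python
import numpy as np

def band_correspondence_report(e_pred, e_dft):
    """Heuristic band-correspondence check (hypothetical helper, not part of DeePTB).

    e_pred, e_dft: arrays of shape (n_kpoints, n_bands) with the same band indexing.
    Returns the per-band MAE, the best-matching DFT band for each predicted band,
    and the indices of predicted bands whose best match is a different DFT band.
    """
    e_pred = np.asarray(e_pred, dtype=float)
    e_dft = np.asarray(e_dft, dtype=float)
    per_band_mae = np.mean(np.abs(e_pred - e_dft), axis=0)
    # cost[i, j] = mean over k of |E_pred_i(k) - E_dft_j(k)|
    cost = np.mean(np.abs(e_pred[:, :, None] - e_dft[:, None, :]), axis=0)
    best_match = np.argmin(cost, axis=1)
    mismatched = np.flatnonzero(best_match != np.arange(e_pred.shape[1]))
    return per_band_mae, best_match, mismatched

# Synthetic three-band example: the second and third predicted bands are swapped
k = np.linspace(0.0, 1.0, 5)
e_dft = np.stack([-2.0 + k, np.zeros_like(k), 2.0 - k], axis=1)
_, match, flagged = band_correspondence_report(e_dft[:, [0, 2, 1]], e_dft)
print(match, flagged)  # [0 2 1] [1 2]
```

Band crossings and degeneracies can make the best match ambiguous, so a flagged band is a prompt to look at the plot rather than a verdict.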

Please let us know if anything goes wrong or if you need any help!

Best,

Zhanghao

alikhamze Jun 6, 2024
Author

Hi @floatingCatty,

I'll rerun the steps where I added the MD data to train the bond-length dependence, and then retry this. Regardless, as the environmental model is more descriptive, no matter what the starting loss is, it should go down, no? I am also seeing this in another model where the initial loss was even lower (0.1).

The fact that it does not go down suggests to me that some other issue is present. I am using the neural network configuration from the Si example; could the network be too small for fitting a system with defects? Or could the other settings in the embedding be incorrect?

"model_options": {
        "embedding":{
            "method": "se2",
            "rs": 2.0,
            "rc": 4.75,
            "radial_net": {
                "neurons": [10,20,30]
            }
        },
        "prediction":{
            "method": "sktb",
            "neurons": [16,16,16]
        },
        "nnsk": {
            "onsite": {"method": "strain", "rs":2.0 ,"w":0.7},
            "hopping": {"method": "powerlaw", "rs":4.75, "w": 0.7},
            "freeze": true
        }
    }

Thanks for your help,
-Ali

floatingCatty Jun 11, 2024
Maintainer

Hello! @alikhamze

"Regardless, as the environmental model is more descriptive, no matter what the starting loss is, it should go down, no?" In general, the optimization utilises a gradient-based algorithm which should decrease the loss on average. However, a bad starting point could lead to unphysical parameters, which might cause a very shallow local minimum, and the gradient-based optimization is not capable of jumping out. Therefore, in our practice, following the training steps and obtaining a good starting non-environmental model is very important.

The network settings look good to me. Our model is not very sensitive to the network parameters as long as it has a sufficient number of parameters and a sufficient cutoff range. Using the settings from the examples is fine.

Please feel free to contact us if anything is needed.

Best,

Zhanghao
