diff --git a/README.md b/README.md
index 4d74c7b4..d5ff953a 100644
--- a/README.md
+++ b/README.md
@@ -6,15 +6,23 @@
 
 ## About DeePTB
 
-**DeePTB** is an innovative Python package that employs deep learning to construct electronic tight-binding (TB) Hamiltonians with a minimal basis. It is designed to:
+**DeePTB** is an innovative Python package that employs deep learning to construct electronic Hamiltonians using a minimal basis Slater-Koster TB(**SKTB**), and full LCAO basis using E3-equivariant neural networks (**E3TB**). It is designed to:
 
-- Efficiently predict TB Hamiltonians for large, unseen structures based on training with smaller ones.
+- Efficiently predict TB/LCAO Hamiltonians for large, unseen structures based on training with smaller ones.
 - Enable simulations of large systems under structural perturbations, finite temperature simulations integrating molecular dynamics (MD) for comprehensive atomic and electronic behavior.
+
+For **SKTB**:
 - Support customizable Slater-Koster parameterization with neural network incorporation for local environmental corrections. 
 - Operate independently of the choice of bases and exchange-correlation functionals, offering flexibility and adaptability.
 - Handle systems with strong spin-orbit coupling (SOC) effects.
 
-**DeePTB** is a versatile tool adaptable for a wide range of materials and phenomena, providing accurate and efficient simulations. See more details in our DeePTB paper: [arXiv:2307.04638](http://arxiv.org/abs/2307.04638)
+For **E3TB**:
+- Support constructing DFT Hamiltonians/density and overlap matrices under full LCAO basis.
+- Utilize strictly local and semi-local E3-equivariant neural networks to achieve high data-efficiency and accuracy.
+- Speed up via SO(2)convolution to support LCAO basis containing f and g orbitals.
+
+**DeePTB** is a versatile tool adaptable for a wide range of materials and phenomena, providing accurate and efficient simulations. See more details in our DeePTB paper: [sktb: arXiv:2307.04638](http://arxiv.org/abs/2307.04638), [e3tb: arXiv:2407.06053](https://arxiv.org/pdf/2407.06053)
+
 
 ## Installation
 
diff --git a/docs/advanced/e3tb/advanced_input.md b/docs/advanced/e3tb/advanced_input.md
new file mode 100644
index 00000000..9a39efc8
--- /dev/null
+++ b/docs/advanced/e3tb/advanced_input.md
@@ -0,0 +1,97 @@
+# More on Input Parameters
+In `common_options`, the user should define the some global param like:
+```JSON
+"common_options": {
+            "basis": {
+                "C": "2s2p1d",
+                "N": "2s2p1d"
+            },
+            "device": "cuda",
+            "dtype": "float32",
+            "overlap": true,
+            "seed": 42
+    }
+```
+- `basis` should align with the basis used to perform LCAO DFT calculations. The `"2s2p1d"` here indicates 2x`s` orbital, 2x`p`orbital and one `d` orbital. The 
+- `seed` controls the global random seed of all related packages. `dtype` can be chosen between `float32` and `float64`, but the former is accurate enough in most cases. If you have multiple cards, the 
+- `device` can be setted as `cuda:0`, `cuda:1` and so on, where the number is the device id. 
+- `overlap` controls the fitting of the overlap matrix. The user should provide overlap in the dataset when configuring the data_options if `overlap` is setted as True.
+
+In `train_options`, a common parameter looks like this:
+```JSON
+"train_options": {
+    "num_epoch": 500,
+    "batch_size": 1,
+    "optimizer": {
+        "lr": 0.05,
+        "type": "Adam"
+    },
+    "lr_scheduler": {
+        "type": "rop",
+        "factor": 0.8,
+        "patience": 6,
+        "min_lr": 1e-6
+    },
+    "loss_options":{
+        "train": {"method": "hamil_abs", "onsite_shift": false},
+        "validation": {"method": "hamil_abs", "onsite_shift": false}
+    },
+    "save_freq": 10,
+    "validation_freq": 10,
+    "display_freq": 10
+}
+```
+For `lr_scheduler`, please ensure the `patience` x `num_samples` / `batch_size` ranged between 2000 to 6000.
+
+When the dataset contains multiple elements, and you are fitting the Hamiltonian, it is suggested to open a tag in loss_options for better performance. Most DFT software would allow for a uniform shift when computing the electrostatic potentials, therefore, bringing an extra degree of freedom. The `onsite_shift` tag allows such freedom and makes the model generalizable to all sorts of element combinations:
+```JSON
+"loss_options":{
+        "train": {"method": "hamil_abs", "onsite_shift": true},
+        "validation": {"method": "hamil_abs", "onsite_shift" : true}
+    }
+```
+
+In `model_options`, we support two types of e3 group equivariant embedding methods: Strictly Localized Equivariant Message-passing or `slem`, and Localized Equivariant Message-passing or `lem`. The former ensures strict localization by truncating the propagation of distant neighbours' information and, therefore is suitable for bulk systems where the electron localization is enhanced by the scattering effect. `Lem` method, on the other hand, contained such localization design inherently by incorporating learnable decaying functions describing the dependency across distance.
+
+The model options for slem and lem are the same, here is an short example:
+```JSON
+"model_options": {
+    "embedding": {
+        "method": "slem", # or lem
+        "r_max": {"Mo":7.0, "S":7.0, "W": 8.0},
+        "irreps_hidden": "64x0e+32x1o+32x2e+32x3o+32x4e+16x5o+8x6e+4x7o+4x8e",
+        "n_layers": 4,
+        "env_embed_multiplicity": 10,
+        "avg_num_neighbors": 51,
+        "latent_dim": 64,
+    },
+    "prediction":{
+        "method": "e3tb", # need to be set as e3tb here
+        "neurons": [32, 32]
+    }
+}
+```
+Here, `method` indicates the e3 descripor employed. 
+
+`r_max` can be a float or int number, or a dict with atom species-specific float/int number, which indicates their cutoff envelope function, used to decay the distant atom's effect smoothly. We highly suggest the user go to the DFT calculation files and check the orbital's radial cutoff information to figure out how large this value should be.
+
+`irreps_hidden`: Very important! This parameter decides mostly the representation capacity of the model, along with the model size and consumption of GPU memory. This parameter indicates the irreps of hidden equivariant space, the definition here follows that for example, `64x0e` states `64` irreducible representation with `l=0` and `even` parity. For each basis set, we provide a tool to generate the least essential `irreps_hidden`, we also highly suggest the user add at least 3 times the number of essential irreps to enhance representation capacity.
+
+```IPYTHON
+In [5]: from dptb.data import OrbitalMapper
+
+In [6]: idp = OrbitalMapper(basis={"Si": "2s2p1d"})
+
+In [7]: idp.get_irreps_ess()
+Out[7]: 7x0e+6x1o+6x2e+2x3o+1x4e
+```
+
+`n_layers`: indicates the number of layers of the networks.
+
+`env_embed_multiplicity`: decide the irreps number when initializing the edge and node features.
+
+`avg_num_neighbors`: the averaged number of neighbours in the system given the cutoff radius set as `r_max`. It is recommended to do statistics of the system you are modelling, but just picking up a number ranging from 50 to 100 is also okay.
+
+`latent_dim`: The scalar channel's dimension of the system. 32/64/128 is good enough.
+
+For params in prediction, there is not much to be changed. The setting is pretty good.
\ No newline at end of file
diff --git a/docs/advanced/e3tb/data_preparation.md b/docs/advanced/e3tb/data_preparation.md
new file mode 100644
index 00000000..6f3fe698
--- /dev/null
+++ b/docs/advanced/e3tb/data_preparation.md
@@ -0,0 +1,76 @@
+# Data Preparation
+We suggest the user use a data parsing tool [dftio](https://github.com/floatingCatty/dftio) to directly convert the output data from DFT calculation into readable datasets. Our implementation supports the parsed dataset format of `dftio`. Users can just clone the `dftio` repository and run `pip install .` in its root directory. Then one can use the following parsing command for the parallel data processing directly from the DFT output:
+```bash
+usage: dftio parse [-h] [-ll {DEBUG,3,INFO,2,WARNING,1,ERROR,0}] [-lp LOG_PATH] [-m MODE] [-n NUM_WORKERS] [-r ROOT] [-p PREFIX] [-o OUTROOT] [-f FORMAT] [-ham] [-ovp] [-dm] [-eig]
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -ll {DEBUG,3,INFO,2,WARNING,1,ERROR,0}, --log-level {DEBUG,3,INFO,2,WARNING,1,ERROR,0}
+                        set verbosity level by string or number, 0=ERROR, 1=WARNING, 2=INFO and 3=DEBUG (default: INFO)
+  -lp LOG_PATH, --log-path LOG_PATH
+                        set log file to log messages to disk, if not specified, the logs will only be output to console (default: None)
+  -m MODE, --mode MODE  The name of the DFT software. (default: abacus)
+  -n NUM_WORKERS, --num_workers NUM_WORKERS
+                        The number of workers used to parse the dataset. (For n>1, we use the multiprocessing to accelerate io.) (default: 1)
+  -r ROOT, --root ROOT  The root directory of the DFT files. (default: ./)
+  -p PREFIX, --prefix PREFIX
+                        The prefix of the DFT files under root. (default: frame)
+  -o OUTROOT, --outroot OUTROOT
+                        The output root directory. (default: ./)
+  -f FORMAT, --format FORMAT
+                        The output root directory. (default: dat)
+  -ham, --hamiltonian   Whether to parse the Hamiltonian matrix. (default: False)
+  -ovp, --overlap       Whether to parse the Overlap matrix (default: False)
+  -dm, --density_matrix
+                        Whether to parse the Density matrix (default: False)
+  -eig, --eigenvalue    Whether to parse the kpoints and eigenvalues (default: False)
+```
+
+After parsing, the user need to write a info.json file and put it in the dataset. For default dataset type, the `info.json` looks like:
+
+```JSON
+{
+    "nframes": 1,
+    "pos_type": "cart",
+    "AtomicData_options": {
+        "r_max": 7.0,
+        "pbc": true
+    }
+}
+
+```
+Here `pos_type` can be `cart`, `dirc` or `ase`. For `dftio` output dataset, we use `cart` by default. The `r_max`, in principle, should align with the orbital cutoff in the DFT calculation. For a single element, the `r_max` should be a float number, indicating the largest bond distance included. When the system has multiple atoms, the `r_max` can also be a dict of atomic species-specific number like `{A: 7.0, B: 8.0}`. Then the largest bond `A-A` would be 7 and `A-B` be (7+8)/2=7.5, and `B-B` would be 8. `pbc` can be a bool variable, indicating the open or close of the periodic boundary conditions of the model. It can also be a list of three bool elements like `[true, true, false]`, which means we can set the periodicity of each direction independently.
+
+For LMDB type Dataset, the info.json is much simpler, which looks like this:
+```JSON
+{
+    "r_max": 7.0
+}
+```
+Where other information has been stored in the dataset. LMDB dataset is designed for handeling very large data that cannot be fit into the memory directly.
+
+Then you can set the `data_options` in the input parameters to point directly to the prepared dataset, like:
+```JSON
+"data_options": {
+        "train": {
+            "root": "./data",
+            "prefix": "Si64",
+            "get_Hamiltonian": true,
+            "get_overlap": true
+        }
+    }
+```
+
+If you are using a python script, the dataset can be build with the same parameters using `build_datasets`:
+```Python
+from dptb.data import build_dataset
+
+dataset = build_dataset(
+    root="your dataset root",
+    type="DefaultDataset",
+    prefix="frame",
+    get_overlap=True,
+    get_Hamiltonian=True,
+    basis={"Si":"2s2p1d"}
+    )
+```
diff --git a/docs/advanced/e3tb/index.rst b/docs/advanced/e3tb/index.rst
new file mode 100644
index 00000000..39f149f7
--- /dev/null
+++ b/docs/advanced/e3tb/index.rst
@@ -0,0 +1,11 @@
+=================================================
+E3TB Advanced
+=================================================
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Examples
+
+   advanced_input
+   data_preparation
+   loss_analysis
\ No newline at end of file
diff --git a/docs/advanced/e3tb/loss_analysis.md b/docs/advanced/e3tb/loss_analysis.md
new file mode 100644
index 00000000..d136b5c6
--- /dev/null
+++ b/docs/advanced/e3tb/loss_analysis.md
@@ -0,0 +1,123 @@
+# Loss Analysis
+## function
+The **DeePTB** contains a module to help the user better understand the details of the error of the **E3TB** module.
+We decompose the error of **E3TB** model into several parts:
+- onsite blocks: for diagonal blocks of the predicted quantum tensors the onsite blocks are further arranged according to the atom species.
+- hopping blocks: for off-diagonal blocks, the hopping block errors are then further arranged according to the atom-pair types.
+
+## usage
+For using this function, we need a dataset and the model. Just build them up in advance.
+```Python
+from dptb.data import build_dataset
+from dptb.nn import build_model
+
+dataset = build_dataset(
+    root="your dataset root",
+    type="DefaultDataset",
+    prefix="frame",
+    get_overlap=True,
+    get_Hamiltonian=True,
+    basis={"Si":"2s2p1d"}
+    )
+
+model = build_model("./ovp/checkpoint/nnenv.best.pth", common_options={"device":"cuda"})
+model.eval()
+```
+
+Then, the user should sample over the dataset using the dataloader and doing a analysis with running average, the code looks like:
+```Python
+import torch
+from dptb.nnops.loss import HamilLossAnalysis
+from dptb.data.dataloader import DataLoader
+from tqdm import tqdm
+from dptb.data import AtomicData
+
+ana = HamilLossAnalysis(idp=model.idp, device=model.device, decompose=True, overlap=True)
+
+loader = DataLoader(dataset, batch_size=10, shuffle=False, num_workers=0)
+
+for data in tqdm(loader, desc="doing error analysis"):
+    with torch.no_grad():
+        ref_data = AtomicData.to_AtomicDataDict(data.to("cuda"))
+        data = model(ref_data)
+        ana(data, ref_data, running_avg=True)
+```
+The analysis results are stored in `ana.stats`, which is a dictionary of statistics. The user can check the value directly, or display the results by:
+
+```Python
+ana.report()
+```
+Here is an example of the output:
+```
+TOTAL:
+MAE: 0.00012021172733511776
+RMSE: 0.00034208124270662665
+
+
+Onsite: 
+Si:
+MAE: 0.0012505357153713703
+RMSE: 0.0023699181620031595
+```
+![MAE onsite](../../img/MAE_onsite.png)
+![RMSE onsite](../../img/RMSE_onsite.png)
+
+```
+Hopping: 
+Si-Si:
+MAE: 0.00016888207755982876
+RMSE: 0.0003886453341692686
+```
+![MAE hopping](../../img/MAE_hopping.png)
+![RMSE hopping](../../img/RMSE_hopping.png)
+
+If the user wants to see the loss in a decomposed irreps format, one can set the `decompose` of the `HamilLossAnalysis` class to `True`, and rerun the analysis.  We can display the decomposed irreps results using the following code:
+```Python
+import matplotlib.pyplot as plt
+import torch
+
+ana_result = ana.stats
+
+for bt, err in ana_result["hopping"].items():
+    print("rmse err for bond {bt}: {rmserr} \t mae err for bond {bt}: {maerr}".format(bt=bt, rmserr=err["rmse"], maerr=err["mae"]))
+
+for bt, err in ana_result["onsite"].items():
+    print("rmse err for atom {bt}: {rmserr} \t mae err for atom {bt}: {maerr}".format(bt=bt, rmserr=err["rmse"], maerr=err["mae"]))
+
+for bt, err in ana_result["hopping"].items():
+    x = list(range(model.idp.orbpair_irreps.num_irreps))
+    rmserr = err["rmse_per_irreps"]
+    maerr = err["mae_per_irreps"]
+    sort_index = torch.LongTensor(model.idp.orbpair_irreps.sort().inv)
+    
+    # rmserr = rmserr[sort_index]
+    # maerr = maerr[sort_index]
+    
+    plt.figure(figsize=(20,3))
+    plt.bar(x, rmserr.cpu().detach(), label="RMSE per rme")
+    plt.bar(x, maerr.cpu().detach(), alpha=0.6, label="MAE per rme")
+    plt.legend()
+    # plt.yscale("log")
+    # plt.ylim([1e-5, 5e-4])
+    plt.title("rme specific error of bond type: {bt}".format(bt=bt))
+    plt.show()
+
+for at, err in ana_result["onsite"].items():
+    x = list(range(model.idp.orbpair_irreps.num_irreps))
+    rmserr = err["rmse_per_irreps"]
+    maerr = err["mae_per_irreps"]
+    sort_index = torch.LongTensor(model.idp.orbpair_irreps.sort().inv)
+
+    rmserr = rmserr[sort_index]
+    maerr = maerr[sort_index]
+
+    plt.figure(figsize=(20,3))
+    plt.bar(x, rmserr.cpu().detach(), label="RMSE per rme")
+    plt.bar(x, maerr.cpu().detach(), alpha=0.6, label="MAE per rme")
+    plt.legend()
+    # plt.yscale("log")
+    # plt.ylim([1e-5, 2.e-2])
+    plt.title("rme specific error of atom type: {at}".format(at=at))
+    plt.show()
+
+```
\ No newline at end of file
diff --git a/docs/advanced/dftb.md b/docs/advanced/sktb/dftb.md
similarity index 100%
rename from docs/advanced/dftb.md
rename to docs/advanced/sktb/dftb.md
diff --git a/docs/advanced/dptb_env.md b/docs/advanced/sktb/dptb_env.md
similarity index 100%
rename from docs/advanced/dptb_env.md
rename to docs/advanced/sktb/dptb_env.md
diff --git a/docs/advanced/sktb/index.rst b/docs/advanced/sktb/index.rst
new file mode 100644
index 00000000..ce995246
--- /dev/null
+++ b/docs/advanced/sktb/index.rst
@@ -0,0 +1,12 @@
+=================================================
+SKTB Advanced
+=================================================
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Examples
+
+   dftb
+   dptb_env
+   nrl_tb
+   soc
diff --git a/docs/advanced/nrl_tb.md b/docs/advanced/sktb/nrl_tb.md
similarity index 100%
rename from docs/advanced/nrl_tb.md
rename to docs/advanced/sktb/nrl_tb.md
diff --git a/docs/advanced/soc.md b/docs/advanced/sktb/soc.md
similarity index 100%
rename from docs/advanced/soc.md
rename to docs/advanced/sktb/soc.md
diff --git a/docs/img/MAE_hopping.png b/docs/img/MAE_hopping.png
new file mode 100644
index 00000000..33085ac4
Binary files /dev/null and b/docs/img/MAE_hopping.png differ
diff --git a/docs/img/MAE_onsite.png b/docs/img/MAE_onsite.png
new file mode 100644
index 00000000..3b2e2f2b
Binary files /dev/null and b/docs/img/MAE_onsite.png differ
diff --git a/docs/img/RMSE_hopping.png b/docs/img/RMSE_hopping.png
new file mode 100644
index 00000000..737dc798
Binary files /dev/null and b/docs/img/RMSE_hopping.png differ
diff --git a/docs/img/RMSE_onsite.png b/docs/img/RMSE_onsite.png
new file mode 100644
index 00000000..15dd3e09
Binary files /dev/null and b/docs/img/RMSE_onsite.png differ
diff --git a/docs/img/silicon_e3_band.png b/docs/img/silicon_e3_band.png
new file mode 100644
index 00000000..075d2578
Binary files /dev/null and b/docs/img/silicon_e3_band.png differ
diff --git a/docs/index.rst b/docs/index.rst
index ea62ef28..e5cd1eb6 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -6,18 +6,35 @@
 DeePTB Documentation
 =================================================
 
-**DeePTB** is a Python package that adopts the deep learning method to construct electronic tight-binding (TB) Hamiltonians using a minimal basis.
-With a neural network environmental correction scheme, **DeePTB** can efficiently predict TB Hamiltonians for large-size unseen structures with *ab initio* accuracy after training with *ab initio* eigenvalues from smaller sizes. 
-This feature enables efficient simulations of large-size systems under structural perturbations such as strain, which is crucial for semiconductor band gap engineering. Furthermore, DeePTB offers the ability to perform efficient and accurate finite temperature simulations, incorporating both atomic and electronic behaviour through the integration of molecular dynamics (MD). Another significant advantage is that using eigenvalues as the training labels makes DeePTB much more flexible and independent of the choice of various bases (PW or LCAO) and the exchange-correlation (XC) functionals (LDA, GGA and even HSE) used in preparing the training labels. In addition, **DeePTB** can handle systems with strong spin-orbit coupling (SOC) effects.
-These capabilities make **DeePTB** adaptable to various research scenarios, extending its applicability to a wide range of materials and phenomena and offering a powerful and versatile tool for accurate and efficient simulations.
+.. **DeePTB** is a Python package that adopts the deep learning method to construct electronic tight-binding (TB) Hamiltonians using a minimal basis.
+.. With a neural network environmental correction scheme, **DeePTB** can efficiently predict TB Hamiltonians for large-size unseen structures with *ab initio* accuracy after training with *ab initio* eigenvalues from smaller sizes. 
+.. This feature enables efficient simulations of large-size systems under structural perturbations such as strain, which is crucial for semiconductor band gap engineering. Furthermore, DeePTB offers the ability to perform efficient and accurate finite temperature simulations, incorporating both atomic and electronic behaviour through the integration of molecular dynamics (MD). Another significant advantage is that using eigenvalues as the training labels makes DeePTB much more flexible and independent of the choice of various bases (PW or LCAO) and the exchange-correlation (XC) functionals (LDA, GGA and even HSE) used in preparing the training labels. In addition, **DeePTB** can handle systems with strong spin-orbit coupling (SOC) effects.
+.. These capabilities make **DeePTB** adaptable to various research scenarios, extending its applicability to a wide range of materials and phenomena and offering a powerful and versatile tool for accurate and efficient simulations.
+**DeePTB** is an innovative Python package that employs deep learning to construct electronic Hamiltonians using a minimal basis Slater-Koster TB(**SKTB**), and full LCAO basis using E3-equivariant neural networks (**E3TB**). It is designed to:
+
+- Efficiently predict TB/LCAO Hamiltonians for large, unseen structures based on training with smaller ones.
+- Enable simulations of large systems under structural perturbations, finite temperature simulations integrating molecular dynamics (MD) for comprehensive atomic and electronic behavior.
+
+For **SKTB**:
+- Support customizable Slater-Koster parameterization with neural network incorporation for local environmental corrections. 
+- Operate independently of the choice of bases and exchange-correlation functionals, offering flexibility and adaptability.
+- Handle systems with strong spin-orbit coupling (SOC) effects.
+
+For **E3TB**:
+- Support constructing DFT Hamiltonians/density and overlap matrices under full LCAO basis.
+- Utilize strictly local and semi-local E3-equivariant neural networks to achieve high data-efficiency and accuracy.
+- Speed up via SO(2)convolution to support LCAO basis containing f and g orbitals.
+
+**DeePTB** is a versatile tool adaptable for a wide range of materials and phenomena, providing accurate and efficient simulations. See more details in our DeePTB paper: [SKTB: arXiv:2307.04638](http://arxiv.org/abs/2307.04638), [E3TB: arXiv:2407.06053](https://arxiv.org/pdf/2407.06053)
 
 .. toctree::
    :maxdepth: 2
    :caption: Quick Start
+   
 
    quick_start/easy_install
    quick_start/input
-   quick_start/hands_on
+   quick_start/hands_on/index
    quick_start/basic_api
 
 
@@ -31,10 +48,8 @@ These capabilities make **DeePTB** adaptable to various research scenarios, exte
 .. toctree::
    :maxdepth: 2
    :caption: Advanced
-
-   advanced/dptb_env
-   advanced/soc
-   advanced/nrl_tb
+   
+   advanced/sktb/index
    advanced/elec_properties/index
    advanced/interface/index
 
diff --git a/docs/quick_start/hands_on/e3tb_hands_on.md b/docs/quick_start/hands_on/e3tb_hands_on.md
new file mode 100644
index 00000000..9dc8051e
--- /dev/null
+++ b/docs/quick_start/hands_on/e3tb_hands_on.md
@@ -0,0 +1,94 @@
+# Bulk Silicon model
+
+DeePTB supports training an E3-equalvariant model to predict DFT Hamiltonian, density and overlap matrix under LCAO basis. Here, cubic-phase bulk silicon has been chosen as a quick start example.
+
+Silicon is a chemical element; it has the symbol Si and atomic number 14. It is a hard, brittle crystalline solid with a blue-grey metallic lustre, and is a tetravalent metalloid and semiconductor. The prepared files are located in:
+
+```
+deeptb/examples/e3/
+|-- data
+|   |-- Si64.0
+|   |   |-- atomic_numbers.dat
+|   |   |-- basis.dat
+|   |   |-- cell.dat
+|   |   |-- hamiltonians.h5
+|   |   |-- kpoints.npy
+|   |   |-- overlaps.h5
+|   |   |-- pbc.dat
+|   |   `-- positions.dat
+|   `-- info.json
+`-- input.json
+```
+We prepared one frame of silicon cubic bulk structure as an example. The data was computed using DFT software ABACUS, with an LCAO basis set containing 1 `s` and 1 `p` orbital. The cutoff radius for the orbital is 7au, which means the largest bond would be less than 14 au. Therefore, the r_max should be set as 7.4. So we have an info.json file like:
+
+```json
+{
+        "nframes": 1,
+        "pos_type": "cart",
+        "AtomicData_options": {
+                "r_max": 7.4,
+                "pbc": true
+        }
+}
+```
+
+The `input_short.json` file contains the least number of parameters that are required to start training the **E3TB** model, we list some important parameters:
+```json
+"common_options": {
+    "basis": {
+        "Si": "1s1p"
+    },
+    "device": "cpu",
+    "overlap": true
+}
+```
+In `common_options`, here are the essential parameters. The `basis` should align with the DFT calculation, so 1 `s` and 1 `p` orbital would result in a `1s1p` basis. The `device` can either be `cpu` or `cuda`, but we highly recommend using `cuda` if GPU is available. The `overlap` tag controls whether to fit the overlap matrix together. Benefitting from our parameterization, the fitting overlap only brings negelectable costs, but would boost the convenience when using the model.
+
+Here comes the `model_options`:
+```json
+"model_options": {
+    "embedding": {
+        "method": "slem",
+        "r_max": {
+            "Si": 7.4
+        },
+        "irreps_hidden": "32x0e+32x1o+16x2e",
+        "n_layers": 3,
+        "avg_num_neighbors": 51,
+        "tp_radial_emb": true
+    },
+    "prediction":{
+        "method": "e3tb",
+        "neurons": [64,64]
+    }
+}
+```
+The `model_options` contains `embedding` and `prediction` parts, denoting the construction of representation for equivariant features, and arranging and rescaling the features into quantum operators sub-blocks such as Hamiltonian, density and overlap matrix.
+
+In `embedding`, the `method` supports `slem` and `lem` for now, where `slem` has a strictly localized dependency, which has better transferability and data efficiency, while `lem` has an adjustable semi-local dependency, which has better representation capacity, but would require a little more data. `r_max` should align with the one defined in `info.json`.
+
+For `irreps_hidden`, this parameter defines the size of the hidden equivariant irreducible representation, which is highly related to the power of the model. There are certain rules to define this param. First, we should check the largest angular momentum defined in the DFT LCAO basis, the irreps's highest angular momentum should always be double. For example, for `1s1p` basis, the irreps should contain features with angular momentum from 0 to 2, which is 2 times 1, the angular momentum of `p` orbital. If the basis contains `d` orbital, then the irreps should contain angular momentum up to 4. `f` and `g` or even higher orbitals are also supported.
+
+In `prediction`, we should use the `e3tb` method to let the model know the output features are arranged in **E3TB** format. The neurons are defined for a simple MLP to predict the slater-koster-like parameters for predicting the overlap matrix, for which [64,64] is usually fine.
+
+
+Now everything is prepared! We can using the following command and we can train the first model:
+
+```bash
+cd deeptb/examples/e3
+dptb train ./input/input_short.json -o ./e3_silicon
+```
+
+Here ``-o`` indicate the output directory. During the fitting procedure, we can see the loss curve of hBN is decrease consistently. When finished, we get the fitting results in folders ```e3_silicon```.
+
+By modify the checkpoint path in the script `plot_band.py` and running it, the band structure can be obtained in `./band_plot`:
+```bash
+python plot_band.py
+```
+or just using the command line 
+```bash
+dptb run ./run/band.json -i ./e3_silicon/checkpoint/nnenv.best.pth -o ./band_plot
+```
+![band_first](../../img/silicon_e3_band.png)
+
+Now you know how to train a **E3TB** model for Hamiltonian and overlap matrix. For better usage, we encourage the user to read the full input parameters for the **E3TB** model. Also, the **DeePTB** model supports several post-process tools, and the user can directly extract any predicted properties just using a few lines of code. Please see the basis_api for details.
\ No newline at end of file
diff --git a/docs/quick_start/hands_on/index.rst b/docs/quick_start/hands_on/index.rst
new file mode 100644
index 00000000..ab17a284
--- /dev/null
+++ b/docs/quick_start/hands_on/index.rst
@@ -0,0 +1,7 @@
+=================================================
+A quick Example
+=================================================
+
+.. toctree::
+   sktb_hands_on
+   e3tb_hands_on
diff --git a/docs/quick_start/hands_on.md b/docs/quick_start/hands_on/sktb_hands_on.md
similarity index 99%
rename from docs/quick_start/hands_on.md
rename to docs/quick_start/hands_on/sktb_hands_on.md
index c5081600..87e4f94d 100644
--- a/docs/quick_start/hands_on.md
+++ b/docs/quick_start/hands_on/sktb_hands_on.md
@@ -1,6 +1,4 @@
-# A quick Example
-
-## h-BN model
+# h-BN model
 
 DeePTB is a package that utilizes machine-learning method to train TB models for target systems with the DFT training data. Here, h-BN monolayer has been chosen as a quick start example. 
 
diff --git a/docs/quick_start/input.md b/docs/quick_start/input.md
index 1fd7e118..336bd595 100644
--- a/docs/quick_start/input.md
+++ b/docs/quick_start/input.md
@@ -4,7 +4,10 @@ The following files are the central input files for DeePTB. Before executing the
 
 ## Inputs
 ### Data
-The dataset of one structure is recommended to be prepared in the following format:
+The dataset files contrains both the **atomic structure** and the **training label** information. 
+
+The **atomic structure** should be prepared as a ASE trajectory binary file, where each structure is stored using an **Atom** class defined in ASE package. The provided trajectory file must have suffix `.traj` and the length of the trajectory is `nframes`. For labels, we currently support `eigenvalues`, `Hamiltonian`, `density matrix` and `overlap matrix`. For training a **SKTB** model, we need to prepare the `eigenvalues` label, which contrains the `eigenvalues.npy` and `kpoints.npy`. A typical dataset of **SKTB** task looks like:
+
 ```
 data/
 -- set.x
@@ -13,44 +16,64 @@ data/
 -- -- xdat.traj        # ase trajectory file with nframes
 -- -- info.json        # defining the parameters used in building AtomicData graph data
 ```
-One should prepare the **atomic structures** and **electronic band structures**. The **atomic structures** data is in ASE trajectory binary format, where each structure is stored using an **Atom** class defined in ASE package. The provided trajectory file must have suffix `.traj` and the length of the trajectory is `nframes`.
-> We also support another format to provide structure information, instead of loading structures from a single binary `.traj` file. In this way, three seperate textfiles for **atomic structures** need to be provided: `atomic_numbers.dat`, `cell.dat` and `positions.dat`. The length unit used in `cell.dat` and `positions.dat` (if cartesian coordinates) is Angstrom.
+
+> We also support another format to provide structure information, instead of loading structures from a single binary `.traj` file. In this way, three seperate textfiles for **atomic structures** need to be provided: `atomic_numbers.dat`, `pbc.dat`, `cell.dat` and `positions.dat`. The length unit used in `cell.dat` and `positions.dat` (if cartesian coordinates) is Angstrom.
 
 The **band structures** data includes the kpoints list and eigenvalues in the binary format of `.npy`. The shape of kpoints data is fixed as **[nkpoints,3]** and eigenvalues is fixed as **[nframes,nkpoints,nbands]**. The `nframes` here must be the same as in **atomic structures** files.
 
 > **Important:** The eigenvalues.npy should not contain bands that contributed by the core electrons, which is not setted as the TB orbitals in model setting.
 
+For typical **E3TB** task, we need to prepare the Hamiltonian/density matrix along with overlap matrix as labels. They are arranged as hdf5 binary format, and named as `hamiltonians.h5`/`density_matrices.h5` and `overlaps.h5` respectively. A typical dataset of **E3TB** looks like:
+
+```
+data/
+-- set.x
+-- -- positions.dat     # a text file with nframe x natom row and 3 col
+-- -- pbc.dat           # a text file of three bool variables
+-- -- cell.dat          # a text file with nframe x 3 row and 3 col, or 3 rol and 3 col.
+-- -- atomic_numbers.dat    # a text file with nframe x natom row and 1 col
+-- -- hamiltonian.h5    # a hdf5 dataset file with group named "0", "1", ..., "nframe". Each group contains a dict of {"i_j_Rx_Ry_Rz": numpy.ndarray} 
+-- -- overlaps.h5       # a hdf5 dataset file with group named "0", "1", ..., "nframe". Each group contains a dict of {"i_j_Rx_Ry_Rz": numpy.ndarray} 
+-- -- info.json
+```
+
 ### Info
 
 In **DeePTB**, the **atomic structures** and **band structures** data are stored in AtomicData graph structure. `info.json` defines the key parameters used in building AtomicData graph dataset, which looks like:
 ```bash
 {
     "nframes": 1,
-    "natoms": 2, # optional
-    "pos_type": "ase",
+    "pos_type": "ase/cart/frac",
     "AtomicData_options": {
         "r_max": 5.0,
         "er_max": 5.0, # optional
         "oer_max": 2.5, # optional
-        "pbc": true # optional, default to be true
-    },
-    "bandinfo": {
-        "band_min": 0,
-        "band_max": 6,
-        "emin": null, # optional
-        "emax": null # optional
     }
 }
 ```
-`nframes` is the length of the trajectory, as we defined in the previous section. `natoms` is the number of atoms in each of the structures in the trajectory. `pos_type` defines the input format of the **atomic structures**, which is set to `ase` if  ASE `.traj` file is provided, and `cart` or `frac` if cartesian / fractional coordinate in `positions.dat` file provided.
+`nframes` is the length of the trajectory, as we defined in the previous section. `pos_type` defines the input format of the **atomic structures**, which is set to `ase` if  ASE `.traj` file is provided, and `cart` or `frac` if cartesian / fractional coordinate in `positions.dat` file provided.
 
-In the `AtomicData_options` section, the key arguments in defining graph structure is provided. `r_max` is the maximum cutoff in building neighbour list for each atom, and `pbc` assigns the PBC condition in cell. `er_max` and `oer_max` are optional value for additional environmental dependence TB parameterization, such as strain correction and `nnenv`. All cutoff variables have the unit of Angstrom.
+In the `AtomicData_options` section, the key arguments in defining graph structure is provided. `r_max` is the maximum cutoff in building neighbour list for each atom. `er_max` and `oer_max` are optional value for additional environmental dependence TB parameterization in **SKTB** mode, such as strain correction and `nnenv`. All cutoff variables have the unit of Angstrom.
 
-We can get the recommended `r_max` value by `DeePTB`'s bond analysis function, using:
+For **SKTB**, We can get the recommended `r_max` value by `DeePTB`'s bond analysis function, using:
 ```bash
 dptb bond <structure path> [[-c] <cutoff>] [[-acc] <accuracy>]
 ```
 
+For **E3TB**, we suggest the user align the `r_max` value to the LCAO basis's cutoff radius used in DFT calculation.
+
+For **SKTB** model, we should also specify the parameters in `info.json` that controls the fitting eigenvalues:
+```JSON
+{
+    "bandinfo": {
+        "band_min": 0,
+        "band_max": 6,
+        "emin": null, # optional
+        "emax": null # optional
+    }
+}
+```
+
 `bandinfo` defines the fitting target. The `emin` and `emax` defines the fitting energy window of the band, while the `band_min` and `band_max` select which band are targeted.
 > **note:** The `0` energy point is located at the lowest energy eigenvalues of the data files, to generalize bandstructure data computed by different DFT packages.
 
@@ -58,65 +81,86 @@ dptb bond <structure path> [[-c] <cutoff>] [[-acc] <accuracy>]
 ### Input.json
 **DeePTB** provides input config templates for quick setup. User can run:
 ```bash
-dptb config -tr PATH
+dptb config -tr [[-e3] <e3tb>] [[-sk] <sktb>] [[-skenv] <sktbenv>] PATH
 ```
 The template config file will be generated at the `PATH`.
 We provide several template for different mode of deeptb, please run `dptb config -h` to checkout.
 
-`common_options` provides vital information to build a **DeePTB** models. 
-
-```json
-"common_options": {
-            "basis": {
-                "C": ["2s", "2p"],
-                "N": ["2s", "2p", "d*"]
-            },
-            "device": "cpu",
-            "dtype": "float32",
-            "seed": 42
-    }
-```
+In general, the `input.json` file contains following parts:
 
-`train_options` section is used to spicify the training procedure.
+- `common_options` provides vital information to build a **DeePTB** models. 
 
-```json
-"train_options": {
-    "num_epoch": 500,
-    "batch_size": 1,
-    "optimizer": {
-        "lr": 0.05,
-        "type": "Adam"
-    },
+    ```json
+    "common_options": {
+                "basis": {
+                    "C": ["2s", "2p"],
+                    "N": ["2s", "2p", "d*"]
+                },
+                "device": "cpu",
+                "overlap": false,
+                "dtype": "float32",
+                "seed": 42
+        }
+    ```
+    Here the example basis defines the minimal basis set in **SKTB** mode. The user can define the **E3TB** mode basis with similar format, but a string instead a list. For example, for `C` and `N` with DZP basis, the `basis` should be defined as：
+    ```json
+    "basis": {
+        "C": "2s2p1d",
+        "N": "2s2p1d"
+        }
+    ```
+- `train_options` section is used to spicify the training procedure.
+
+    ```json
+    "train_options": {
+        "num_epoch": 500,
+        "batch_size": 1,
+        "optimizer": {
+            "lr": 0.05,
+            "type": "Adam"
+        },
+        "lr_scheduler": {
+            "type": "exp",
+            "gamma": 0.999
+        },
+        "loss_options":{
+            "train": {"method": "eigvals"}
+        },
+        "save_freq": 10,
+        "validation_freq": 10,
+        "display_freq": 10
+    }
+    ```
+    Here `Adam` optimizer is always preferred for better convergence speed. While the `lr_scheduler` are recommended to use "rop", as:
+    ```json
     "lr_scheduler": {
-        "type": "exp",
-        "gamma": 0.999
-    },
-    "loss_options":{
-        "train": {"method": "eigvals"}
-    },
-    "save_freq": 10,
-    "validation_freq": 10,
-    "display_freq": 10
-}
-```
+            "type": "rop",
+            "factor": 0.8,
+            "patience": 50,
+            "min_lr": 1e-6
+        }
+    ```
+    More details about rop is available at: https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ReduceLROnPlateau.html
+
+- `model_options` section contains the key setting to build **DeePTB** models. 
 
-`model_options` section contains the key setting to build **DeePTB** models. For a Slater-Kohster TB parameteriation model, only the `nnsk` section is needed. The example of a `nnsk` model is:
+    For **SKTB** model without env correction, only the `nnsk` section is needed. The example of a `nnsk` model is:
 
-```json
-"model_options": {
-    "nnsk": {
-        "onsite": {"method": "uniform"},
-        "hopping": {"method": "powerlaw", "rs":2.6, "w": 0.3},
-        "freeze": false
+    ```json
+    "model_options": {
+        "nnsk": {
+            "onsite": {"method": "uniform"},
+            "hopping": {"method": "powerlaw", "rs":2.6, "w": 0.3},
+            "freeze": false
+        }
     }
-}
-```
+    ```
 
-Different method of `onsite` and `hopping` have their specific parameters requirements, please checkout our full parameter lists.
+    Different method of `onsite` and `hopping` have their specific parameters requirements, please checkout our full parameter lists.
 
-For a environment-dependent **DeePTB** model, the `embedding`, `prediction` and `nnsk` sections are required as in this example:
+    For **SKTB** model with environment dependency, the `embedding`, `prediction` and `nnsk` sections are required as in this example:
 
-```json
+    ```json
     "model_options": {
         "embedding":{
             "method": "se2", "rs": 2.5, "rc": 5.0,
@@ -134,28 +178,73 @@ For a environment-dependent **DeePTB** model, the `embedding`, `prediction` and
             "freeze": true
         }
     }
-```
+    ```
 
-`data_options` assigns the datasets used in training.
-
-```json
-"data_options": {
-    "train": {
-        "root": "./data/",
-        "prefix": "kpathmd100",
-        "get_eigenvalues": true
-    },
-    "test": {
-        "root": "./data/",
-        "prefix": "kpathmd100",
-        "get_eigenvalues": true
+    For **E3TB** model, only `embedding` and `prediction` is required, as:
+    ```json
+    "model_options": {
+        "embedding": {
+            "method": "slem/lem", # s in slem stands for strict localization
+            "r_max": {
+                "C": 7.0,
+                "N": 7.0
+            },
+            "irreps_hidden": "32x0e+32x1o+16x2e+8x3o+8x4e+4x5o",
+            "n_layers": 3,
+            "n_radial_basis": 18,
+            "env_embed_multiplicity": 10,
+            "avg_num_neighbors": 51,
+            "latent_dim": 64,
+            "latent_channels": [
+                32
+            ],
+            "tp_radial_emb": true,
+            "tp_radial_channels": [
+                32
+            ],
+            "PolynomialCutoff_p": 6,
+            "cutoff_type": "polynomial",
+            "res_update": true,
+            "res_update_ratios": 0.5,
+            "res_update_ratios_learnable": false
+        },
+        "prediction":{
+            "method": "e3tb",
+            "scales_trainable":false,
+            "shifts_trainable":false,
+            "neurons": [64,64] # optional, required when overlap in common_options is True
+        }
     }
-}
-```
+    ```
+
+- `data_options` assigns the datasets used in training.
+
+    ```json
+    "data_options": {
+        "train": {
+            "type": "DefaultDataset", # optional, default "DefaultDataset"
+            "root": "./data/",
+            "prefix": "kpathmd100",
+            "get_Hamiltonian": false, # optional, default false
+            "get_eigenvalues": true, # optional, default false
+            "get_overlap": false, # optional, default false
+            "get_DM": false # optional, default false
+        },
+        "validation": {
+            "type": "DefaultDataset",
+            "root": "./data/",
+            "prefix": "kpathmd100",
+            "get_Hamiltonian": false,
+            "get_eigenvalues": true,
+            "get_overlap": false,
+            "get_DM": false
+        }
+    }
+    ```
 
 ## Commands
 ### Training
 When data and input config file is prepared, we are ready to train the model:
 ```bash
-dptb train <input config> [[-o] <output directory>] [[-i|-r] <nnsk checkpoint path>]
+dptb train <input config> [[-o] <output directory>] [[-i|-r] <deeptb checkpoint path>]
 ```
diff --git a/dptb/data/AtomicData.py b/dptb/data/AtomicData.py
index e4228f78..cf7447b1 100644
--- a/dptb/data/AtomicData.py
+++ b/dptb/data/AtomicData.py
@@ -305,10 +305,6 @@ def _process_dict(kwargs, ignore_fields=[]):
             if num_frames > 1 and v.size(0) != num_frames:
                 raise ValueError(f"Wrong shape for NESTED property {k}")
 
-                
-    
-
-
 
 class AtomicData(Data):
     """A neighbor graph for points in (periodic triclinic) real space.
diff --git a/dptb/data/dataset/_default_dataset.py b/dptb/data/dataset/_default_dataset.py
index b56d64a5..1a18480d 100644
--- a/dptb/data/dataset/_default_dataset.py
+++ b/dptb/data/dataset/_default_dataset.py
@@ -353,7 +353,7 @@ def raw_dir(self):
         # TODO: this is not implemented.
         return self.root
     
-    def E3statistics(self, decay=False):
+    def E3statistics(self, model: torch.nn.Module=None, decay=False):
         assert self.transform is not None
         idp = self.transform
 
@@ -369,6 +369,19 @@ def E3statistics(self, decay=False):
         stats["node"] =  self._E3nodespecies_stat(typed_dataset=typed_dataset)
         stats["edge"] = self._E3edgespecies_stat(typed_dataset=typed_dataset, decay=decay)
 
+        if model is not None:
+            # initilize the model param with statistics
+            scalar_mask = torch.BoolTensor([ir.dim==1 for ir in model.idp.orbpair_irreps])
+            node_shifts = stats["node"]["scalar_ave"]
+            node_scales = stats["node"]["norm_ave"]
+            node_scales[:,scalar_mask] = stats["node"]["scalar_std"]
+
+            edge_shifts = stats["edge"]["scalar_ave"]
+            edge_scales = stats["edge"]["norm_ave"]
+            edge_scales[:,scalar_mask] = stats["edge"]["scalar_std"]
+            model.node_prediction_h.set_scale_shift(scales=node_scales, shifts=node_shifts)
+            model.edge_prediction_h.set_scale_shift(scales=edge_scales, shifts=edge_shifts)
+
         return stats
     
     def _E3edgespecies_stat(self, typed_dataset, decay):
diff --git a/dptb/data/interfaces/ham_to_feature.py b/dptb/data/interfaces/ham_to_feature.py
index 3698ec84..cc39b982 100644
--- a/dptb/data/interfaces/ham_to_feature.py
+++ b/dptb/data/interfaces/ham_to_feature.py
@@ -405,6 +405,7 @@ def feature_to_block(data, idp):
                     blocks[block_index] = block.T
                 else:
                     blocks[block_index] += block.T
+                    
     return blocks
 
 
diff --git a/dptb/data/transforms.py b/dptb/data/transforms.py
index 2df77ea8..1a138379 100644
--- a/dptb/data/transforms.py
+++ b/dptb/data/transforms.py
@@ -868,7 +868,15 @@ def get_irreps(self, no_parity=False):
         return self.orbpair_irreps
 
     def get_irreps_sim(self, no_parity=False):
-        return self.get_irreps(no_parity=no_parity).sort[0].simplify()
+        return self.get_irreps(no_parity=no_parity).sort()[0].simplify()
+    
+    def get_irreps_ess(self, no_parity=False):
+        irp_e = []
+        for mul, (l, p) in self.get_irreps_sim(no_parity=no_parity):
+            if (-1)**l == p:
+                irp_e.append((mul, (l, p)))
+
+        return o3.Irreps(irp_e)
     
     def __eq__(self, other):
         return self.basis == other.basis and self.method == other.method
\ No newline at end of file
diff --git a/dptb/entrypoints/run.py b/dptb/entrypoints/run.py
index 0bcb4f5f..ce895442 100644
--- a/dptb/entrypoints/run.py
+++ b/dptb/entrypoints/run.py
@@ -9,7 +9,7 @@
 from dptb.utils.argcheck import normalize_run
 from dptb.utils.tools import j_loader
 from dptb.utils.tools import j_must_have
-from dptb.postprocess.write_ham import write_ham
+from dptb.postprocess.write_block import write_block
 import torch
 import h5py
 
@@ -88,7 +88,7 @@ def run(
 
     elif task=='write_block':
         task = torch.load(init_model, map_location="cpu")["task"]
-        block = write_ham(data=struct_file, AtomicData_options=jdata['AtomicData_options'], model=model, device=jdata["device"])
+        block = write_block(data=struct_file, AtomicData_options=jdata['AtomicData_options'], model=model, device=jdata["device"])
         # write to h5 file, block is a dict, write to a h5 file
         with h5py.File(os.path.join(results_path, task+".h5"), 'w') as fid:
             default_group = fid.create_group("0")
diff --git a/dptb/entrypoints/train.py b/dptb/entrypoints/train.py
index a09de89e..958dec4b 100644
--- a/dptb/entrypoints/train.py
+++ b/dptb/entrypoints/train.py
@@ -183,7 +183,8 @@ def train(
         # include the init model and from scratch
         # build model will handle the init model cases where the model options provided is not equals to the ones in checkpoint.
         checkpoint = init_model if init_model else None
-        model = build_model(checkpoint=checkpoint, model_options=jdata["model_options"], common_options=jdata["common_options"], statistics=train_datasets.E3statistics())
+        model = build_model(checkpoint=checkpoint, model_options=jdata["model_options"], common_options=jdata["common_options"])
+        train_datasets.E3statistics(model=model)
         trainer = Trainer(
             train_options=jdata["train_options"],
             common_options=jdata["common_options"],
diff --git a/dptb/nn/build.py b/dptb/nn/build.py
index 27d0c540..c5c6035f 100644
--- a/dptb/nn/build.py
+++ b/dptb/nn/build.py
@@ -11,8 +11,7 @@
 def build_model(
         checkpoint: str=None,
         model_options: dict={}, 
-        common_options: dict={}, 
-        statistics: dict=None
+        common_options: dict={}
         ):
     """
     The build model method should composed of the following steps:
@@ -141,23 +140,8 @@ def build_model(
     if from_scratch:
         if init_nnenv:
             model = NNENV(**model_options, **common_options)
-
-            # do initialization from statistics if NNENV is e3tb and statistics is provided
-            if model.method == "e3tb" and statistics is not None:
-                scalar_mask = torch.BoolTensor([ir.dim==1 for ir in model.idp.orbpair_irreps])
-                node_shifts = statistics["node"]["scalar_ave"]
-                node_scales = statistics["node"]["norm_ave"]
-                node_scales[:,scalar_mask] = statistics["node"]["scalar_std"]
-
-                edge_shifts = statistics["edge"]["scalar_ave"]
-                edge_scales = statistics["edge"]["norm_ave"]
-                edge_scales[:,scalar_mask] = statistics["edge"]["scalar_std"]
-                model.node_prediction_h.set_scale_shift(scales=node_scales, shifts=node_shifts)
-                model.edge_prediction_h.set_scale_shift(scales=edge_scales, shifts=edge_shifts)
-
         elif init_nnsk:
             model = NNSK(**model_options["nnsk"], **common_options)
-
         elif init_mixed:
             model = MIX(**model_options, **common_options)
         elif init_dftbsk:
diff --git a/dptb/nn/embedding/lem.py b/dptb/nn/embedding/lem.py
index 82791203..bad160f3 100644
--- a/dptb/nn/embedding/lem.py
+++ b/dptb/nn/embedding/lem.py
@@ -513,7 +513,7 @@ def forward(self, edge_index, atom_type, bond_type, edge_sh, edge_length, node_o
             cutoff_coeffs[active_edges].unsqueeze(-1) * new_latents
             )
         
-        weights_e = self.env_embed_mlp(latents)
+        weights_e = self.env_embed_mlp(latents[prev_mask])
         # features = self.bn(features)
 
         edge_features = self._env_weighter(
@@ -677,7 +677,7 @@ def forward(self, latents, node_features, edge_features, atom_type, node_onehot,
         message = self.tp(
             torch.cat(
                 [new_node_features[edge_center[active_edges]], self.sln_e(edge_features)]
-                , dim=-1), edge_vector[active_edges], latents) # full_out_irreps
+                , dim=-1), edge_vector[active_edges], latents[active_edges]) # full_out_irreps
         
         message = self.activation(message)
         message = self.lin_post(message)
@@ -856,7 +856,7 @@ def forward(self, latents, node_features, node_onehot, edge_features, edge_index
                     self.sln_e(edge_features),
                     new_node_features[edge_neighbor[active_edges]]
                     ]
-                , dim=-1), edge_vector[active_edges], latents) # full_out_irreps
+                , dim=-1), edge_vector[active_edges], latents[active_edges]) # full_out_irreps
         
         scalars = new_edge_features[:, :self.tp.irreps_out[0].dim]
         assert len(scalars.shape) == 2
@@ -885,10 +885,17 @@ def forward(self, latents, node_features, node_onehot, edge_features, edge_index
             coefficient_old = torch.rsqrt(update_coefficients.square() + 1)
             coefficient_new = update_coefficients * coefficient_old
             edge_features = coefficient_new * new_edge_features + coefficient_old * self.linear_res(edge_features)
-            latents = coefficient_new * new_latents + coefficient_old * latents
+
+            latents = torch.index_copy(
+                latents, 0, active_edges, 
+                coefficient_new * new_latents + coefficient_old * latents[active_edges]
+            )
         else:
             edge_features = new_edge_features
-            latents = new_latents
+            latents = torch.index_copy(
+                latents, 0, active_edges, 
+                new_latents
+            )
 
         return edge_features, latents
     
diff --git a/dptb/nn/embedding/slem.py b/dptb/nn/embedding/slem.py
index 4753e38b..328a383c 100644
--- a/dptb/nn/embedding/slem.py
+++ b/dptb/nn/embedding/slem.py
@@ -519,7 +519,7 @@ def forward(self, edge_index, atom_type, bond_type, edge_sh, edge_length, node_o
             )
         
         weights_h = self.env_embed_mlp(new_latents)
-        weights_e = self.env_embed_mlp(latents)
+        weights_e = self.env_embed_mlp(latents[prev_mask])
 
         # embed initial edge
         hidden_features = self._env_weighter(
@@ -700,7 +700,7 @@ def forward(self, latents, node_features, hidden_features, atom_type, node_oneho
         message = self.tp(
             torch.cat(
                 [new_node_features[edge_center[active_edges]], hidden_features]
-                , dim=-1), edge_vector[active_edges], latents) # full_out_irreps
+                , dim=-1), edge_vector[active_edges], latents[active_edges]) # full_out_irreps
         
         message = self.activation(message)
         message = self.lin_post(message)
@@ -866,7 +866,7 @@ def forward(self, latents, node_features, hidden_features, edge_features, edge_i
                     hidden_features,
                     node_features[edge_neighbor[active_edges]]
                     ]
-                , dim=-1), edge_vector[active_edges], latents) # full_out_irreps
+                , dim=-1), edge_vector[active_edges], latents[active_edges]) # full_out_irreps
         
         scalars = new_edge_features[:, :self.tp.irreps_out[0].dim]
         assert len(scalars.shape) == 2
@@ -1027,7 +1027,7 @@ def forward(self, latents, node_features, hidden_features, node_onehot, edge_ind
                     node_features[edge_center[active_edges]],
                     new_hidden_features
                     ]
-                , dim=-1), edge_vector[active_edges], latents)
+                , dim=-1), edge_vector[active_edges], latents[active_edges])
         
         
         new_hidden_features = self.activation(new_hidden_features)
@@ -1055,10 +1055,18 @@ def forward(self, latents, node_features, hidden_features, node_onehot, edge_ind
             coefficient_old = torch.rsqrt(update_coefficients.square() + 1)
             coefficient_new = update_coefficients * coefficient_old
             hidden_features = coefficient_new * new_hidden_features + coefficient_old * self.linear_res(hidden_features)
-            latents = coefficient_new * new_latents + coefficient_old * latents
+
+            latents = torch.index_copy(
+                latents, 0, active_edges, 
+                coefficient_new * new_latents + coefficient_old * latents[active_edges]
+            )
+
         else:
             hidden_features = new_hidden_features
-            latents = new_latents
+            latents = torch.index_copy(
+                latents, 0, active_edges, 
+                new_latents
+            )
         
         return hidden_features, latents
 
diff --git a/dptb/nn/energy.py b/dptb/nn/energy.py
index c6b721c4..80cdd681 100644
--- a/dptb/nn/energy.py
+++ b/dptb/nn/energy.py
@@ -56,13 +56,19 @@ def __init__(
 
 
     def forward(self, data: AtomicDataDict.Type, nk: Optional[int]=None) -> AtomicDataDict.Type:
-        num_k = data[AtomicDataDict.KPOINT_KEY][0].shape[0]
-        kpoints = data[AtomicDataDict.KPOINT_KEY][0] # slice the first dimension, since it is nested tensor by default
+        kpoints = data[AtomicDataDict.KPOINT_KEY]
+        if kpoints.is_nested:
+            nested = True
+            assert kpoints.size(0) == 1
+            kpoints = kpoints[0]
+        else:
+            nested = False
+        num_k = kpoints.shape[0]
         eigvals = []
         if nk is None:
             nk = num_k
         for i in range(int(np.ceil(num_k / nk))):
-            data[AtomicDataDict.KPOINT_KEY] = torch.nested.as_nested_tensor([kpoints[i*nk:(i+1)*nk]])
+            data[AtomicDataDict.KPOINT_KEY] = kpoints[i*nk:(i+1)*nk]
             data = self.h2k(data)
             if self.overlap:
                 data = self.s2k(data)
@@ -74,5 +80,9 @@ def forward(self, data: AtomicDataDict.Type, nk: Optional[int]=None) -> AtomicDa
             
             eigvals.append(torch.linalg.eigvalsh(data[self.h_out_field]))
         data[self.out_field] = torch.nested.as_nested_tensor([torch.cat(eigvals, dim=0)])
+        if nested:
+            data[AtomicDataDict.KPOINT_KEY] = torch.nested.as_nested_tensor([kpoints])
+        else:
+            data[AtomicDataDict.KPOINT_KEY] = kpoints
 
         return data
\ No newline at end of file
diff --git a/dptb/nn/hr2hk.py b/dptb/nn/hr2hk.py
index b3d06cc2..7b443364 100644
--- a/dptb/nn/hr2hk.py
+++ b/dptb/nn/hr2hk.py
@@ -56,6 +56,10 @@ def forward(self, data: AtomicDataDict.Type) -> AtomicDataDict.Type:
         bondwise_hopping.to(self.device)
         bondwise_hopping.type(self.dtype)
         onsite_block = torch.zeros((len(data[AtomicDataDict.ATOM_TYPE_KEY]), self.idp.full_basis_norb, self.idp.full_basis_norb,), dtype=self.dtype, device=self.device)
+        kpoints = data[AtomicDataDict.KPOINT_KEY]
+        if kpoints.is_nested:
+            assert kpoints.size(0) == 1
+            kpoints = kpoints[0]
 
         soc = data.get(AtomicDataDict.NODE_SOC_SWITCH_KEY, False)
         if isinstance(soc, torch.Tensor):
@@ -111,7 +115,7 @@ def forward(self, data: AtomicDataDict.Type) -> AtomicDataDict.Type:
 
         # R2K procedure can be done for all kpoint at once.
         all_norb = self.idp.atom_norb[data[AtomicDataDict.ATOM_TYPE_KEY]].sum()
-        block = torch.zeros(data[AtomicDataDict.KPOINT_KEY][0].shape[0], all_norb, all_norb, dtype=self.ctype, device=self.device)
+        block = torch.zeros(kpoints.shape[0], all_norb, all_norb, dtype=self.ctype, device=self.device)
         # block = torch.complex(block, torch.zeros_like(block))
         # if data[AtomicDataDict.NODE_SOC_SWITCH_KEY].all():
         #     block_uu = torch.zeros(data[AtomicDataDict.KPOINT_KEY].shape[0], all_norb, all_norb, dtype=self.ctype, device=self.device)
@@ -149,13 +153,13 @@ def forward(self, data: AtomicDataDict.Type) -> AtomicDataDict.Type:
             masked_hblock = hblock[imask][:,jmask]
 
             block[:,iatom_indices,jatom_indices] += masked_hblock.squeeze(0).type_as(block) * \
-                torch.exp(-1j * 2 * torch.pi * (data[AtomicDataDict.KPOINT_KEY][0] @ data[AtomicDataDict.EDGE_CELL_SHIFT_KEY][i])).reshape(-1,1,1)
+                torch.exp(-1j * 2 * torch.pi * (kpoints @ data[AtomicDataDict.EDGE_CELL_SHIFT_KEY][i])).reshape(-1,1,1)
 
         block = block + block.transpose(1,2).conj()
         block = block.contiguous()
         
         if soc:
-            HK_SOC = torch.zeros(data[AtomicDataDict.KPOINT_KEY][0].shape[0], 2*all_norb, 2*all_norb, dtype=self.ctype, device=self.device)
+            HK_SOC = torch.zeros(kpoints.shape[0], 2*all_norb, 2*all_norb, dtype=self.ctype, device=self.device)
             #HK_SOC[:,:all_norb,:all_norb] = block + block_uu
             #HK_SOC[:,:all_norb,all_norb:] = block_ud
             #HK_SOC[:,all_norb:,:all_norb] = block_ud.conj()
diff --git a/dptb/nnops/loss.py b/dptb/nnops/loss.py
index 7eb6b808..f52321c0 100644
--- a/dptb/nnops/loss.py
+++ b/dptb/nnops/loss.py
@@ -243,59 +243,59 @@ def forward(
 
         return total_loss / len(datalist)
 
-@Loss.register("hamil")
-class HamilLoss(nn.Module):
-    def __init__(
-            self, 
-            basis: Dict[str, Union[str, list]]=None,
-            idp: Union[OrbitalMapper, None]=None,
-            overlap: bool=False,
-            dtype: Union[str, torch.dtype] = torch.float32, 
-            device: Union[str, torch.device] = torch.device("cpu"),
-            **kwargs,
-        ):
-
-        super(HamilLoss, self).__init__()
-        self.loss1 = nn.L1Loss()
-        self.loss2 = nn.MSELoss()
-        self.overlap = overlap
-        self.device = device
-
-        if basis is not None:
-            self.idp = OrbitalMapper(basis, method="e3tb", device=self.device)
-            if idp is not None:
-                assert idp == self.idp, "The basis of idp and basis should be the same."
-        else:
-            assert idp is not None, "Either basis or idp should be provided."
-            self.idp = idp
-
-    def forward(self, data: AtomicDataDict, ref_data: AtomicDataDict):
-        # mask the data
-
-        # data[AtomicDataDict.NODE_FEATURES_KEY].masked_fill(~self.idp.mask_to_nrme[data[AtomicDataDict.ATOM_TYPE_KEY]], 0.)
-        # data[AtomicDataDict.EDGE_FEATURES_KEY].masked_fill(~self.idp.mask_to_erme[data[AtomicDataDict.EDGE_TYPE_KEY]], 0.)
-
-        node_mean = ref_data[AtomicDataDict.NODE_FEATURES_KEY].mean(dim=-1, keepdim=True)
-        edge_mean = ref_data[AtomicDataDict.EDGE_FEATURES_KEY].mean(dim=-1, keepdim=True)
-        node_weight = 1/((ref_data[AtomicDataDict.NODE_FEATURES_KEY]-node_mean).norm(dim=-1, keepdim=True)+1e-5)
-        edge_weight = 1/((ref_data[AtomicDataDict.EDGE_FEATURES_KEY]-edge_mean).norm(dim=-1, keepdim=True)+1e-5)
+# @Loss.register("hamil")
+# class HamilLoss(nn.Module):
+#     def __init__(
+#             self, 
+#             basis: Dict[str, Union[str, list]]=None,
+#             idp: Union[OrbitalMapper, None]=None,
+#             overlap: bool=False,
+#             dtype: Union[str, torch.dtype] = torch.float32, 
+#             device: Union[str, torch.device] = torch.device("cpu"),
+#             **kwargs,
+#         ):
+
+#         super(HamilLoss, self).__init__()
+#         self.loss1 = nn.L1Loss()
+#         self.loss2 = nn.MSELoss()
+#         self.overlap = overlap
+#         self.device = device
+
+#         if basis is not None:
+#             self.idp = OrbitalMapper(basis, method="e3tb", device=self.device)
+#             if idp is not None:
+#                 assert idp == self.idp, "The basis of idp and basis should be the same."
+#         else:
+#             assert idp is not None, "Either basis or idp should be provided."
+#             self.idp = idp
+
+#     def forward(self, data: AtomicDataDict, ref_data: AtomicDataDict):
+#         # mask the data
+
+#         # data[AtomicDataDict.NODE_FEATURES_KEY].masked_fill(~self.idp.mask_to_nrme[data[AtomicDataDict.ATOM_TYPE_KEY]], 0.)
+#         # data[AtomicDataDict.EDGE_FEATURES_KEY].masked_fill(~self.idp.mask_to_erme[data[AtomicDataDict.EDGE_TYPE_KEY]], 0.)
+
+#         node_mean = ref_data[AtomicDataDict.NODE_FEATURES_KEY].mean(dim=-1, keepdim=True)
+#         edge_mean = ref_data[AtomicDataDict.EDGE_FEATURES_KEY].mean(dim=-1, keepdim=True)
+#         node_weight = 1/((ref_data[AtomicDataDict.NODE_FEATURES_KEY]-node_mean).norm(dim=-1, keepdim=True)+1e-5)
+#         edge_weight = 1/((ref_data[AtomicDataDict.EDGE_FEATURES_KEY]-edge_mean).norm(dim=-1, keepdim=True)+1e-5)
         
-        pre = (node_weight*(data[AtomicDataDict.NODE_FEATURES_KEY]-node_mean))[self.idp.mask_to_nrme[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]]
-        tgt = (node_weight*(ref_data[AtomicDataDict.NODE_FEATURES_KEY]-node_mean))[self.idp.mask_to_nrme[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]]
-        onsite_loss = self.loss1(pre, tgt) + torch.sqrt(self.loss2(pre, tgt))
+#         pre = (node_weight*(data[AtomicDataDict.NODE_FEATURES_KEY]-node_mean))[self.idp.mask_to_nrme[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]]
+#         tgt = (node_weight*(ref_data[AtomicDataDict.NODE_FEATURES_KEY]-node_mean))[self.idp.mask_to_nrme[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]]
+#         onsite_loss = self.loss1(pre, tgt) + torch.sqrt(self.loss2(pre, tgt))
 
-        pre = (edge_weight*(data[AtomicDataDict.EDGE_FEATURES_KEY]-edge_mean))[self.idp.mask_to_erme[data[AtomicDataDict.EDGE_TYPE_KEY].flatten()]]
-        tgt = (edge_weight*(ref_data[AtomicDataDict.EDGE_FEATURES_KEY]-edge_mean))[self.idp.mask_to_erme[data[AtomicDataDict.EDGE_TYPE_KEY].flatten()]]
-        hopping_loss = self.loss1(pre, tgt) + torch.sqrt(self.loss2(pre, tgt))
+#         pre = (edge_weight*(data[AtomicDataDict.EDGE_FEATURES_KEY]-edge_mean))[self.idp.mask_to_erme[data[AtomicDataDict.EDGE_TYPE_KEY].flatten()]]
+#         tgt = (edge_weight*(ref_data[AtomicDataDict.EDGE_FEATURES_KEY]-edge_mean))[self.idp.mask_to_erme[data[AtomicDataDict.EDGE_TYPE_KEY].flatten()]]
+#         hopping_loss = self.loss1(pre, tgt) + torch.sqrt(self.loss2(pre, tgt))
         
-        if self.overlap:
-            over_mean = ref_data[AtomicDataDict.EDGE_OVERLAP_KEY].mean(dim=-1, keepdim=True)
-            over_weight = 1/((ref_data[AtomicDataDict.EDGE_OVERLAP_KEY]-over_mean).norm(dim=-1, keepdim=True)+1e-5)
-            pre = (over_weight*(data[AtomicDataDict.EDGE_OVERLAP_KEY]-over_mean))[self.idp.mask_to_erme[data[AtomicDataDict.EDGE_TYPE_KEY].flatten()]]
-            tgt = (over_weight*(ref_data[AtomicDataDict.EDGE_OVERLAP_KEY]-over_mean))[self.idp.mask_to_erme[data[AtomicDataDict.EDGE_TYPE_KEY].flatten()]]
-            hopping_loss += self.loss1(pre, tgt) + torch.sqrt(self.loss2(pre, tgt))
+#         if self.overlap:
+#             over_mean = ref_data[AtomicDataDict.EDGE_OVERLAP_KEY].mean(dim=-1, keepdim=True)
+#             over_weight = 1/((ref_data[AtomicDataDict.EDGE_OVERLAP_KEY]-over_mean).norm(dim=-1, keepdim=True)+1e-5)
+#             pre = (over_weight*(data[AtomicDataDict.EDGE_OVERLAP_KEY]-over_mean))[self.idp.mask_to_erme[data[AtomicDataDict.EDGE_TYPE_KEY].flatten()]]
+#             tgt = (over_weight*(ref_data[AtomicDataDict.EDGE_OVERLAP_KEY]-over_mean))[self.idp.mask_to_erme[data[AtomicDataDict.EDGE_TYPE_KEY].flatten()]]
+#             hopping_loss += self.loss1(pre, tgt) + torch.sqrt(self.loss2(pre, tgt))
         
-        return hopping_loss + onsite_loss
+#         return hopping_loss + onsite_loss
     
 
 @Loss.register("hamil_abs")
@@ -333,13 +333,27 @@ def forward(self, data: AtomicDataDict, ref_data: AtomicDataDict):
         # data[AtomicDataDict.EDGE_FEATURES_KEY].masked_fill(~self.idp.mask_to_erme[data[AtomicDataDict.EDGE_TYPE_KEY]], 0.)
 
         if self.onsite_shift:
-            assert data["batch"].max() == 0, "The onsite shift is only supported for batchsize=1."
-            mu = (data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]] - \
-                  ref_data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[ref_data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]]).mean()
-            mu = mu.detach()
-            ref_data[AtomicDataDict.NODE_FEATURES_KEY] = ref_data[AtomicDataDict.NODE_FEATURES_KEY] + mu * ref_data[AtomicDataDict.NODE_OVERLAP_KEY]
-            ref_data[AtomicDataDict.EDGE_FEATURES_KEY] = ref_data[AtomicDataDict.EDGE_FEATURES_KEY] + mu * ref_data[AtomicDataDict.EDGE_OVERLAP_KEY]
-        
+            batch = data.get("batch", torch.zeros(data[AtomicDataDict.POSITIONS_KEY].shape[0]))
+            # assert batch.max() == 0, "The onsite shift is only supported for batchsize=1."
+            mu = data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]] - \
+                  ref_data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[ref_data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]]
+            if batch.max() == 0: # when batchsize is zero
+                mu = mu.mean().detach()
+                ref_data[AtomicDataDict.NODE_FEATURES_KEY] = ref_data[AtomicDataDict.NODE_FEATURES_KEY] + mu * ref_data[AtomicDataDict.NODE_OVERLAP_KEY]
+                ref_data[AtomicDataDict.EDGE_FEATURES_KEY] = ref_data[AtomicDataDict.EDGE_FEATURES_KEY] + mu * ref_data[AtomicDataDict.EDGE_OVERLAP_KEY]
+            elif batch.max() >= 1:
+                slices = [data["__slices__"]["pos"][i]-data["__slices__"]["pos"][i-1] for i in range(1,len(data["__slices__"]["pos"]))]
+                slices = [0] + slices
+                ndiag_batch = torch.stack([i.sum() for i in self.idp.mask_to_ndiag[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()].split(slices)])
+                ndiag_batch = torch.cumsum(ndiag_batch, dim=0)
+                mu = torch.stack([mu[ndiag_batch[i]:ndiag_batch[i+1]].mean() for i in range(len(ndiag_batch)-1)])
+                mu = mu.detach()
+                ref_data[AtomicDataDict.NODE_FEATURES_KEY] = ref_data[AtomicDataDict.NODE_FEATURES_KEY] + mu[batch, None] * ref_data[AtomicDataDict.NODE_OVERLAP_KEY]
+                edge_mu_index = torch.zeros(data[AtomicDataDict.EDGE_INDEX_KEY].shape[1], dtype=torch.long, device=self.device)
+                for i in range(1, batch.max().item()+1):
+                    edge_mu_index[data["__slices__"]["edge_index"][i]:data["__slices__"]["edge_index"][i+1]] += i
+                ref_data[AtomicDataDict.EDGE_FEATURES_KEY] = ref_data[AtomicDataDict.EDGE_FEATURES_KEY] + mu[edge_mu_index, None] * ref_data[AtomicDataDict.EDGE_OVERLAP_KEY]
+                
         pre = data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_nrme[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]]
         tgt = ref_data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_nrme[ref_data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]]
         onsite_loss = 0.5*(self.loss1(pre, tgt) + torch.sqrt(self.loss2(pre, tgt)))
@@ -393,14 +407,27 @@ def forward(self, data: AtomicDataDict, ref_data: AtomicDataDict):
         # data[AtomicDataDict.EDGE_FEATURES_KEY].masked_fill(~self.idp.mask_to_erme[data[AtomicDataDict.EDGE_TYPE_KEY]], 0.)
 
         if self.onsite_shift:
-            data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]] = \
-                data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]] - \
-                data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]].min()
-            
-            ref_data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[ref_data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]] = \
-                ref_data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[ref_data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]] - \
-                ref_data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[ref_data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]].min()
-        
+            batch = data.get("batch", torch.zeros(data[AtomicDataDict.POSITIONS_KEY].shape[0]))
+            # assert batch.max() == 0, "The onsite shift is only supported for batchsize=1."
+            mu = data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]] - \
+                  ref_data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[ref_data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]]
+            if batch.max() == 0: # when batchsize is zero
+                mu = mu.mean().detach()
+                ref_data[AtomicDataDict.NODE_FEATURES_KEY] = ref_data[AtomicDataDict.NODE_FEATURES_KEY] + mu * ref_data[AtomicDataDict.NODE_OVERLAP_KEY]
+                ref_data[AtomicDataDict.EDGE_FEATURES_KEY] = ref_data[AtomicDataDict.EDGE_FEATURES_KEY] + mu * ref_data[AtomicDataDict.EDGE_OVERLAP_KEY]
+            elif batch.max() >= 1:
+                slices = [data["__slices__"]["pos"][i]-data["__slices__"]["pos"][i-1] for i in range(1,len(data["__slices__"]["pos"]))]
+                slices = [0] + slices
+                ndiag_batch = torch.stack([i.sum() for i in self.idp.mask_to_ndiag[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()].split(slices)])
+                ndiag_batch = torch.cumsum(ndiag_batch, dim=0)
+                mu = torch.stack([mu[ndiag_batch[i]:ndiag_batch[i+1]].mean() for i in range(len(ndiag_batch)-1)])
+                mu = mu.detach()
+                ref_data[AtomicDataDict.NODE_FEATURES_KEY] = ref_data[AtomicDataDict.NODE_FEATURES_KEY] + mu[batch, None] * ref_data[AtomicDataDict.NODE_OVERLAP_KEY]
+                edge_mu_index = torch.zeros(data[AtomicDataDict.EDGE_INDEX_KEY].shape[1], dtype=torch.long, device=self.device)
+                for i in range(1, batch.max().item()+1):
+                    edge_mu_index[data["__slices__"]["edge_index"][i]:data["__slices__"]["edge_index"][i+1]] += i
+                ref_data[AtomicDataDict.EDGE_FEATURES_KEY] = ref_data[AtomicDataDict.EDGE_FEATURES_KEY] + mu[edge_mu_index, None] * ref_data[AtomicDataDict.EDGE_OVERLAP_KEY]
+                
         onsite_loss = data[AtomicDataDict.NODE_FEATURES_KEY]-ref_data[AtomicDataDict.NODE_FEATURES_KEY]
         onsite_index = data[AtomicDataDict.ATOM_TYPE_KEY].flatten().unique()
         onsite_loss = scatter_mean(
@@ -459,6 +486,7 @@ def __init__(
             overlap: bool=False,
             dtype: Union[str, torch.dtype] = torch.float32,
             decompose: bool = False,
+            onsite_shift: bool=False,
             device: Union[str, torch.device] = torch.device("cpu"),
             **kwargs,
         ):
@@ -469,6 +497,7 @@ def __init__(
         self.decompose = decompose
         self.dtype = dtype
         self.device = device
+        self.onsite_shift = onsite_shift
 
         if basis is not None:
             self.idp = OrbitalMapper(basis, method="e3tb", device=self.device)
@@ -485,6 +514,29 @@ def __init__(
             self.e3s = E3Hamiltonian(idp=self.idp, decompose=decompose, overlap=True, device=device, dtype=dtype)
     
     def __call__(self, data: AtomicDataDict, ref_data: AtomicDataDict, running_avg: bool=False):
+
+        if self.onsite_shift:
+            batch = data.get("batch", torch.zeros(data[AtomicDataDict.POSITIONS_KEY].shape[0]))
+            # assert batch.max() == 0, "The onsite shift is only supported for batchsize=1."
+            mu = data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]] - \
+                  ref_data[AtomicDataDict.NODE_FEATURES_KEY][self.idp.mask_to_ndiag[ref_data[AtomicDataDict.ATOM_TYPE_KEY].flatten()]]
+            if batch.max() == 0: # when batchsize is zero
+                mu = mu.mean().detach()
+                ref_data[AtomicDataDict.NODE_FEATURES_KEY] = ref_data[AtomicDataDict.NODE_FEATURES_KEY] + mu * ref_data[AtomicDataDict.NODE_OVERLAP_KEY]
+                ref_data[AtomicDataDict.EDGE_FEATURES_KEY] = ref_data[AtomicDataDict.EDGE_FEATURES_KEY] + mu * ref_data[AtomicDataDict.EDGE_OVERLAP_KEY]
+            elif batch.max() >= 1:
+                slices = [data["__slices__"]["pos"][i]-data["__slices__"]["pos"][i-1] for i in range(1,len(data["__slices__"]["pos"]))]
+                slices = [0] + slices
+                ndiag_batch = torch.stack([i.sum() for i in self.idp.mask_to_ndiag[data[AtomicDataDict.ATOM_TYPE_KEY].flatten()].split(slices)])
+                ndiag_batch = torch.cumsum(ndiag_batch, dim=0)
+                mu = torch.stack([mu[ndiag_batch[i]:ndiag_batch[i+1]].mean() for i in range(len(ndiag_batch)-1)])
+                mu = mu.detach()
+                ref_data[AtomicDataDict.NODE_FEATURES_KEY] = ref_data[AtomicDataDict.NODE_FEATURES_KEY] + mu[batch, None] * ref_data[AtomicDataDict.NODE_OVERLAP_KEY]
+                edge_mu_index = torch.zeros(data[AtomicDataDict.EDGE_INDEX_KEY].shape[1], dtype=torch.long, device=self.device)
+                for i in range(1, batch.max().item()+1):
+                    edge_mu_index[data["__slices__"]["edge_index"][i]:data["__slices__"]["edge_index"][i+1]] += i
+                ref_data[AtomicDataDict.EDGE_FEATURES_KEY] = ref_data[AtomicDataDict.EDGE_FEATURES_KEY] + mu[edge_mu_index, None] * ref_data[AtomicDataDict.EDGE_OVERLAP_KEY]
+                
         if self.decompose:
             data = self.e3h(data)
             ref_data = self.e3h(ref_data)
@@ -664,11 +716,16 @@ def __call__(self, data: AtomicDataDict, ref_data: AtomicDataDict, running_avg:
             
         return self.stats
     
-    def visualize(self):
+    def report(self):
         assert hasattr(self, "stats"), "The stats is not computed yet."
 
+        print(f"TOTAL:")
+        print(f"MAE: {self.stats['mae']}")
+        print(f"RMSE: {self.stats['rmse']}")
+        print(f"\n")
+        
         with torch.no_grad():
-            print("Onsite:")
+            print(f"Onsite: ")
             for at, tp in self.idp.chemical_symbol_to_type.items():
                 print(f"{at}:")
                 print(f"MAE: {self.stats['onsite'][at]['mae']}")
@@ -677,6 +734,11 @@ def visualize(self):
                 # compute the onsite per block err
                 onsite_mae = torch.zeros((self.idp.full_basis_norb, self.idp.full_basis_norb,), dtype=self.dtype, device=self.device)
                 onsite_rmse = torch.zeros((self.idp.full_basis_norb, self.idp.full_basis_norb,), dtype=self.dtype, device=self.device)
+                mae_per_block_element = torch.zeros((self.idp.reduced_matrix_element,), dtype=self.dtype, device=self.device)
+                mae_per_block_element[self.idp.mask_to_nrme[tp]] = self.stats["onsite"][at]["mae_per_block_element"]
+                rmse_per_block_element = torch.zeros((self.idp.reduced_matrix_element,), dtype=self.dtype, device=self.device)              
+                rmse_per_block_element[self.idp.mask_to_nrme[tp]] = self.stats["onsite"][at]["rmse_per_block_element"]
+                
                 ist = 0
                 for i,iorb in enumerate(self.idp.full_basis):
                     jst = 0
@@ -693,8 +755,8 @@ def visualize(self):
 
                         # constructing onsite blocks
                         if i <= j:
-                            onsite_mae[ist:ist+2*li+1,jst:jst+2*lj+1] = factor * self.stats["onsite"][at]["mae_per_block_element"][self.idp.orbpair_maps[orbpair]].reshape(2*li+1, 2*lj+1)
-                            onsite_rmse[ist:ist+2*li+1,jst:jst+2*lj+1] = factor * self.stats["onsite"][at]["rmse_per_block_element"][self.idp.orbpair_maps[orbpair]].reshape(2*li+1, 2*lj+1)
+                            onsite_mae[ist:ist+2*li+1,jst:jst+2*lj+1] = factor * mae_per_block_element[self.idp.orbpair_maps[orbpair]].reshape(2*li+1, 2*lj+1)
+                            onsite_rmse[ist:ist+2*li+1,jst:jst+2*lj+1] = factor * rmse_per_block_element[self.idp.orbpair_maps[orbpair]].reshape(2*li+1, 2*lj+1)
 
                         jst += 2*lj+1
                     ist += 2*li+1
@@ -706,24 +768,30 @@ def visualize(self):
                 onsite_mae = onsite_mae[imask][:,imask]
                 onsite_rmse = onsite_rmse[imask][:,imask]
 
-                plt.matshow(onsite_mae.detach().cpu().numpy(), cmap="Blues", vmin=0, vmax=1e-3)
+                vmax = onsite_mae.max().item()
+                plt.matshow(onsite_mae.detach().cpu().numpy(), cmap="Blues", vmin=0, vmax=vmax)
                 plt.title("MAE")
                 plt.colorbar()
                 plt.show()
 
-                plt.matshow(onsite_rmse.detach().cpu().numpy(), cmap="Blues", vmin=0, vmax=1e-3)
+                vmax = onsite_rmse.max().item()
+                plt.matshow(onsite_rmse.detach().cpu().numpy(), cmap="Blues", vmin=0, vmax=vmax)
                 plt.title("RMSE")
                 plt.colorbar()
                 plt.show()
 
             # compute the hopping per block err
-            print("Hopping:")
+            print(f"Hopping: ")
             for bt, tp in self.idp.bond_to_type.items():
                 print(f"{bt}:")
                 print(f"MAE: {self.stats['hopping'][bt]['mae']}")
                 print(f"RMSE: {self.stats['hopping'][bt]['rmse']}")
                 hopping_mae = torch.zeros((self.idp.full_basis_norb, self.idp.full_basis_norb,), dtype=self.dtype, device=self.device)
                 hopping_rmse = torch.zeros((self.idp.full_basis_norb, self.idp.full_basis_norb,), dtype=self.dtype, device=self.device)
+                mae_per_block_element = torch.zeros((self.idp.reduced_matrix_element,), dtype=self.dtype, device=self.device)
+                mae_per_block_element[self.idp.mask_to_erme[tp]] = self.stats["hopping"][bt]["mae_per_block_element"]
+                rmse_per_block_element = torch.zeros((self.idp.reduced_matrix_element,), dtype=self.dtype, device=self.device)              
+                rmse_per_block_element[self.idp.mask_to_erme[tp]] = self.stats["hopping"][bt]["rmse_per_block_element"]
                 ist = 0
                 for i,iorb in enumerate(self.idp.full_basis):
                     jst = 0
@@ -740,26 +808,29 @@ def visualize(self):
 
                         # constructing onsite blocks
                         if i <= j:
-                            hopping_mae[ist:ist+2*li+1,jst:jst+2*lj+1] = factor * self.stats["hopping"][bt]["mae_per_block_element"][self.idp.orbpair_maps[orbpair]].reshape(2*li+1, 2*lj+1)
-                            hopping_rmse[ist:ist+2*li+1,jst:jst+2*lj+1] = factor * self.stats["hopping"][bt]["rmse_per_block_element"][self.idp.orbpair_maps[orbpair]].reshape(2*li+1, 2*lj+1)
+                            hopping_mae[ist:ist+2*li+1,jst:jst+2*lj+1] = factor * mae_per_block_element[self.idp.orbpair_maps[orbpair]].reshape(2*li+1, 2*lj+1)
+                            hopping_rmse[ist:ist+2*li+1,jst:jst+2*lj+1] = factor * rmse_per_block_element[self.idp.orbpair_maps[orbpair]].reshape(2*li+1, 2*lj+1)
 
                         jst += 2*lj+1
                     ist += 2*li+1
 
                 hopping_mae += hopping_mae.clone().T
                 hopping_rmse += hopping_rmse.clone().T
-
-                imask = self.idp.mask_to_basis[tp]
-                jmask = self.idp.mask_to_basis[tp]
+                
+                iat, jat = bt.split("-")
+                imask = self.idp.mask_to_basis[self.idp.chemical_symbol_to_type[iat]]
+                jmask = self.idp.mask_to_basis[self.idp.chemical_symbol_to_type[jat]]
                 hopping_mae = hopping_mae[imask][:,jmask]
                 hopping_rmse = hopping_rmse[imask][:,jmask]
 
-                plt.matshow(hopping_mae.detach().cpu().numpy(), cmap="Blues", vmin=0, vmax=1e-3)
+                vmax = hopping_mae.max().item()
+                plt.matshow(hopping_mae.detach().cpu().numpy(), cmap="Blues", vmin=0, vmax=vmax)
                 plt.title("MAE")
                 plt.colorbar()
                 plt.show()
 
-                plt.matshow(hopping_mae.detach().cpu().numpy(), cmap="Blues", vmin=0, vmax=1e-3)
+                vmax = hopping_mae.max().item()
+                plt.matshow(hopping_rmse.detach().cpu().numpy(), cmap="Blues", vmin=0, vmax=vmax)
                 plt.title("RMSE")
                 plt.colorbar()
                 plt.show()
diff --git a/dptb/postprocess/__init__.py b/dptb/postprocess/__init__.py
index a5352b93..be849ddf 100644
--- a/dptb/postprocess/__init__.py
+++ b/dptb/postprocess/__init__.py
@@ -1,11 +1,11 @@
 from .bandstructure import Band
 from .totbplas import TBPLaS
-from .write_ham import write_ham
+from .write_block import write_block
 
 
 __all__ = [
     Band,
     TBPLaS,
-    write_ham,
+    write_block,
 
 ]
\ No newline at end of file
diff --git a/dptb/postprocess/bandstructure/band.py b/dptb/postprocess/bandstructure/band.py
index 857ce904..b8758dc6 100644
--- a/dptb/postprocess/bandstructure/band.py
+++ b/dptb/postprocess/bandstructure/band.py
@@ -276,9 +276,14 @@ def band_plot(
 
         matplotlib.rcParams['font.size'] = 7
         matplotlib.rcParams['pdf.fonttype'] = 42
+
+        matplotlib.rcParams['font.sans-serif'] = ["Times New Roman"]
+        matplotlib.rcParams['axes.linewidth'] = 0.5
+        matplotlib.rcParams['xtick.major.width'] =0.3
+        matplotlib.rcParams['ytick.major.width'] =0.3
         # plt.rcParams['font.sans-serif'] = ['Times New Roman']
 
-        fig = plt.figure(figsize=(4.5,4),dpi=100)
+        fig = plt.figure(figsize=(3.2,2.8),dpi=100)
 
         ax = fig.add_subplot(111)
 
@@ -294,25 +299,23 @@ def band_plot(
                 raise ValueError
 
             if ref_band.shape[0] != self.eigenstatus["eigenvalues"].shape[0]:
-                print('ref_band.shape[0]',ref_band.shape[0])
-                print('self.eigenstatus["eigenvalues"].shape[0]',self.eigenstatus["eigenvalues"].shape[0])
                 log.error("Reference Eigenvalues' should have sampled from the sample kpath as model's prediction.")
                 raise ValueError
             ref_band = ref_band - (np.min(ref_band) - np.min(self.eigenstatus["eigenvalues"]))
 
-            nkplot = (len(np.unique(self.eigenstatus["high_sym_kpoints"]))-1) * 7
-            nintp = len(self.eigenstatus["xlist"]) // nkplot 
-            if nintp == 0:
-                nintp = 1
-            band_ref = ax.plot(self.eigenstatus["xlist"][::nintp], ref_band[::nintp] - E_fermi, 'o', ms=4, color=band_color, alpha=0.8, label="Ref")
-            band_pre = ax.plot(self.eigenstatus["xlist"], self.eigenstatus["eigenvalues"] - E_fermi, color="tab:red", lw=1.5, alpha=0.8, label="DeePTB")
+            # nkplot = (len(np.unique(self.eigenstatus["high_sym_kpoints"]))-1) * 5
+            # nintp = len(self.eigenstatus["xlist"]) // nkplot 
+            # if nintp == 0:
+            nintp = self.eigenstatus["xlist"].shape[0] // 25
+            band_ref = ax.plot(self.eigenstatus["xlist"][::nintp], ref_band[::nintp] - E_fermi, 'o', ms=2, color=band_color, alpha=0.95, label="Ref")
+            band_pre = ax.plot(self.eigenstatus["xlist"], self.eigenstatus["eigenvalues"] - E_fermi, color="tab:red", lw=0.5, alpha=0.95, label="DeePTB")
 
         else:
-            ax.plot(self.eigenstatus["xlist"], self.eigenstatus["eigenvalues"] - E_fermi, color="tab:red",lw=1.5, alpha=0.8)
+            ax.plot(self.eigenstatus["xlist"], self.eigenstatus["eigenvalues"] - E_fermi, color="tab:red",lw=0.5, alpha=0.95)
 
         # add verticle line
         for ii in self.eigenstatus["high_sym_kpoints"][1:-1]:
-            ax.axvline(ii, color='gray', lw=1,ls='--')
+            ax.axvline(ii, color='gray', lw=0.3,ls='--')
 
         # add shadow
         # for i in range(self.eigenvalues.shape[1]):
@@ -322,32 +325,33 @@ def band_plot(
         if not (emin is None or emax is None):
             ax.set_ylim(emin,emax)
 
-        ax.set_xlim(self.eigenstatus["xlist"].min()-0.03,self.eigenstatus["xlist"].max()+0.03)
-        ax.set_ylabel('E - EF (eV)',fontsize=12)
+        # amp = self.eigenstatus["xlist"].max()
+        ax.set_xlim(self.eigenstatus["xlist"].min(),self.eigenstatus["xlist"].max())
+        ax.set_ylabel('E - EF (eV)',fontsize=8)
         ax.yaxis.set_minor_locator(MultipleLocator(1.0))
-        ax.tick_params(which='both', direction='in', labelsize=12, width=1.5)
+        ax.tick_params(which='both', direction='in', labelsize=8)
         ax.tick_params(which='major', length=6)
-        ax.tick_params(which='minor', length=4, color='gray')
+        ax.tick_params(which='minor', length=4)
         
         # ax.set_yticks(None, fontsize=12)
-        ax.set_xticks(self.eigenstatus["high_sym_kpoints"], self.eigenstatus["labels"], fontsize=12)
+        ax.set_xticks(self.eigenstatus["high_sym_kpoints"], self.eigenstatus["labels"], fontsize=8)
 
-        ax.grid(color='gray', alpha=0.2, linestyle='-', linewidth=1)
+        # ax.grid(color='gray', alpha=0.2, linestyle='-', linewidth=1)
         ax.set_axisbelow(True)
 
-        fig.patch.set_facecolor('#f2f2f2')
-        fig.patch.set_alpha(1)
-        for spine in ax.spines.values():
-            spine.set_edgecolor('#5d5d5d')
-            spine.set_linewidth(1.5)
+        # fig.patch.set_facecolor('#f2f2f2')
+        # fig.patch.set_alpha(1)
+        # for spine in ax.spines.values():
+        #     spine.set_edgecolor('#5d5d5d')
+        #     spine.set_linewidth(1.5)
         
         if ref_band is not None:
             plt.legend(handles=[band_pre[0], band_ref[0]], loc="best")
-        
         plt.tight_layout()
         # remove the box around the plot
-        ax.set_frame_on(False)
+        # ax.set_frame_on(False) # setting of whether to show the frame line
         if self.results_path is not None:
             plt.savefig(f'{self.results_path}/band.png',dpi=300)
         if self.use_gui:
             plt.show()
+
diff --git a/dptb/postprocess/write_ham.py b/dptb/postprocess/write_block.py
similarity index 85%
rename from dptb/postprocess/write_ham.py
rename to dptb/postprocess/write_block.py
index e250968d..812b26b1 100644
--- a/dptb/postprocess/write_ham.py
+++ b/dptb/postprocess/write_block.py
@@ -13,7 +13,7 @@
 
 log = logging.getLogger(__name__)
 
-def write_ham(
+def write_block(
         data: Union[AtomicData, ase.Atoms, str], 
         model: torch.nn.Module,
         AtomicData_options: dict={},
@@ -35,11 +35,12 @@ def write_ham(
         data = data
     
     data = AtomicData.to_AtomicDataDict(data.to(device))
-    data = model.idp(data)
+    with torch.no_grad():
+        data = model.idp(data)
 
-    # set the kpoint of the AtomicData
-    data = model(data)
-    block = feature_to_block(data=data, idp=model.idp)
+        # set the kpoint of the AtomicData
+        data = model(data)
+        block = feature_to_block(data=data, idp=model.idp)
 
     return block
 
diff --git a/dptb/tests/test_build_model.py b/dptb/tests/test_build_model.py
index e0ca5e33..0a09909d 100644
--- a/dptb/tests/test_build_model.py
+++ b/dptb/tests/test_build_model.py
@@ -46,8 +46,7 @@ def test_build_nnsk_from_scratch():
         "overlap": False,
         "seed": 3982377700
     }
-    statistics = None
-    model = build_model(None, model_options, common_options, statistics)
+    model = build_model(None, model_options, common_options)
 
     assert isinstance(model, NNSK)
     assert model.device == "cpu"
@@ -111,9 +110,8 @@ def test_build_model_MIX_from_scratch():
         "overlap": False,
         "seed": 3982377700
     }
-    statistics = None
 
-    model = build_model(None, model_options, common_options, statistics)
+    model = build_model(None, model_options, common_options)
 
     assert isinstance(model, MIX)
     assert model.name == "mix"
diff --git a/dptb/tests/test_trainer.py b/dptb/tests/test_trainer.py
index 2628e0b8..a9b284c0 100644
--- a/dptb/tests/test_trainer.py
+++ b/dptb/tests/test_trainer.py
@@ -47,7 +47,7 @@ def test_fromscratch_noref_noval(self):
         jdata = self.jdata
         train_datasets = self.train_datasets
         model = build_model(None, model_options=jdata["model_options"], 
-                        common_options=jdata["common_options"], statistics=train_datasets.E3statistics())
+                        common_options=jdata["common_options"])
         trainer = Trainer(
             train_options=jdata["train_options"],
             common_options=jdata["common_options"],
@@ -73,7 +73,7 @@ def test_fromscratch_ref_noval(self):
         reference_datasets = build_dataset(**jdata["data_options"]["reference"], **jdata["common_options"])
 
         model = build_model(None, model_options=jdata["model_options"], 
-                        common_options=jdata["common_options"], statistics=train_datasets.E3statistics())
+                        common_options=jdata["common_options"])
         
         trainer = Trainer(
             train_options=jdata["train_options"],
@@ -100,7 +100,7 @@ def test_fromscratch_noref_val(self):
         validation_datasets = build_dataset(**jdata["data_options"]["validation"], **jdata["common_options"])
 
         model = build_model(None, model_options=jdata["model_options"], 
-                        common_options=jdata["common_options"], statistics=train_datasets.E3statistics())
+                        common_options=jdata["common_options"])
         
         trainer = Trainer(
             train_options=jdata["train_options"],
@@ -126,7 +126,7 @@ def test_initmodel_noref_nval(self):
         checkpoint = f"{rootdir}/test_sktb/output/test_valence/checkpoint/nnsk.best.pth"
         run_options.update({"init_model": checkpoint, "restart": None})
         model = build_model(checkpoint, model_options=jdata["model_options"], 
-                            common_options=jdata["common_options"], statistics=train_datasets.E3statistics())
+                            common_options=jdata["common_options"])
         trainer = Trainer(
             train_options=jdata["train_options"],
             common_options=jdata["common_options"],
diff --git a/dptb/utils/argcheck.py b/dptb/utils/argcheck.py
index ceee4d2a..4e138f56 100644
--- a/dptb/utils/argcheck.py
+++ b/dptb/utils/argcheck.py
@@ -519,14 +519,15 @@ def slem():
             Argument("avg_num_neighbors", [int, float], optional=False, doc=doc_avg_num_neighbors),
             Argument("r_max", [float, int, dict], optional=False, doc=doc_r_max),
             Argument("n_layers", int, optional=False, doc=doc_n_layers),
+            
             Argument("n_radial_basis", int, optional=True, default=10, doc=doc_n_radial_basis),
             Argument("PolynomialCutoff_p", int, optional=True, default=6, doc="The order of polynomial cutoff function. Default: 6"),
             Argument("cutoff_type", str, optional=True, default="polynomial", doc="The type of cutoff function. Default: polynomial"),
-            Argument("env_embed_multiplicity", int, optional=True, default=1, doc=doc_env_embed_multiplicity),
+            Argument("env_embed_multiplicity", int, optional=True, default=10, doc=doc_env_embed_multiplicity),
             Argument("tp_radial_emb", bool, optional=True, default=False, doc="Whether to use tensor product radial embedding."),
-            Argument("tp_radial_channels", list, optional=True, default=[128, 128], doc="The number of channels in tensor product radial embedding."),
-            Argument("latent_channels", list, optional=True, default=[128, 128], doc="The number of channels in latent embedding."),
-            Argument("latent_dim", int, optional=True, default=256, doc="The dimension of latent embedding."),
+            Argument("tp_radial_channels", list, optional=True, default=[32], doc="The number of channels in tensor product radial embedding."),
+            Argument("latent_channels", list, optional=True, default=[32], doc="The number of channels in latent embedding."),
+            Argument("latent_dim", int, optional=True, default=64, doc="The dimension of latent embedding."),
             Argument("res_update", bool, optional=True, default=True, doc="Whether to use residual update."),
             Argument("res_update_ratios", float, optional=True, default=0.5, doc="The ratios of residual update, should in (0,1)."),
             Argument("res_update_ratios_learnable", bool, optional=True, default=False, doc="Whether to make the ratios of residual update learnable."),
@@ -752,7 +753,7 @@ def loss_options():
     ]
 
     loss_args = Variant("method", [
-        Argument("hamil", dict, sub_fields=hamil),
+        # Argument("hamil", dict, sub_fields=hamil),
         Argument("eigvals", dict, sub_fields=eigvals),
         Argument("skints", dict, sub_fields=skints),
         Argument("hamil_abs", dict, sub_fields=hamil),
diff --git a/examples/e3/band.json b/examples/e3/band.json
new file mode 100644
index 00000000..a46799e1
--- /dev/null
+++ b/examples/e3/band.json
@@ -0,0 +1,23 @@
+{   
+    "task_options": {
+        "task": "band",
+        "kline_type":"abacus",
+        "kpath":[[0.0000000000,   0.0000000000,   0.0000000000,   20],   
+                [0.5000000000,   0.0000000000,   0.0000000000,   1],               
+                [0.0000000000,   0.5000000000,   0.0000000000,   20],    
+                [0.0000000000,   0.0000000000,   0.0000000000,   20],     
+                [0.0000000000,   0.0000000000,   0.5000000000,   1],    
+                [-0.5000000000,  -0.5000000000,   0.5000000000,   20],                
+                [0.0000000000,   0.0000000000,   0.0000000000,   20],               
+                [0.0000000000,   -0.5000000000,   0.5000000000,   1 ],
+                [-0.5000000000,   0.0000000000,   0.5000000000, 20],
+                [0.0000000000, 0.0000000000, 0.0000000000, 20],
+                [0.5000000000, -0.500000000, 0.0000000000, 1]
+                ],
+        "klabels":["G","X","Y","G","Z","R_2","G","T_2","U_2","G","V_2"],
+        "nel_atom":{"Si":4},
+        "E_fermi":0.0,
+        "emin":-7,
+        "emax":18
+    }
+}
\ No newline at end of file
diff --git a/examples/e3/band_plot.py b/examples/e3/band_plot.py
new file mode 100644
index 00000000..17244a9b
--- /dev/null
+++ b/examples/e3/band_plot.py
@@ -0,0 +1,46 @@
+from dptb.postprocess.bandstructure.band import Band
+from dptb.utils.tools import j_loader
+from dptb.nn.build import build_model
+from dptb.data import build_dataset
+
+model = build_model(checkpoint="./e3_silicon/checkpoint/nnenv.best.pth") # buld the trained e3 hamiltonian and overlap model
+
+dataset = build_dataset(
+    root="./data",
+    prefix="Si64"
+) # build the dataset
+
+jdata = j_loader("./band.json")
+kpath_kwargs = jdata["task_options"] # read the kpath information from the json file
+# "task": "band",
+# "kline_type":"abacus",
+# "kpath":[[0.0000000000,   0.0000000000,   0.0000000000,   20],   
+#         [0.5000000000,   0.0000000000,   0.0000000000,   1],               
+#         [0.0000000000,   0.5000000000,   0.0000000000,   20],    
+#         [0.0000000000,   0.0000000000,   0.0000000000,   20],     
+#         [0.0000000000,   0.0000000000,   0.5000000000,   1],    
+#         [-0.5000000000,  -0.5000000000,   0.5000000000,   20],                
+#         [0.0000000000,   0.0000000000,   0.0000000000,   20],               
+#         [0.0000000000,   -0.5000000000,   0.5000000000,   1 ],
+#         [-0.5000000000,   0.0000000000,   0.5000000000, 20],
+#         [0.0000000000, 0.0000000000, 0.0000000000, 20],
+#         [0.5000000000, -0.500000000, 0.0000000000, 1]
+#         ],
+# "klabels":["G","X","Y","G","Z","R_2","G","T_2","U_2","G","V_2"],
+# "nel_atom":{"Si":4},
+# "E_fermi":0.0,
+# "emin":-7,
+# "emax":18
+
+bcal = Band(model=model, 
+            use_gui=False,
+            results_path="./", 
+            device=model.device)
+bcal.get_bands(data=dataset[0], 
+               kpath_kwargs=kpath_kwargs) # compute band structure
+
+bcal.band_plot(
+        E_fermi = kpath_kwargs["E_fermi"],
+        emin = kpath_kwargs["emin"],
+        emax = kpath_kwargs["emax"]
+               ) # plot the band structure
diff --git a/examples/e3/data/Si64.0/hamiltonians.h5 b/examples/e3/data/Si64.0/hamiltonians.h5
new file mode 100644
index 00000000..ca8e7b91
Binary files /dev/null and b/examples/e3/data/Si64.0/hamiltonians.h5 differ
diff --git a/examples/e3/data/Si64.0/kpoints.npy b/examples/e3/data/Si64.0/kpoints.npy
new file mode 100644
index 00000000..47a7199a
Binary files /dev/null and b/examples/e3/data/Si64.0/kpoints.npy differ
diff --git a/examples/e3/data/Si64.0/overlaps.h5 b/examples/e3/data/Si64.0/overlaps.h5
new file mode 100644
index 00000000..5aa963a7
Binary files /dev/null and b/examples/e3/data/Si64.0/overlaps.h5 differ
diff --git a/examples/e3/data/info.json b/examples/e3/data/info.json
new file mode 100644
index 00000000..87f6df55
--- /dev/null
+++ b/examples/e3/data/info.json
@@ -0,0 +1,8 @@
+{
+	"nframes": 1,
+	"pos_type": "cart",
+	"AtomicData_options": {
+		"r_max": 7.4,
+		"pbc": true
+	}
+}
diff --git a/examples/e3/input.json b/examples/e3/input.json
deleted file mode 100644
index 1c6cacfa..00000000
--- a/examples/e3/input.json
+++ /dev/null
@@ -1,67 +0,0 @@
-{
-    "common_options": {
-        "bond_cutoff": 5.0,
-        "env_cutoff": 5.0,
-        "seed": 12342,
-        "basis": {
-            "Al": ["3s", "3p", "d*"],
-            "As": ["4s", "4p", "d*"]
-        },
-        "device": "cuda",
-        "dtype": "float32",
-        "overlap": false
-    },
-    "model_options": {
-        "embedding":{
-            "method": "se2",
-            "rc": 5.0,
-            "rs": 2.0,
-            "n_axis": 10,
-            "radial_net": {
-                "neurons": [20,40,60],
-                "activation": "tanh",
-                "if_batch_normalized": false
-            }
-        },
-        "prediction":{
-            "method": "sktb",
-            "neurons": [128,128,128],
-            "activation": "tanh",
-            "if_batch_normalized": false
-        }
-    },
-    "train_options": {
-        "num_epoch": 4000,
-        "batch_size": 1,
-        "optimizer": {
-            "lr": 0.01,
-            "type": "Adam"
-        },
-        "lr_scheduler": {
-            "type": "exp",
-            "gamma": 0.995
-        },
-        "loss_options":{
-            "train":{"method": "eigvals"}
-        },
-        "save_freq": 10,
-        "validation_freq": 10,
-        "display_freq": 1
-    },
-    "data_options": {
-        "train": {
-            "root": "/share/semicond/lmp_abacus/abacus_hse_data/AlAs/result-prod-alas/prod-alas/AlAs/sys-000/",
-            "preprocess_path": "/share/semicond/lmp_abacus/abacus_hse_data/AlAs/result-prod-alas/prod-alas/AlAs/sys-000/atomic_data/",
-            "file_names": [
-                "T100/frame-29/AtomicData.h5",
-                "T500/frame-100/AtomicData.h5",
-                "T500/frame-44/AtomicData.h5",
-                "T1000/frame-27/AtomicData.h5",
-                "T1000/frame-52/AtomicData.h5",
-                "T1500/frame-35/AtomicData.h5",
-                "T1500/frame-89/AtomicData.h5"
-            ],
-            "pbc": true
-        }
-    }
-}
\ No newline at end of file
diff --git a/examples/e3/input_short.json b/examples/e3/input_short.json
new file mode 100644
index 00000000..6cbc1f3f
--- /dev/null
+++ b/examples/e3/input_short.json
@@ -0,0 +1,54 @@
+{
+    "common_options": {
+        "basis": {
+            "Si": "1s1p"
+        },
+        "device": "cuda",
+        "overlap": true
+    },
+    "model_options": {
+        "embedding": {
+            "method": "slem",
+            "r_max": {
+                "Si": 7.4
+            },
+            "irreps_hidden": "32x0e+32x1o+16x2e",
+            "n_layers": 3,
+            "avg_num_neighbors": 51,
+            "tp_radial_emb": true
+        },
+        "prediction":{
+            "method": "e3tb",
+            "neurons": [64,64]
+        }
+    },
+    "train_options": {
+        "num_epoch": 1500,
+        "batch_size": 1,
+        "optimizer": {
+            "lr": 0.005,
+            "type": "Adam"
+        },
+        "lr_scheduler": {
+            "type": "rop",
+            "factor": 0.8,
+            "patience": 50,
+            "min_lr": 1e-6
+        },
+        "loss_options":{
+            "train":{"method": "hamil_abs"}
+        },
+        "save_freq": 100,
+        "validation_freq": 10,
+        "display_freq": 1
+    },
+    "data_options": {
+        "train": {
+            "root": "./data",
+            "prefix": "Si64",
+            "get_Hamiltonian": true,
+            "get_overlap": true
+        }
+    }
+}
+