refactor(data preprocess): remove the cut off options from info.json #200

QG-phy · 2024-08-01T03:26:35Z

refactor(data preprocess): remove the cutoff options from info.json and collect the values from input.json instead. When running a model, there is no longer any need to supply the AtomicData options.

Fix: #155

Previously, ASE data was first written to text files and then loaded by _TrajData. I have refactored this function so that text data and ASE data are treated equally: each is handled by its own class function that initializes the _TrajData class.
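A minimal sketch of this pattern, in which text data and ASE data become two alternative constructors of the same trajectory class instead of routing ASE data through text files. Everything here is hypothetical: the class TrajDataSketch, the methods from_text_arrays and from_ase_frames, and the stored fields are illustrations, not the real _TrajData interface. The ASE side assumes only that each frame provides get_positions(), get_cell(), get_atomic_numbers() and get_pbc(), each returning something iterable.

```python
class TrajDataSketch:
    """Hypothetical stand-in for _TrajData; stores frames as nested Python lists."""

    def __init__(self, positions, cells, atomic_numbers, pbc):
        self.positions = positions            # [nframes][natoms][3]
        self.cells = cells                    # [nframes][3][3]
        self.atomic_numbers = atomic_numbers  # [natoms]
        self.pbc = pbc                        # [3] booleans

    @classmethod
    def from_text_arrays(cls, pos_rows, cell_rows, atomic_numbers, pbc, nframes, natoms):
        """Build from rows as read from text files (one atom per position row)."""
        if len(pos_rows) != nframes * natoms:
            raise ValueError("expected nframes * natoms position rows")
        positions = [[list(r) for r in pos_rows[i * natoms:(i + 1) * natoms]] for i in range(nframes)]
        if len(cell_rows) == 3:
            # One cell shared by every frame.
            cells = [[list(r) for r in cell_rows] for _ in range(nframes)]
        elif len(cell_rows) == nframes * 3:
            # One 3x3 block per frame.
            cells = [[list(r) for r in cell_rows[3 * i:3 * i + 3]] for i in range(nframes)]
        else:
            raise ValueError("expected 3 or nframes * 3 cell rows")
        return cls(positions, cells, list(atomic_numbers), [bool(b) for b in pbc])

    @classmethod
    def from_ase_frames(cls, frames):
        """Build directly from ASE-like frames, with no intermediate text files."""
        positions = [[list(r) for r in atoms.get_positions()] for atoms in frames]
        cells = [[list(r) for r in atoms.get_cell()] for atoms in frames]
        first = frames[0]
        return cls(positions, cells, list(first.get_atomic_numbers()),
                   [bool(b) for b in first.get_pbc()])
```

Because both constructors end in the same cls(...) call, everything downstream only has to handle one internal representation.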

…g96. For powerlaw and varTang96, rs is not exactly the hard cutoff, so when extracting r_max for the data we have to use rs + 5 * w; for the other methods we just use rs.
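The cutoff rule above can be written as a small function. The option layout ({"method": ..., "rs": ..., "w": ...}) and the name r_max_from_hopping_options are assumptions for illustration, not the actual get_cutoffs_from_model_options code. Our reading of the 5 * w margin is that, for these two methods, rs marks where a smooth decay of width w sets in, so the hopping only becomes negligible a few widths beyond rs.

```python
# Methods whose rs is a soft cutoff, as stated in the commit message above.
SOFT_CUTOFF_METHODS = {"powerlaw", "varTang96"}


def r_max_from_hopping_options(hopping):
    """Return the data cutoff implied by (hypothetical) hopping options.

    hopping is assumed to look like {"method": ..., "rs": ..., "w": ...}, where rs
    may be a number or a dict of per-pair values.
    """
    method = hopping["method"]
    # Soft cutoff: extend rs by five widths w. Hard cutoff: rs itself, w is not read.
    extra = 5 * float(hopping["w"]) if method in SOFT_CUTOFF_METHODS else 0.0
    rs = hopping["rs"]
    if isinstance(rs, dict):
        return {key: float(value) + extra for key, value in rs.items()}
    return float(rs) + extra
```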

…s instance and add a from_model class function. Note that, compared to the previous build_dataset, this one is more flexible. Previously, build_dataset was a function; now I define a class DataBuilder and redefine its __call__ method, and build_dataset is an instance of DataBuilder. This means build_dataset.from_model() can be used to build a dataset from a model, while the previous usage, build_dataset(...), is still available.
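The design described above, a class that defines __call__ and is exposed through a single module-level instance, can be sketched like this. The argument names (root, r_max, model_options) and the body of from_model are hypothetical placeholders rather than the real dptb.data.build signatures; the sketch only shows why build_dataset(...) and build_dataset.from_model(...) can coexist.

```python
class DataBuilder:
    """Callable builder: the instance behaves like the old function and gains extra methods."""

    def __call__(self, root, r_max, **kwargs):
        # Placeholder for the real dataset construction; returns a plain dict here.
        return {"root": root, "r_max": r_max, **kwargs}

    def from_model(self, model_options, root, **kwargs):
        # Take the cutoff from (hypothetical) model options instead of asking the user.
        return self(root=root, r_max=model_options["r_max"], **kwargs)


# Module-level instance with the old function's name, so existing call sites keep working.
build_dataset = DataBuilder()
```

Keeping the old name bound to an instance means calls such as build_dataset(root, r_max=...) need no change, while new entry points are added as ordinary methods.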

floatingCatty · 2024-08-13T01:59:46Z

dptb/data/AtomicData.py

@@ -500,7 +500,7 @@ def from_points(
    def from_ase(
        cls,
        atoms,
-        r_max,
+        r_max: Union[float, int, dict],
        er_max: Optional[float] = None,


Shouldn't er_max and oer_max be changed the same way here?
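Since r_max can now be a number or a dict, and the reviewer suggests er_max and oer_max should accept the same types, downstream code has to handle both forms. A hedged sketch of one way to normalize such a value; the helper names normalize_cutoff and max_cutoff and the choice to key the dict by element symbol are assumptions, not the AtomicData.from_ase implementation.

```python
from typing import Dict, Iterable, Optional, Union

# A cutoff may be a single number or a per-element table (assumed keying).
Cutoff = Union[float, int, Dict[str, float]]


def normalize_cutoff(value: Optional[Cutoff], symbols: Iterable[str]) -> Optional[Dict[str, float]]:
    """Turn a scalar or dict cutoff into {symbol: float}; None passes through unchanged."""
    if value is None:
        return None
    present = sorted(set(symbols))
    if isinstance(value, dict):
        missing = [s for s in present if s not in value]
        if missing:
            raise KeyError(f"no cutoff given for: {missing}")
        return {s: float(value[s]) for s in present}
    # A scalar applies to every element present in the structure.
    return {s: float(value) for s in present}


def max_cutoff(value: Optional[Cutoff], symbols: Iterable[str]) -> Optional[float]:
    """Largest cutoff, e.g. for a neighbour search that needs a single radius."""
    table = normalize_cutoff(value, symbols)
    return None if table is None else max(table.values())
```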

dptb/data/build.py

floatingCatty · 2024-08-13T09:05:59Z

dptb/data/dataset/_default_dataset.py

+                # same cell size, then copy it to all frames.
+                cell = np.expand_dims(cell, axis=0)
+                data["cell"] = np.broadcast_to(cell, (info["nframes"], 3, 3))
+            elif cell.shape[0] == info["nframes"] * 3:


Is nframes still kept in info now?
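The cell handling in the _default_dataset.py excerpt above can be turned into a small self-contained function. This sketch assumes, as the excerpt suggests, that the cell array is either a single 3x3 cell shared by all frames or nframes * 3 rows stacked frame by frame; the function name expand_cells is hypothetical.

```python
import numpy as np


def expand_cells(cell, nframes):
    """Return cells with shape (nframes, 3, 3) from a (3, 3) or (nframes * 3, 3) array."""
    cell = np.asarray(cell, dtype=float)
    if cell.shape == (3, 3):
        # Same cell for every frame: add a frame axis, then broadcast (a read-only view).
        return np.broadcast_to(np.expand_dims(cell, axis=0), (nframes, 3, 3))
    if cell.shape == (nframes * 3, 3):
        # One 3x3 block per frame, stacked row-wise in the file.
        return cell.reshape(nframes, 3, 3)
    raise ValueError(f"cell shape {cell.shape} is incompatible with nframes={nframes}")
```

Note that np.broadcast_to returns a read-only view rather than nframes copies; call .copy() on the result if individual frames must be modified later.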

floatingCatty · 2024-08-13T09:07:01Z

dptb/data/dataset/_default_dataset.py

+        pos = np.loadtxt(os.path.join(root, "positions.dat"))
+        if len(pos.shape) == 1:
+            pos = pos.reshape(1,3)
+        natoms = info["natoms"]


OK, it looks like the nframes and natoms logic hasn't changed.

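If we wanted to remove nframes and natoms, we would have to change the storage format of the text data, for example by storing each frame's structure on a single line. Otherwise there is no way to extract this information from the data, so I can't remove them.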
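The reply above explains why natoms and nframes have to stay in info.json: positions.dat holds one atom per row, so the frame boundaries cannot be recovered from the file alone. A minimal sketch of how the two values are used, assuming that row layout; load_positions is a hypothetical helper and io.StringIO stands in for the real file. np.loadtxt(..., ndmin=2) is used here in place of the reshape(1, 3) special case from the excerpt above; both keep a single-row file two-dimensional.

```python
import io

import numpy as np


def load_positions(source, natoms, nframes):
    """Read nframes * natoms rows of x y z and split them into per-frame blocks."""
    # ndmin=2 keeps a one-row file 2-D, the same effect as reshape(1, 3) in the excerpt.
    pos = np.loadtxt(source, ndmin=2)
    if pos.shape != (nframes * natoms, 3):
        raise ValueError(f"expected {nframes * natoms} rows of 3 columns, got {pos.shape}")
    # Without natoms and nframes the split is ambiguous: 6 rows could be
    # (nframes, natoms) = (1, 6), (2, 3), (3, 2) or (6, 1).
    return pos.reshape(nframes, natoms, 3)


# Usage: two frames of two atoms, read from an in-memory "positions.dat".
example = io.StringIO("0 0 0\n1 0 0\n0 0 1\n1 0 1\n")
frames = load_positions(example, natoms=2, nframes=2)
```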

floatingCatty · 2024-08-13T09:17:27Z

dptb/postprocess/bandstructure/band.py

@@ -170,7 +170,7 @@ def __init__(self, model:torch.nn.Module, results_path: str=None, use_gui: bool=
        self.results_path = results_path
        self.use_gui = use_gui

-    def get_bands(self, data: Union[AtomicData, ase.Atoms, str], kpath_kwargs: dict, AtomicData_options: dict={}):
+    def get_bands(self, data: Union[AtomicData, ase.Atoms, str], kpath_kwargs: dict, pbc:Union[bool,list]=None, AtomicData_options:dict=None):


Why is pbc added separately here? Also, in the data part earlier, AtomicData_options was replaced by info; why is it kept here?

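pbc is added separately because this information cannot always be extracted from the given structure file, so it must be possible to specify it externally. AtomicData_options is supported here for compatibility with some older checkpoints. It can be omitted in future use, but for some old checkpoints it must be provided; otherwise those checkpoints cannot be used.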

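It was removed from the data part because we will no longer need it for training tasks. Newly trained models can also omit it when computing band structures in post-processing; the parameter is now optional.
All of this is a setting we had to make for the sake of software compatibility.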
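The compatibility rules described in this reply (an externally supplied pbc, and an AtomicData_options argument that is optional for new models but still honoured for older checkpoints) could be resolved roughly as below. This is a hedged sketch only: the helper names, the r_max key, the precedence of an explicit pbc over the file's value, and the assumption that new models expose their cutoffs as a dict are illustrative choices, not taken from band.py.

```python
def resolve_pbc(pbc=None, structure_pbc=None):
    """Return pbc as three bools; an explicitly passed pbc takes precedence over the file's."""
    if pbc is None:
        if structure_pbc is None:
            raise ValueError("pbc could not be read from the structure; pass pbc explicitly")
        pbc = structure_pbc
    if isinstance(pbc, bool):
        return [pbc, pbc, pbc]
    pbc = [bool(p) for p in pbc]
    if len(pbc) != 3:
        raise ValueError("pbc must be a bool or a list of three bools")
    return pbc


def resolve_atomicdata_options(model_cutoffs, AtomicData_options=None):
    """Use user-supplied options when given (old checkpoints); otherwise the model's own cutoffs."""
    if AtomicData_options:
        return dict(AtomicData_options)
    if not model_cutoffs:
        raise ValueError("this checkpoint stores no cutoffs; AtomicData_options is required")
    return dict(model_cutoffs)
```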

QG-phy added 7 commits August 1, 2024 11:25

refactor(data preprocess): remove the cut off options from info.json and collect the values from input.json

1861c4e

update LMDB info.json. not need anymore.

3a1e1ef

add print logo in main and format some of the logger.info

3305d33

update argcheck collect_cutoffs. add new function with get_cutoffs_from_model_options.

6563dda

Fix(get_cutoffs_from_model_options) : fix rcut in powerlaw and varTang96. For powerlaw and varTang96, the rs is not exactly the hard cutoff. so when extract the r_max for data. we have to use rs + 5 * w; but for other method just use rs.

41a67a6

update band post process.

9b12282

QG-phy requested review from floatingCatty and AsymmetryChou August 4, 2024 15:23

QG-phy added 2 commits August 5, 2024 15:02

update test

2171b3f

update test

1b66822

QG-phy marked this pull request as draft August 5, 2024 07:27

QG-phy added 3 commits August 5, 2024 16:02

update build and get_cutoffs_from_model_options to support the rmax to be dict.

264c0a9

add checkcutoff in dataset builder.

2470c9f

QG-phy marked this pull request as ready for review August 5, 2024 10:02

update AtomicData_options to make it compatible with older versions

67b4261

floatingCatty reviewed Aug 13, 2024

Update argcheck.py

8fa5b78

floatingCatty approved these changes Aug 13, 2024

floatingCatty merged commit caa903d into deepmodeling:main Aug 13, 2024
2 checks passed

floatingCatty Aug 13, 2024

floatingCatty Aug 13, 2024

floatingCatty Aug 13, 2024

QG-phy Aug 13, 2024

floatingCatty Aug 13, 2024

QG-phy Aug 13, 2024

QG-phy Aug 13, 2024
