refactor(data preprocess): remove the cut off options from info.json #200

Merged
merged 14 commits into from
Aug 13, 2024

Conversation

QG-phy (Collaborator) commented Aug 1, 2024

refactor(data preprocess): remove the cutoff options from info.json and collect the values from input.json instead. Running a model no longer requires supplying AtomicData_options.

Fix: #155

QG-phy added 7 commits August 1, 2024 11:25
Previously, ASE data was converted into a text file and then loaded by _TrajData. I have now refactored this function.

Text and ASE data are now treated equally: each is handled by a class function that initializes the _TrajData class.
…g96.

For powerlaw and varTang96, rs is not exactly the hard cutoff, so when extracting r_max for the data we have to use rs + 5 * w. For the other methods, just use rs.
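The cutoff rule in this commit message can be sketched as below. This is a minimal illustration, not DeePTB's code: the function name `effective_r_max`, its arguments, and the method name `"some_other_method"` in the usage lines are hypothetical. Only the `rs + 5 * w` rule and the method names `powerlaw` and `varTang96` come from the commit message.

```python
# Hypothetical helper; only the rs + 5 * w rule is taken from the commit message.
SOFT_CUTOFF_METHODS = {"powerlaw", "varTang96"}


def effective_r_max(method: str, rs: float, w: float = 0.0) -> float:
    """Return the cutoff radius to use when building neighbor data."""
    if method in SOFT_CUTOFF_METHODS:
        # rs is only the center of a smooth decay, so extend it by 5 widths.
        return rs + 5 * w
    # For every other method, rs is treated as the hard cutoff.
    return rs


soft = effective_r_max("powerlaw", rs=2.5, w=0.25)
hard = effective_r_max("some_other_method", rs=2.5, w=0.25)
```

With these inputs the soft-cutoff methods get a larger data cutoff than `rs`, while other methods use `rs` unchanged.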
QG-phy marked this pull request as draft August 5, 2024 07:27
QG-phy added 3 commits August 5, 2024 16:02
…s instance and add from_model class function.

Note: compared to the previous build_dataset, this one is more flexible.
Previously build_dataset was a function. Now I define a class DataBuilder and redefine its __call__ function, so build_dataset is an instance of the DataBuilder class. This lets me call build_dataset.from_model() to build a dataset from a model, while the previous usage, build_dataset(...), is still available.
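The callable-instance pattern described in this commit message might look roughly like the sketch below. The names `DataBuilder`, `build_dataset`, `__call__`, and `from_model` come from the commit message; the argument lists, the returned dict, and the `model.r_max` attribute are assumptions made for illustration only.

```python
from types import SimpleNamespace


class DataBuilder:
    """Callable builder: an instance can be used like the old function."""

    def __call__(self, root, r_max, **kwargs):
        # Stand-in for real dataset construction; a plain dict is returned here.
        return {"root": root, "r_max": r_max, **kwargs}

    def from_model(self, model, root, **kwargs):
        # Assumption: the model exposes its cutoff as `model.r_max`.
        return self(root=root, r_max=model.r_max, **kwargs)


# The module-level name stays the same, so existing call sites keep working.
build_dataset = DataBuilder()

old_style = build_dataset("./data", r_max=5.0)
fake_model = SimpleNamespace(r_max=4.5)  # hypothetical stand-in for a trained model
new_style = build_dataset.from_model(fake_model, "./data")
```

Keeping `build_dataset` as a module-level instance means callers do not need to change their imports to gain the extra entry point.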
QG-phy marked this pull request as ready for review August 5, 2024 10:02
@@ -500,7 +500,7 @@ def from_points(
     def from_ase(
         cls,
         atoms,
-        r_max,
+        r_max: Union[float, int, dict],
         er_max: Optional[float] = None,
Member:

Shouldn't er_max and oer_max be updated here in the same way?
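One way to handle a cutoff annotated as `Union[float, int, dict]` is to normalize it once into a per-key mapping. The sketch below is an assumption-heavy illustration: the helper `normalize_cutoff` is hypothetical, and it assumes a dict is keyed by element symbol, which the diff itself does not state. The same helper could in principle be applied to `er_max` and `oer_max` if they were widened as the reviewer suggests.

```python
from typing import Dict, List, Union


def normalize_cutoff(
    cutoff: Union[float, int, dict], elements: List[str]
) -> Dict[str, float]:
    """Expand a scalar or dict cutoff into one float per element (hypothetical)."""
    if isinstance(cutoff, dict):
        # Assumption: dict keys are element symbols such as "Si".
        missing = [e for e in elements if e not in cutoff]
        if missing:
            raise KeyError(f"no cutoff given for: {missing}")
        return {e: float(cutoff[e]) for e in elements}
    if isinstance(cutoff, bool) or not isinstance(cutoff, (int, float)):
        raise TypeError("cutoff must be a number or a dict")
    # A scalar applies to every element.
    return {e: float(cutoff) for e in elements}


uniform = normalize_cutoff(5, ["Si", "C"])
per_element = normalize_cutoff({"Si": 5.0, "C": 4.0}, ["Si", "C"])
```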

dptb/data/build.py (resolved)
                # same cell size, then copy it to all frames.
                cell = np.expand_dims(cell, axis=0)
                data["cell"] = np.broadcast_to(cell, (info["nframes"], 3, 3))
            elif cell.shape[0] == info["nframes"] * 3:
Member:

So nframes is still kept in info?

        pos = np.loadtxt(os.path.join(root, "positions.dat"))
        if len(pos.shape) == 1:
            pos = pos.reshape(1,3)
        natoms = info["natoms"]
Member:

OK, it looks like the nframes and natoms logic is unchanged.

Collaborator (Author):

To drop nframes and natoms, we would have to change the storage format of the text data, for example by storing each frame's structure on a single line. Otherwise there is no way to extract this information from the data, so I cannot remove them.
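The ambiguity described above can be illustrated with a small sketch. It assumes, for illustration only, that a positions file is a flat list of xyz rows with all frames concatenated; the helpers `split_frames` and `possible_natoms` are hypothetical and not part of the project.

```python
from typing import List, Sequence


def split_frames(rows: Sequence[Sequence[float]], natoms: int) -> List[list]:
    """Cut a flat list of xyz rows into frames of `natoms` rows each."""
    if natoms <= 0 or len(rows) % natoms != 0:
        raise ValueError("row count is not a multiple of natoms")
    return [list(rows[i:i + natoms]) for i in range(0, len(rows), natoms)]


def possible_natoms(nrows: int) -> List[int]:
    """Every natoms value that is consistent with a file of `nrows` rows."""
    return [n for n in range(1, nrows + 1) if nrows % n == 0]


# Six rows could mean 1, 2, 3, or 6 atoms per frame: the file alone cannot say.
rows = [[float(i), 0.0, 0.0] for i in range(6)]
candidates = possible_natoms(len(rows))
frames = split_frames(rows, natoms=3)  # natoms must come from info.json
```

Because several frame layouts fit the same row count, the split is only well defined once `natoms` (or `nframes`) is supplied from outside the data file.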

@@ -170,7 +170,7 @@ def __init__(self, model:torch.nn.Module, results_path: str=None, use_gui: bool=
         self.results_path = results_path
         self.use_gui = use_gui
 
-    def get_bands(self, data: Union[AtomicData, ase.Atoms, str], kpath_kwargs: dict, AtomicData_options: dict={}):
+    def get_bands(self, data: Union[AtomicData, ase.Atoms, str], kpath_kwargs: dict, pbc:Union[bool,list]=None, AtomicData_options:dict=None):
Member:

Why is pbc added separately here? Also, AtomicData_options was replaced by info in the data part earlier, so why is it kept here?

Collaborator (Author):

pbc is added separately because this information cannot always be extracted from the given structure file, so it has to be specifiable from outside. AtomicData_options is still supported here for compatibility with older saved models. Going forward it does not need to be provided, but for some old saved models it must be supplied, otherwise they cannot be used.

Collaborator (Author):

It is removed from the data part because future training tasks will no longer need it. For newly trained models, it can also be omitted when computing bands in postprocessing. This parameter is now optional.
All of this is a setting we have to keep for the sake of software compatibility.
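The backward-compatible handling described in this thread could be sketched as below. The parameter names `pbc` and `AtomicData_options` and their `None` defaults come from the diff; the helper `resolve_options`, the `model_options` argument, and the precedence rules are assumptions for illustration and not the project's implementation.

```python
from typing import Optional, Union


def resolve_options(
    model_options: Optional[dict],
    pbc: Union[bool, list, None] = None,
    AtomicData_options: Optional[dict] = None,
) -> dict:
    """Merge cutoffs stored with a model and optional user overrides (hypothetical)."""
    # Start from what the model carries (assumed present for newly trained models).
    options = dict(model_options or {})
    # Old saved models carry nothing, so the caller must still pass the options.
    if AtomicData_options is not None:
        options.update(AtomicData_options)
    # pbc may be missing from the structure file, so allow an explicit value.
    if pbc is not None:
        options["pbc"] = pbc
    if "r_max" not in options:
        raise ValueError("no r_max found: pass AtomicData_options for old models")
    return options


new_model = resolve_options({"r_max": 5.0})
old_model = resolve_options(None, pbc=True, AtomicData_options={"r_max": 4.0})
```

Using `None` as the default instead of `dict={}` also avoids sharing one mutable default dict across calls.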

floatingCatty merged commit caa903d into deepmodeling:main Aug 13, 2024
2 checks passed
Successfully merging this pull request may close these issues.

Allow run model without providing AtomicData_options
2 participants