Dpdata Toolkit#

ai2-kit tool dpdata

This toolkit is a command line wrapper of dpdata to allow user to process DeepMD dataset via command line.

Usage#

ai2-kit tool dpdata # show all commands
ai2-kit tool dpdata to_ase -h  # show doc of specific command

This toolkit include the following commands:

Command	Description	Example	Reference
read	Read dataset into memory. This command by itself is useless, you should chain other command after reading data into memory.	`ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy`	Support wildcard, can be call multiple time
write	Use MultiSystems to merge dataset and write to directory	`ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - write ./path/to/merged_dataset`
filter	Use lambda expression to filter dataset by system data.	See in `Example`
set_fparam	add `fparam` to dataset, can be float or list of float	See in `Example`
slice	use slice expression to process systems	see in `Example`
sample	sample data by different methods, current supported method are `even` and `random`	see in `Example`
eval	use `deepmd DeepPot` to (re)label loaded data	see in `Example`
to_ase	convert dpdata format to ase format and use ase tool to process	see in `Example`

Those commands are chainable and can be used to process trajectory in a pipeline fashion (separated by -). For more information, please refer to the following examples.

Example#

# read multiple dataset generated by training workflow by wildcard and merge them into a single dataset
# you can also call `read` multiple times to read multiple dataset from different directory
ai2-kit tool dpdata read ./workdir/iters-*/train-deepmd/new_dataset/* --fmt deepmd/npy - write ./merged_dataset  --fmt deepmd/npy

# You can also save data with hdf5 format
ai2-kit tool dpdata read ./workdir/iters-*/train-deepmd/new_dataset/* --fmt deepmd/npy - write ./merged.hdf5 --fmt deepmd/hdf5

# Use lambda expression to filter outlier data
ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - filter "lambda x: x['forces'].max() < 10" - write ./path/to/filtered_dataset

# Set fparam when reading data
ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy --fparam [0,1] - write ./path/to/new_dataset

# (re)label data
ai2-kit tool dpdata read dp-h2o --nolabel - eval dp-frozen.pb - write new-dp-hwo

# Drop the first 10 frames and then sample 10 frames use random method, and save it as xyz format
ai2-kit tool dpdata read dp-h2o - slice 10: - sample 10 --method random - to_ase - write h2o.xyz