Dpdata Toolkit

ai2-kit tool dpdata

This toolkit is a command-line wrapper around dpdata that lets users process DeepMD datasets from the command line.

Usage

ai2-kit tool dpdata # show all commands
ai2-kit tool dpdata to_ase -h  # show doc of specific command

This toolkit includes the following commands:

| Command | Description | Example | Reference |
|---|---|---|---|
| read | Read a dataset into memory. This command does nothing useful on its own; chain other commands after it. | ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy | Supports wildcards; can be called multiple times |
| write | Use MultiSystems to merge datasets and write them to a directory. | ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - write ./path/to/merged_dataset | |
| filter | Use a lambda expression to filter the dataset by system data. | See Example | |
| set_fparam | Add fparam to the dataset; the value can be a float or a list of floats. | See Example | |
| slice | Use a slice expression to process systems. | See Example | |
| sample | Sample data by different methods; the currently supported methods are even and random. | See Example | |
| eval | Use deepmd DeepPot to (re)label loaded data. | See Example | |
| to_ase | Convert dpdata format to ase format so the data can be processed with the ase tool. | See Example | |

These commands are chainable: separate them with - to process datasets in a pipeline fashion. For more information, refer to the following examples.
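To make the chaining idea concrete, here is a minimal toy model in Python. It is not ai2-kit code: `ToyPipeline`, `run`, and the `count` command are hypothetical names invented for illustration, and frames are plain integers. The sketch assumes each command is a method that returns the handler itself, and that the argument list is split on `-` so each group becomes one method call. Python Fire chains commands in a similar way, but whether this matches ai2-kit's internals is an assumption.

```python
class ToyPipeline:
    """Hypothetical stand-in for a chainable command handler (not ai2-kit code)."""

    def __init__(self):
        self.frames = []

    def read(self, n):
        # Pretend to load n frames; calling read again appends more frames.
        start = len(self.frames)
        self.frames.extend(range(start, start + int(n)))
        return self  # returning self is what makes the next command chainable

    def slice(self, expr):
        # Turn a "start:stop:step" string into a Python slice object.
        parts = [int(p) if p else None for p in expr.split(":")]
        self.frames = self.frames[slice(*parts)]
        return self

    def count(self):
        # Terminal command: returns a plain value instead of the handler.
        return len(self.frames)


def run(argv):
    """Split argv on '-' and apply each group as a method call on the previous result."""
    result = ToyPipeline()
    group = []
    for token in argv + ["-"]:
        if token == "-":
            if group:
                name, *args = group
                result = getattr(result, name)(*args)
                group = []
        else:
            group.append(token)
    return result


# Two reads are merged (25 frames), then the first 10 frames are dropped.
print(run("read 20 - read 5 - slice 10: - count".split()))  # 15
```

Because every non-terminal command returns the handler, the order of the groups on the command line is the order in which the data is transformed.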

Example

# Use a wildcard to read multiple datasets generated by the training workflow and merge them into a single dataset
# You can also call `read` multiple times to read datasets from different directories
ai2-kit tool dpdata read ./workdir/iters-*/train-deepmd/new_dataset/* --fmt deepmd/npy - write ./merged_dataset  --fmt deepmd/npy

# You can also save the data in HDF5 format
ai2-kit tool dpdata read ./workdir/iters-*/train-deepmd/new_dataset/* --fmt deepmd/npy - write ./merged.hdf5 --fmt deepmd/hdf5

# Use a lambda expression to filter out outlier data
ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - filter "lambda x: x['forces'].max() < 10" - write ./path/to/filtered_dataset

# Set fparam when reading data (the list is quoted so the shell does not treat it as a glob pattern)
ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy --fparam "[0,1]" - write ./path/to/new_dataset

# (Re)label data with a frozen DeepPot model
ai2-kit tool dpdata read dp-h2o --nolabel - eval dp-frozen.pb - write new-dp-h2o

# Drop the first 10 frames, then sample 10 frames with the random method and save them in xyz format
ai2-kit tool dpdata read dp-h2o - slice 10: - sample 10 --method random - to_ase - write h2o.xyz
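
The last example combines a slice with a sampling step. The Python sketch below shows one plausible reading of what "even" and "random" sampling mean for a list of frames. It is an illustration only: `sample_even` and `sample_random` are hypothetical helpers, frames are plain integers, and the toolkit's actual selection rules (for example how it rounds indices or seeds the random generator) may differ.

```python
import random


def sample_even(frames, size):
    """Pick `size` frames at evenly spaced positions (assumed meaning of 'even')."""
    if size >= len(frames):
        return list(frames)
    step = len(frames) / size
    return [frames[int(i * step)] for i in range(size)]


def sample_random(frames, size, seed=None):
    """Pick `size` distinct frames at random, keeping their original order."""
    rng = random.Random(seed)
    size = min(size, len(frames))
    picked = sorted(rng.sample(range(len(frames)), size))
    return [frames[i] for i in picked]


frames = list(range(100))   # stand-in for 100 frames of a trajectory
kept = frames[10:]          # same effect as the slice expression 10:

print(sample_even(kept, 10))    # [10, 19, 28, 37, 46, 55, 64, 73, 82, 91]
print(len(sample_random(kept, 10, seed=0)))  # 10
```

Even sampling covers the whole remaining trajectory at a fixed stride, while random sampling avoids any periodic bias at the cost of uneven coverage.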