# Dpdata Toolkit

```bash
ai2-kit tool dpdata
```

This toolkit is a command line wrapper of [dpdata](https://github.com/deepmodeling/dpdata) that lets users process DeepMD datasets from the command line.

## Usage

```bash
ai2-kit tool dpdata            # show all commands
ai2-kit tool dpdata to_ase -h  # show doc of a specific command
```

This toolkit includes the following commands:

| Command | Description | Example | Reference |
| --- | --- | --- | --- |
| read | Read a dataset into memory. This command is not useful by itself; chain other commands after it. | `ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy` | Supports wildcards; can be called multiple times |
| write | Use `MultiSystems` to merge datasets and write them to a directory | `ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - write ./path/to/merged_dataset` | |
| filter | Use a lambda expression to filter the dataset by system data | See `Example` | |
| set_fparam | Add `fparam` to the dataset; can be a float or a list of floats | See `Example` | |
| slice | Use a slice expression to process systems | See `Example` | |
| sample | Sample data by different methods; currently supported methods are `even` and `random` | See `Example` | |
| eval | Use `deepmd DeepPot` to (re)label the loaded data | See `Example` | |
| to_ase | Convert from dpdata format to ase format and use the [ase tool](./ase.md) for further processing | See `Example` | |

These commands are chainable and can be used to process trajectories in a pipeline fashion (separated by `-`). For more information, please refer to the following examples.

## Example

```bash
# Read multiple datasets generated by the training workflow via wildcard and merge them into a single dataset.
# You can also call `read` multiple times to read datasets from different directories.
ai2-kit tool dpdata read ./workdir/iters-*/train-deepmd/new_dataset/* --fmt deepmd/npy - write ./merged_dataset --fmt deepmd/npy

# You can also save data in hdf5 format
ai2-kit tool dpdata read ./workdir/iters-*/train-deepmd/new_dataset/* --fmt deepmd/npy - write ./merged.hdf5 --fmt deepmd/hdf5

# Use a lambda expression to filter out outlier data
ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy - filter "lambda x: x['forces'].max() < 10" - write ./path/to/filtered_dataset

# Set fparam when reading data
ai2-kit tool dpdata read ./path/to/dataset --fmt deepmd/npy --fparam [0,1] - write ./path/to/new_dataset

# (Re)label data
ai2-kit tool dpdata read dp-h2o --nolabel - eval dp-frozen.pb - write new-dp-h2o

# Drop the first 10 frames, sample 10 frames with the random method, and save the result in xyz format
ai2-kit tool dpdata read dp-h2o - slice 10: - sample 10 --method random - to_ase - write h2o.xyz
```
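
If you prefer scripting over the command line, the same read → filter → write chain can be expressed with the dpdata Python library directly. The sketch below is a minimal, hedged illustration of what such a pipeline looks like (it is not the toolkit's internal implementation); the wildcard path, the force threshold of 10, and the output directory are placeholders taken from the examples above.

```python
import glob
import dpdata

# Equivalent of `read ... --fmt deepmd/npy` with a wildcard:
# load every matching directory as a LabeledSystem.
systems = [
    dpdata.LabeledSystem(path, fmt="deepmd/npy")
    for path in glob.glob("./workdir/iters-*/train-deepmd/new_dataset/*")
]

# Equivalent of `filter "lambda x: x['forces'].max() < 10"`:
# keep only frames whose maximum force component is below the threshold.
filtered = []
for system in systems:
    keep = [i for i in range(len(system)) if system["forces"][i].max() < 10]
    if keep:
        filtered.append(system.sub_system(keep))

# Equivalent of `write ./merged_dataset --fmt deepmd/npy`:
# merge everything into a MultiSystems and write it out.
ms = dpdata.MultiSystems(*filtered)
ms.to_deepmd_npy("./merged_dataset")
```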