Use ai2-spectr to Predict IR and Raman Spectrum on HPC#

ai2-kit feat spectr viber

Introduction#

The ai2-spectr toolkit is a comprehensive suite tailored for the prediction and analysis of spectra. The viber toolkit, embedded within, specializes in the prediction of vibrational spectra, encompassing both infrared and Raman spectra. Unlike furnishing a fully integrated computational workflow, the toolkit presents an array of tools for job generation, dataset creation, data visualization, and similar functions. Users can effortlessly leverage these tools to construct workflows operational on High-Performance Computing (HPC) systems. This documentation is intended to elucidate the usage of the toolkit, employing concrete examples for a thorough understanding.

Case Study#

Build a shell workflow to generate dipole and polarizability dataset#

This case study illustrates the construction of a workflow using a straightforward shell script. The workflow, driven by input system and CP2K configuration files, autonomously generates jobs. These jobs are then submitted for execution on an HPC cluster, facilitating the production of the dataset essential for training in Deepmd-kit.

To get started, you need to prepare the following files:

  • System files for CP2K: They can be any format that supported by ase.io, for example Al3O2.xyz.

  • CP2K input files: The CP2K input files for the system. You need to prepare a configuration for dipole calculation and 3 configurations for polarizability calculation. For example, cp2k-dipole.inp and cp2k-polar-x.inp, cp2k-polar-y.inp, cp2k-polar-z.inp.

    • Note that you need to ensure the file names of the wannier output files are different in different CP2K configurations. A good conventions is to named them as wannier.xyz and wannier_x.xyz, wannier_y.xyz, wannier_z.xyz.

    • You should use @include coord_n_cell.inc to include the coordinate and cell section in the CP2K input files.

A good practice is to create a working directory and put all the files in it. For example, you can create a directory named al3o2 and put all the files in it.

Now we can start to build shell script workflow.sh to automate the calculation.

#!/bin/bash
set -e

# generate slurm template for CP2K
cat << EOF > slurm-cp2k.template
#!/bin/bash
#SBATCH -J cp2k
#SBATCH -N 1
#SBATCH -n 24
#SBATCH -p cpu-large
#SBATCH --mem-per-cpu=4G

module load cp2k/7.1
EOF

# generate CP2K labeling tasks
# note that the tag decide the output file name, 
# for example, --tag dipole's output file will be dipole.out
ai2-kit feat spectr viber cp2k-labeling \
- add_system Al3O2.xyz \
- add_cp2k_input cp2k-dipole.inp --tag dipole \
- add_cp2k_input cp2k-polar-x.inp --tag polar-x \
- add_cp2k_input cp2k-polar-y.inp --tag polar-y \
- add_cp2k_input cp2k-polar-z.inp --tag polar-z \
- make_tasks ./run-01 \
- make_scripts "./run-01/cpk2-batch-{i:02d}.sub" \
--concurrency 5 \
--cp2k_cmd "mpirun cp2k.popt" \
--template slurm-cp2k.template \
- done

# show generated file
find ./run-01 

# submit batch scripts and wait for completion
ai2-kit tool hpc slurm submit ./run-01/cpk2-batch-*.sub - wait

# generate dataset
ai2-kit tool dpdata read run-01/* --fmt cp2k/viber --lumped_dict '{O:4}' --output_file dipole.out \
  --wannier wannier.xyz --wannier_x wannier_x.xyz --wannier_y wannier_y.xyz --wannier_z wannier_z.xyz \
  - write ./al2o3-dataset

This script may take a long time to finish, you can run it with nohup and use & to put it in background. For example

nohup bash workflow.sh &>> workflow.log &

Note that you can view the help document of each command by running with -h option, for example

ai2-kit feat spectr viber cp2k-labeling - add_system -h

After the script finished, you can find the generated dataset in al2o3-dataset directory.

Training Deepmd-kit model#

Predict Dipole and Polarizability#

TODO

Calculate IR and Raman Spectrum#

TODO