train

The dptools train command is used to set up and optionally submit slurm jobs for training deepmd-kit models and ensembles of models. It is intended to be used after setting up the

General usage,

$ dptools train [-h] [-e] [-s] [-p PATH] [-i INPUT] dataset

positional arguments:
  dataset               Path to dataset parent directory

optional arguments:
  -h, --help            show this help message and exit
  -e, --ensemble        Make ensemble (4) of DP models to train (default: False)
  -s, --submit          Automatically submit slurm job(s) to train model(s) (default: False)
  -p PATH, --path PATH  Specify path to training directory (default: .)
  -i INPUT, --input INPUT
                        Specify path to in.json deepmd parameter file to use for training (default: None)

Quick reference examples

$ dptools train /path/to/dataset # simple single model
$ dptools train -e /path/to/dataset # ensemble (4) of models
$ dptools train -e -s /path/to/dataset # submit 4 slurm jobs to train ensemble
$ dptools train -p /path/to/training/dir /path/to/dataset # specify dir to train in
$ dptools train -i /path/to/in.json /path/to/dataset # specify in.json parameter file

Train single model

The simplest usage of dptools train requires only the path to the parent folder of the dataset directory generated by the input command. Running it generates an in.json file containing the saved training parameters (you can adjust these values using the set command).

$ dptools train ./data

Note

By default (without -s or --submit), the dptools train command only sets up the required training files. Training is typically slow, so most users will not want to run it directly in their shell. If you do want to, run the dp train command after running dptools train,

$ dp train in.json

Or by simply running the shell script that dptools generates,

$ bash dptools.train.sh

Train ensemble of models

Including the -e or --ensemble flag will generate four training directories, each with a nearly identical in.json training parameter file: the only difference is the seeds used to initialize the model. The ensemble of models can be used to select new configurations for training by calculating the maximum standard deviation of force predictions between the models. Configurations with large standard deviations (high uncertainty) typically belong to undersampled regions of phase space, and are thus ideal for including in training. However, care should be taken when choosing the upper bound on acceptable standard deviations when selecting new configurations, as excessively high deviations could belong to unphysical configurations that are unlikely to converge with DFT. Users are highly encouraged to read this paper for more details.

$ dptools train -e ./data