train
The dptools train command is used to set up and optionally submit slurm jobs for training
deepmd-kit models and ensembles of models. It is intended to be used after setting up the
dataset with the input command.
General usage,
$ dptools train [-h] [-e] [-s] [-p PATH] [-i INPUT] dataset
positional arguments:
dataset Path to dataset parent directory
optional arguments:
-h, --help show this help message and exit
-e, --ensemble Make ensemble (4) of DP models to train (default: False)
-s, --submit Automatically submit slurm job(s) to train model(s) (default: False)
-p PATH, --path PATH Specify path to training directory (default: .)
-i INPUT, --input INPUT
Specify path to in.json deepmd parameter file to use for training (default: None)
Quick reference examples
$ dptools train /path/to/dataset # simple single model
$ dptools train -e /path/to/dataset # ensemble (4) of models
$ dptools train -e -s /path/to/dataset # submit 4 slurm jobs to train ensemble
$ dptools train -p /path/to/training/dir /path/to/dataset # specify dir to train in
$ dptools train -i /path/to/in.json /path/to/dataset # specify in.json parameter file
Train single model
The simplest usage of dptools train requires only the path to the parent folder of the
dataset directory generated by the input command. This writes the in.json file using the saved
training parameters (you can adjust these values using the set command).
$ dptools train ./data
Note
By default, without specifying -s or --submit, the dptools train command
only sets up the required training files. This is because training is typically slow
and most users will never train directly in their shell. However, if you want to do so,
just run the dp train command after running dptools train,
$ dp train in.json
or simply run the shell script that dptools generates,
$ bash dptools.train.sh
Train ensemble of models
Including the -e or --ensemble flag will generate four training directories, each with
nearly identical in.json training parameter files; the only difference is the seeds used to
initialize each model. The ensemble of models can be used to select new configurations for
training by calculating the max standard deviation of force predictions between the models.
Configurations with large standard deviations (high uncertainty) typically belong to undersampled
regions of phase space, and are thus ideal for including in training. However, care should be taken
when choosing the upper bound on acceptable standard deviations when selecting new configurations,
as excessively high deviations could belong to unphysical configurations that are unlikely
to converge with DFT. Users are highly encouraged to read
this paper for more details.
$ dptools train -e ./data
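The selection criterion described above can be illustrated with a minimal sketch. This is not the dptools implementation; it assumes one common convention, in which the deviation for each atom is the population standard deviation of its force vector across the ensemble, and the configuration is scored by the maximum over atoms. The function names and the lower and upper bounds are hypothetical and chosen only for illustration.

```python
import math


def max_force_deviation(forces):
    """Return the max per-atom force standard deviation for one configuration.

    forces[m][a] is the (fx, fy, fz) force predicted by model m for atom a.
    Assumed convention: population standard deviation of the force vector
    across models, maximized over atoms.
    """
    n_models = len(forces)
    n_atoms = len(forces[0])
    max_dev = 0.0
    for a in range(n_atoms):
        # Mean force vector on atom a across the ensemble
        mean = [sum(forces[m][a][k] for m in range(n_models)) / n_models
                for k in range(3)]
        # Mean squared distance of each model's prediction from that mean
        var = sum(
            sum((forces[m][a][k] - mean[k]) ** 2 for k in range(3))
            for m in range(n_models)
        ) / n_models
        max_dev = max(max_dev, math.sqrt(var))
    return max_dev


def select_configurations(ensemble_forces, lower, upper):
    """Return indices of configurations whose deviation lies in [lower, upper].

    The lower bound skips configurations the models already agree on; the
    upper bound skips likely unphysical ones. Both bounds are hypothetical
    and should be tuned for the system being studied.
    """
    selected = []
    for i, forces in enumerate(ensemble_forces):
        if lower <= max_force_deviation(forces) <= upper:
            selected.append(i)
    return selected
```

Here ensemble_forces would hold, for each candidate configuration, the forces predicted by each of the four trained models. Configurations returned by select_configurations are candidates for new DFT calculations and inclusion in the next round of training.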