input
The dptools input command is used to create deepmd-kit datasets to train a model
from DFT (or some other ab-initio method) calculations.
General usage,
$ dptools input [-h] [-n N] [-p PATH] [-a] input_file [input_file ...]
positional arguments:
input_file .db, .traj, or vasprun.xml files
optional arguments:
-h, --help show this help message and exit
-n N Max number of images to take from each db (default: None)
-p PATH, --path PATH Specify path to dataset directory (default: ./data)
-a, --append Append to dataset if system already exists in dataset directory (default: False)
Quick reference examples
$ dptools input 00_system1.db 00_system2.db
$ dptools input 0*_sys*.db # equivalent to above
$ dptools input 00_system1/vasprun.xml 00_system2/vasprun.xml
$ dptools input -p /path/to/dataset/folder 0*.db
$ dptools input -a -p /path/to/dataset/folder 0*.db
Create dataset from VASP/ASE results
The following file types are supported by dptools input:
|
|
|
First, each unique system (defined as having identical number of atoms and identical indexing for all images) should be given saved as a separate file and given a unique and descriptive name (e.g., iter0_SOD_5H2O.db). The deepmd dataset can then be created with,
$ dptools input /path/to/DFT_data/*.db
Note
If vasprun.xml files from DFT-MD calculations are used, place the vasprun.xml in a separate folder with the name of the system.
This will create training (80%), validation (10%), and testing (10%) datasets in your current directory in a folder named data by default.
Warning
By default, training sets are overwritten if an input has the same name as an existing
system in the dataset folder. To append new images to existing datasets, include the -a
or --append flag when running the command. Note that it is generally a good idea to
create a new system name each time you add more data when training iteratively (e.g.,
iter1_system1.db, iter2_system1.db). This allows for easy comparison of different iteration
datasets when making parity plots.
You can also specify the location of the dataset folder with -p or --path if you do
not wish to create the dataset in ./data. E.g.,
$ dptools input -p /path/to/dataset_directory *.db