dptools.train package

class dptools.train.DeepInput(atoms_file, atoms=None, system_name=None, type_map=None, append=False, n=None, path='./data')

Bases: object

Class for writing training set compatible with deepmd-kit from ASE/VASP output (.traj, .db, vasprun.xml, etc.).

Parameters:

atoms_file (str) – File containing training set configurations (e.g., .db).
atoms (ase.Atoms or None) – Optional Atoms object to use for assigning atom types. First image in atoms_file is used if None specified.
system_name (str) – Descriptive name of atomic system to use as directory name in dataset folder.
type_map (dict or None) – Optional dictionary mapping each atomic symbol to its atom type index. If None, types are assigned in alphabetical order of the symbols. E.g., {'Si': 0, 'O': 1}.
n (int) – Max number of images to take from atoms_file. All images are randomly shuffled and then n are taken for training.
path (str) – Path to the dataset parent folder. The folder is created if it does not already exist.
append (bool) – If True, appends new configurations to current dataset if system_name dataset already exists.
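
The shuffle-then-take behaviour described for n can be pictured with a minimal sketch. It uses only the standard library and is not the dptools implementation: select_images and its seed argument are hypothetical names, and plain strings stand in for the images read from atoms_file.

```python
import random


def select_images(images, n=None, seed=None):
    """Shuffle a copy of images and keep at most n of them.

    Hypothetical helper for illustration; n=None keeps every image.
    """
    rng = random.Random(seed)  # seeded only to make the example repeatable
    shuffled = list(images)    # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    if n is None:
        return shuffled
    return shuffled[:n]        # slicing never returns more than is available


frames = [f"image_{i:03d}" for i in range(10)]
subset = select_images(frames, n=4, seed=0)       # 4 randomly chosen frames
everything = select_images(frames, n=25, seed=0)  # n exceeds the 10 available
```

Because n is a maximum, asking for more images than the file contains simply returns all of them in shuffled order.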

set_dataset()

write_input()

write_npy_set(dataset, indices)

write_types()

class dptools.train.DeepInputs(db_names, atoms=None, system_names=None, type_map=None, append=False, n=None, in_json=None, path='./data')

Bases: object

Class for writing training set compatible with deepmd-kit from multiple ASE/VASP outputs (.traj, .db, vasprun.xml, etc.).

Parameters:

db_names (list[str]) – Files containing training set configurations (e.g., .db, .traj). TODO: update variable name to something more generic
atoms (ase.Atoms or None) – Optional Atoms object to use for assigning atom types. If None, the first image from the input file(s) is used.
system_names (list[str]) – Descriptive names of the atomic systems to use as directory names in the dataset folder. Each name should be descriptive (e.g., '00_sodalite_10h2o') and unique, with one name per db_names item, i.e., len(system_names) must equal len(db_names).
type_map (dict or None) – Optional dictionary mapping each atomic symbol to its atom type index. If None, types are assigned in alphabetical order of the symbols. E.g., {'Si': 0, 'O': 1}.
n (int) – Max number of images to take from the input file(s). All images are randomly shuffled and then n are taken for training.
path (str) – Path to the dataset parent folder. The folder is created if it does not already exist.

get_atoms(db_names)

get_type_map(atoms)

Checks the Atoms objects from all systems for unique symbols, then assigns type_map in alphabetical order.

Parameters: atoms (list[ase.Atoms]) – List of example Atoms objects from all systems that will be used for training (e.g., the first image from each vasprun input).
Returns: type_map (dict) – Dictionary mapping each atomic symbol to its atom type index, assigned in alphabetical order.
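
A minimal sketch of alphabetical type assignment follows. It assumes the symbol-to-index form shown for the type_map parameter ({'Si': 0, 'O': 1}) and works on plain lists of chemical symbols rather than ase.Atoms objects; build_type_map is a hypothetical name, not the dptools function.

```python
def build_type_map(symbol_lists):
    """Collect the unique symbols from every system and index them alphabetically.

    Hypothetical stand-in: each inner list plays the role of the chemical
    symbols of one example Atoms object.
    """
    unique_symbols = sorted({s for symbols in symbol_lists for s in symbols})
    return {symbol: index for index, symbol in enumerate(unique_symbols)}


# Three example systems with overlapping compositions
systems = [
    ["Si", "O", "O"],
    ["H", "H", "O"],
    ["Si", "O", "O", "H"],
]
type_map = build_type_map(systems)

# Per-atom type indices for the first system, as a type file would need them
atom_types = [type_map[s] for s in systems[0]]
```

Building one map from all systems keeps the indices consistent across datasets, so a single model can be trained on every system at once.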

set_json()

set_systems()

update_json()

write_json()

class dptools.train.EvaluateDP(test_sets, dp_graph='graph.pb', per_atom=False)

Bases: object

Class to read deepmd test sets (created with CLI input command or DeepInput) and create parity plots for DP predictions.

Parameters:

test_sets (list[str] or str) – Paths to deepmd test set folders, e.g., 'data/system1/test/set.000'. TODO: Add support for other input types (.traj with vasp calculators, etc.)
dp_graph (str) – Path to deepmd model to use for DP predictions.
per_atom (bool) – If True, normalize all energies per number of atoms. False uses raw energies for parity plot and loss function evaluations.

evaluate(test_set)

static get_mae(data)

static get_mse(data)

static get_rmse(data)
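
The three error measures named above have standard definitions, sketched here with the standard library. The layout of the data argument taken by the static methods is not documented on this page, so the sketch uses separate reference and predicted lists instead; the function names and the energy values are hypothetical.

```python
import math


def mean_absolute_error(reference, predicted):
    """MAE: average magnitude of the prediction errors."""
    return sum(abs(p - r) for r, p in zip(reference, predicted)) / len(reference)


def mean_squared_error(reference, predicted):
    """MSE: average of the squared prediction errors."""
    return sum((p - r) ** 2 for r, p in zip(reference, predicted)) / len(reference)


def root_mean_squared_error(reference, predicted):
    """RMSE: square root of the MSE, in the same units as the data."""
    return math.sqrt(mean_squared_error(reference, predicted))


# Made-up ab-initio and DP energies for four configurations
dft_energies = [-10.0, -12.0, -9.0, -11.0]
dp_energies = [-10.5, -11.5, -9.0, -12.0]

mae = mean_absolute_error(dft_energies, dp_energies)
mse = mean_squared_error(dft_energies, dp_energies)
rmse = root_mean_squared_error(dft_energies, dp_energies)
```

MSE and RMSE weight large outliers more heavily than MAE, which is worth keeping in mind when choosing the loss annotation for a parity plot.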

plot(loss='mse', axs=None, xyz=False, fancy=False, save_file=None, rasterized=False)

Plot energy, force, and virial (if available) parity plots for DP predictions.

Parameters:

loss (str) – Loss/error function to use for the parity plot error annotation. One of 'mse', 'mae', or 'rmse'.
axs (list[matplotlib.pyplot.Axes]) – Optional, specific axes objects to plot on.
xyz (bool) – If True, plot all xyz force components in separate parity plots. False plots all components together (recommended, usually pointless to separate).
fancy (bool) – If True, create fancy density parity plot for force predictions.

plot_parity(data, label, color, loss='mse', ax=None, fancy=False, rasterized=False)

static plot_yx(dft, ax)

set_mse()

class dptools.train.SampleConfigs(configs, graphs, type_map=None, indices=':')

Bases: object

Class for selecting new training configurations from snapshots of a molecular dynamics trajectory (or some similar method). Uses an ensemble of models to calculate eps_t (max force prediction deviation), and then selects new configurations within some specified tolerance such that new configurations belong to realistic but unexplored regions of configuration space.

Follows the guidelines described by DP-GEN (full credit to the authors). For more details on eps_t and the methodologies used here, refer to:

Y. Zhang, H. Wang, W. Chen, J. Zeng, L. Zhang, H. Wang and W. E, Comput. Phys. Commun., 2020, 253, 107206.

Please cite the above reference if you use this script for training models in published work. Also, check out the DP-GEN GitHub for a robust, standalone package for automatically and efficiently training deepmd-kit MLPs.

Parameters:

configs (list[ase.Atoms] or str) – List of atomic configurations from MD trajectory (or str to .traj, .xyz, etc. that contains configs) to sample from.
graphs (list[str]) – List of paths to ensemble of deepmd models (.pb files). E.g., ['00/graph.pb', '01/graph.pb', '02/graph.pb', '03/graph.pb'].
type_map (dict, optional) – Dictionary mapping each atom type (symbol) to corresponding index. If None specified, infer from graph file.
indices (str) – Index slice to use for reading configs if str input supplied (used in command ase.io.read(config, index=indices)).
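
As a rough illustration of eps_t, the sketch below follows the definition in the DP-GEN reference cited above: for each atom, take the root-mean-square spread of the force vectors predicted by the ensemble about their mean, then take the maximum over atoms. It is not the dptools code. max_force_deviation is a hypothetical name, nested lists replace real model output, and the population form of the average (dividing by the number of models) is an assumption.

```python
import math


def max_force_deviation(ensemble_forces):
    """Return eps_t for one configuration.

    ensemble_forces[k][i] is the (fx, fy, fz) force on atom i predicted by
    model k. Hypothetical helper for illustration only.
    """
    n_models = len(ensemble_forces)
    n_atoms = len(ensemble_forces[0])
    deviations = []
    for i in range(n_atoms):
        # Ensemble-mean force vector on atom i
        mean = [
            sum(model[i][c] for model in ensemble_forces) / n_models
            for c in range(3)
        ]
        # Mean squared distance of each model's force from that mean
        mean_sq = sum(
            sum((model[i][c] - mean[c]) ** 2 for c in range(3))
            for model in ensemble_forces
        ) / n_models
        deviations.append(math.sqrt(mean_sq))
    return max(deviations)


# Two models, two atoms: the models agree on atom 0 and disagree on atom 1
forces = [
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],  # model 0
    [(1.0, 0.0, 0.0), (0.0, 0.2, 0.0)],  # model 1
]
eps_t = max_force_deviation(forces)  # set by atom 1, where the models differ
```

A single poorly described atom is enough to raise eps_t, which is why the maximum over atoms is used rather than an average.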

get_dev()

plot(dev=None, steps=False, ax=None, color=None, label=None)

Create kernel density estimation plot (fancy smooth histogram) for all eps_t values in configs input.

Requires seaborn python package to be installed! https://seaborn.pydata.org

Parameters:

dev (array-like) – Optional list of dev values to plot. Calculated if None is given.
steps (bool) – If False, plot histogram of dev values. If True, plot dev versus step number (divided by write_freq).
ax (Axes) – Specific mpl Axes to plot on.
color (str) – Color to use for plot.
label (str) – Label to use for legend entry.

sample(lo=0.05, hi=0.35, n=300)

Select n new training configurations with lo < eps_t < hi.

Note

hi must be chosen carefully to allow for the selection of configurations that belong to underexplored regions of configuration space, but not set so high that nonsensical, unphysical configs are chosen.

Parameters:

lo (float) – Lower bound eps_t tolerance for sampling.
hi (float) – Upper bound eps_t tolerance for sampling.
n (int) – Maximum number of configurations to select. If the number of configs within lo and hi is < n, then all configs between lo and hi are returned.

Returns:

new_configs (list[ase.Atoms]) – Configurations with eps_t within the specified tolerance criteria.
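
The selection rule can be sketched in a few lines. This is an illustration under stated assumptions rather than the dptools implementation: sample_by_deviation and its seed argument are hypothetical, strings stand in for ase.Atoms objects, and choosing at random when more than n candidates qualify is an assumption.

```python
import random


def sample_by_deviation(configs, deviations, lo=0.05, hi=0.35, n=300, seed=None):
    """Keep configurations with lo < eps_t < hi, returning at most n of them."""
    candidates = [c for c, d in zip(configs, deviations) if lo < d < hi]
    if len(candidates) <= n:
        return candidates  # fewer than n qualify, so return them all
    # Assumption: a random subset is drawn when too many candidates qualify
    return random.Random(seed).sample(candidates, n)


configs = ["frame_0", "frame_1", "frame_2", "frame_3", "frame_4"]
deviations = [0.01, 0.10, 0.20, 0.50, 0.30]  # one made-up eps_t per frame

picked_all = sample_by_deviation(configs, deviations)            # n=300 default
picked_two = sample_by_deviation(configs, deviations, n=2, seed=1)
```

Here frame_0 is rejected because the ensemble already agrees on it (eps_t below lo), and frame_3 is rejected because its deviation exceeds hi and it may be unphysical.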

Submodules

dptools.train.ensemble module

Module for working with ensembles of DP models.

class dptools.train.ensemble.SampleConfigs(configs, graphs, type_map=None, indices=':')

Bases: object

Class for selecting new training configurations from snapshots of a molecular dynamics trajectory (or some similar method). Uses an ensemble of models to calculate eps_t (max force prediction deviation), and then selects new configurations within some specified tolerance such that new configurations belong to realistic but unexplored regions of configuration space.

Follows the guidelines described by DP-GEN (full credit to the authors). For more details on eps_t and the methodologies used here, refer to:

Y. Zhang, H. Wang, W. Chen, J. Zeng, L. Zhang, H. Wang and W. E, Comput. Phys. Commun., 2020, 253, 107206.

Please cite the above reference if you use this script for training models in published work. Also, check out the DP-GEN GitHub for a robust, standalone package for automatically and efficiently training deepmd-kit MLPs.

Parameters:

configs (list[ase.Atoms] or str) – List of atomic configurations from MD trajectory (or str to .traj, .xyz, etc. that contains configs) to sample from.
graphs (list[str]) – List of paths to ensemble of deepmd models (.pb files). E.g., ['00/graph.pb', '01/graph.pb', '02/graph.pb', '03/graph.pb'].
type_map (dict, optional) – Dictionary mapping each atom type (symbol) to corresponding index. If None specified, infer from graph file.
indices (str) – Index slice to use for reading configs if str input supplied (used in command ase.io.read(config, index=indices)).

get_dev()

plot(dev=None, steps=False, ax=None, color=None, label=None)

Create kernel density estimation plot (fancy smooth histogram) for all eps_t values in configs input.

Requires seaborn python package to be installed! https://seaborn.pydata.org

Parameters:

dev (array-like) – Optional list of dev values to plot. Calculated if None is given.
steps (bool) – If False, plot histogram of dev values. If True, plot dev versus step number (divided by write_freq).
ax (Axes) – Specific mpl Axes to plot on.
color (str) – Color to use for plot.
label (str) – Label to use for legend entry.

sample(lo=0.05, hi=0.35, n=300)

Select n new training configurations with lo < eps_t < hi.

Note

hi must be chosen carefully to allow for the selection of configurations that belong to underexplored regions of configuration space, but not set so high that nonsensical, unphysical configs are chosen.

Parameters:

lo (float) – Lower bound eps_t tolerance for sampling.
hi (float) – Upper bound eps_t tolerance for sampling.
n (int) – Maximum number of configurations to select. If the number of configs within lo and hi is < n, then all configs between lo and hi are returned.

Returns:

new_configs (list[ase.Atoms]) – Configurations with eps_t within the specified tolerance criteria.

dptools.train.input module

Module for writing deepmd training sets from ab-initio calculation results.

class dptools.train.input.DeepInput(atoms_file, atoms=None, system_name=None, type_map=None, append=False, n=None, path='./data')

Bases: object

Class for writing training set compatible with deepmd-kit from ASE/VASP output (.traj, .db, vasprun.xml, etc.).

Parameters:

atoms_file (str) – File containing training set configurations (e.g., .db).
atoms (ase.Atoms or None) – Optional Atoms object to use for assigning atom types. First image in atoms_file is used if None specified.
system_name (str) – Descriptive name of atomic system to use as directory name in dataset folder.
type_map (dict or None) – Optional dictionary mapping each atomic symbol to its atom type index. If None, types are assigned in alphabetical order of the symbols. E.g., {'Si': 0, 'O': 1}.
n (int) – Max number of images to take from atoms_file. All images are randomly shuffled and then n are taken for training.
path (str) – Path to the dataset parent folder. The folder is created if it does not already exist.
append (bool) – If True, appends new configurations to current dataset if system_name dataset already exists.

set_dataset()

write_input()

write_npy_set(dataset, indices)

write_types()

class dptools.train.input.DeepInputs(db_names, atoms=None, system_names=None, type_map=None, append=False, n=None, in_json=None, path='./data')

Bases: object

Class for writing training set compatible with deepmd-kit from multiple ASE/VASP outputs (.traj, .db, vasprun.xml, etc.).

Parameters:

db_names (list[str]) – Files containing training set configurations (e.g., .db, .traj). TODO: update variable name to something more generic
atoms (ase.Atoms or None) – Optional Atoms object to use for assigning atom types. If None, the first image from the input file(s) is used.
system_names (list[str]) – Descriptive names of the atomic systems to use as directory names in the dataset folder. Each name should be descriptive (e.g., '00_sodalite_10h2o') and unique, with one name per db_names item, i.e., len(system_names) must equal len(db_names).
type_map (dict or None) – Optional dictionary mapping each atomic symbol to its atom type index. If None, types are assigned in alphabetical order of the symbols. E.g., {'Si': 0, 'O': 1}.
n (int) – Max number of images to take from the input file(s). All images are randomly shuffled and then n are taken for training.
path (str) – Path to the dataset parent folder. The folder is created if it does not already exist.

get_atoms(db_names)

get_type_map(atoms)

Checks the Atoms objects from all systems for unique symbols, then assigns type_map in alphabetical order.

Parameters: atoms (list[ase.Atoms]) – List of example Atoms objects from all systems that will be used for training (e.g., the first image from each vasprun input).
Returns: type_map (dict) – Dictionary mapping each atomic symbol to its atom type index, assigned in alphabetical order.

set_json()

set_systems()

update_json()

write_json()

dptools.train.parity module

Module to generate parity plots to compare DP model predictions with corresponding ab-initio values.

class dptools.train.parity.EvaluateDP(test_sets, dp_graph='graph.pb', per_atom=False)

Bases: object

Class to read deepmd test sets (created with CLI input command or DeepInput) and create parity plots for DP predictions.

Parameters:

test_sets (list[str] or str) – Paths to deepmd test set folders, e.g., 'data/system1/test/set.000'. TODO: Add support for other input types (.traj with vasp calculators, etc.)
dp_graph (str) – Path to deepmd model to use for DP predictions.
per_atom (bool) – If True, normalize all energies per number of atoms. False uses raw energies for parity plot and loss function evaluations.
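
To see why per_atom matters when test sets contain systems of different sizes, consider this small sketch. The helper name and all numbers are hypothetical; it only illustrates dividing each energy by the number of atoms in its configuration.

```python
def energies_per_atom(energies, n_atoms):
    """Divide each total energy by the atom count of its configuration."""
    return [e / n for e, n in zip(energies, n_atoms)]


# Made-up total energies for a 10-atom and a 20-atom configuration
n_atoms = [10, 20]
dft_energies = [-100.0, -200.0]
dp_energies = [-101.0, -202.0]

# Raw errors scale with system size...
raw_errors = [abs(p - r) for r, p in zip(dft_energies, dp_energies)]

# ...while per-atom errors are directly comparable between systems
dft_pa = energies_per_atom(dft_energies, n_atoms)
dp_pa = energies_per_atom(dp_energies, n_atoms)
per_atom_errors = [abs(p - r) for r, p in zip(dft_pa, dp_pa)]
```

In this example the larger system appears twice as badly predicted in raw terms, although both configurations carry the same error per atom.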

evaluate(test_set)

static get_mae(data)

static get_mse(data)

static get_rmse(data)

plot(loss='mse', axs=None, xyz=False, fancy=False, save_file=None, rasterized=False)

Plot energy, force, and virial (if available) parity plots for DP predictions.

Parameters:

loss (str) – Loss/error function to use for the parity plot error annotation. One of 'mse', 'mae', or 'rmse'.
axs (list[matplotlib.pyplot.Axes]) – Optional, specific axes objects to plot on.
xyz (bool) – If True, plot all xyz force components in separate parity plots. False plots all components together (recommended, usually pointless to separate).
fancy (bool) – If True, create fancy density parity plot for force predictions.

plot_parity(data, label, color, loss='mse', ax=None, fancy=False, rasterized=False)

static plot_yx(dft, ax)

set_mse()

dptools.train.parity.density_scatter(x, y, ax=None, bins=300, **kwargs)

Plot fancy density parity plot. Requires scipy package!

Parameters:

x (array-like) – x-axis values to plot.
y (array-like) – y-axis values to plot.
ax (matplotlib.axes.Axes) – Axes object to plot on.
bins (int) – Number of bins used to partition the x and y values. Passed to np.histogram2d(bins=bins).
**kwargs – Any additional keyword-args for matplotlib.pyplot.scatter().
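
The role of bins can be illustrated with a simplified, standard-library stand-in for the 2D histogram step. This is not the function's actual implementation: the documented function passes bins to np.histogram2d and requires scipy, whereas this sketch counts points per cell directly and uses the count of each point's own cell as its density. bin_index and point_densities are hypothetical names.

```python
def bin_index(value, lo, hi, bins):
    """Map value in [lo, hi] to a bin number in 0..bins-1 (top edge included)."""
    if hi == lo:
        return 0
    index = int((value - lo) / (hi - lo) * bins)
    return min(index, bins - 1)


def point_densities(x, y, bins=300):
    """Return, for every (x, y) point, how many points share its 2D bin."""
    x_lo, x_hi = min(x), max(x)
    y_lo, y_hi = min(y), max(y)
    counts = {}
    cells = []
    for xi, yi in zip(x, y):
        cell = (bin_index(xi, x_lo, x_hi, bins), bin_index(yi, y_lo, y_hi, bins))
        counts[cell] = counts.get(cell, 0) + 1
        cells.append(cell)
    return [counts[cell] for cell in cells]


# Three points cluster near (0.13, 0.13); the two corner points are isolated
x = [0.0, 0.12, 0.13, 0.14, 1.0]
y = [0.0, 0.11, 0.15, 0.13, 1.0]
densities = point_densities(x, y, bins=10)

# Ordering by density lets the densest points be drawn last, on top
draw_order = sorted(range(len(densities)), key=densities.__getitem__)
```

A larger bins value resolves finer density structure but leaves fewer points per cell, so sparse datasets may look noisy at the default of 300.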