Models

The models are the classes used to build, train and test a Gaussian process, and to then build the relative mapped potential. There are six types of models at the moment, each one is used to handle 2-, 3-, or 2+3-body kernels in the case of one or two atomic species. When creating a model, it is therefore necessary to decide a priori the type of Gaussian process and, therefore, the type of mapped potential we want to obtain.

Building a model

To create a model based on a 2-body kernel for a monoatomic system:

from mff import models
mymodel = models.TwoBodySingleSpecies(atomic_number, cutoff_radius, sigma, theta, noise)

where the parameters refer to the atomic number of the species we are training the GP on, the cutoff radius we want to use, the lengthscale hyperparameter of the Gaussian Process, the hyperparameter governing the exponential decay of the cutoff function, and the noise associated with the output training data. In the case of a 2+3-body kernel for a monoatomic system:

from mff import models
mymodel = models.CombinedSingleSpecies(atomic_number, cutoff_radius, sigma_2b, theta_2b, sigma_3b, theta_3b, noise)

where we have two additional hyperparameters since the lengthscale value and the cutoff decay ratio of the 2- and 3-body kernels contained inside the combined Gaussian Process are be independent.

When dealing with a two-element system, the syntax is very similar, but the atomic_number is instead a list containing the atomic numbers of the two species, in increasing order:

from mff import models
mymodel = models.CombinedTwoSpecies(atomic_numbers, cutoff_radius, sigma_2b, theta_2b, sigma_3b, theta_3b, noise)

Fitting the model

Once a model has been built, we can train it using a dataset of forces, energies, or energies and forces, that has been created using the configurations module. If we are training only on forces:

mymodel.fit(training_confs, training_forces)

training only on energies:

mymodel.fit_energy(training_confs, training_energies)

training on both forces and energies:

mymodel.fit_force_and_energy(training_confs, training_forces, training_energies)

Additionaly, the argument nnodes can be passed to any fit function in order to run the process on multiple processors:

mymodel.fit(training_confs, training_forces, nnodes = 4)

Predicting forces and energies with the GP

Once the Gaussian process has been fitted, it can be used to directly predict forces and energies on test configurations. To predict the force and the energy for a single test configuration:

force = mymodel.predict(test_configuration)
energy = mymodel.predict_energy(test_configuration)

the boolean variable return_std can be passed to the force and energy predict functions in order to obtain also the standard deviation associated with the prediction, default is False:

mean_force, std_force = mymodel.predict(test_configuration, return_std = True)

Building a mapped potential

Once the Gaussian process has been fitted, either via force, energy or joint force/energy fit, it can be mapped onto a non-parametric 2- and/or 3-body potential using the build_grid function. The build_grid function takes as arguments the minimum grid distance (smallest distance between atoms for which the potential will be defined), the number of grid points to use while building the 2-body mapped potential, and the number of points per dimension to use while building the 3-body mapped potential. For a 2-body model:

mymodel.build_grid(grid start, num_2b)

For a 3-body model:

mymodel.build_grid(grid start, num_3b)

For a combined model:

mymodel.build_grid(grid start, num_2b, num_3b)

Additionaly, the argument nnodes can be passed to the build_grid function for any model in order to run the process on multiple processors:

mymodel.build_grid(grid start, num_2b, num_3b, nnodes = 4)

Saving and loading a model

At any stage, a model can be saved using the save function that takes a .json filename as the only input:

mymodel.save("thismodel.json")

the save function will create a .json file containing all of the parameters and hyperparameters of the model, and the paths to the .npy and .npz files containing, respectively, the saved GPs and the saved mapped potentials, which are also created by the save funtion.

To load a previously saved model of a known type (here for example a CombinedSingleSpecies model) simply run:

mymodel = models.CombinedSingleSpecies.from_json("thismodel.json")

Model’s complete reference

Two Body Model

Module containing the TwoBodySingleSpecies and TwoBodyTwoSpecies classes, which are used to handle the Gaussian process and the mapping algorithm used to build M-FFs. The model has to be first defined, then the Gaussian process must be trained using training configurations and forces (and/or energies). Once a model has been trained, it can be used to predict forces (and/or energies) on unknonwn atomic configurations. A trained Gaussian process can then be mapped onto a tabulated 2-body potential via the build grid function call. A mapped model can be then saved, loaded and used to run molecular dynamics simulations via the calculator module. These mapped potentials retain the accuracy of the GP used to build them, while speeding up the calculations by a factor of 10^4 in typical scenarios.

Example:

from mff import models
mymodel = models.TwoBodySingleSpecies(atomic_number, cutoff_radius, sigma, theta, noise)
mymodel.fit(training_confs, training_forces)

forces = mymodel.predict(test_configurations)

mymodel.build_grid(grid_start, num_2b)
mymodel.save("thismodel.json")

mymodel = models.TwoBodySingleSpecies.from_json("thismodel.json")
class mff.models.twobody.NpEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
class mff.models.twobody.TwoBodyManySpeciesModel(elements, r_cut, sigma, theta, noise, rep_sig=1, **kwargs)

2-body many species model class Class managing the Gaussian process and its mapped counterpart

Parameters:
  • elements (list) – List containing the atomic numbers in increasing order
  • r_cut (foat) – The cutoff radius used to carve the atomic environments
  • sigma (foat) – Lengthscale parameter of the Gaussian process
  • theta (float) – decay ratio of the cutoff function in the Gaussian Process
  • noise (float) – noise value associated with the training output data
gp

The 2-body two species Gaussian Process

Type:class
grid

Contains the three 2-body two species tabulated potentials, accounting for interactions between two atoms of types 0-0, 0-1, and 1-1.

Type:list
grid_start

Minimum atomic distance for which the grid is defined (cannot be 0)

Type:float
grid_num

number of points used to create the 2-body grids

Type:int
build_grid(start, num, ncores=1)

Build the mapped 2-body potential. Calculates the energy predicted by the GP for two atoms at distances that range from start to r_cut, for a total of num points. These energies are stored and a 1D spline interpolation is created, which can be used to predict the energy and, through its analytic derivative, the force associated to any couple of atoms. The total force or local energy can then be calculated for any atom by summing the pairwise contributions of every other atom within a cutoff distance r_cut. Three distinct potentials are built for interactions between atoms of type 0 and 0, type 0 and 1, and type 1 and 1. The prediction is done by the calculator module which is built to work within the ase python package.

Parameters:
  • start (float) – smallest interatomic distance for which the energy is predicted by the GP and stored inn the 2-body mapped potential
  • num (int) – number of points to use in the grid of the mapped potential
fit(confs, forces, ncores=1)

Fit the GP to a set of training forces using a 2-body single species force-force kernel

Parameters:
  • confs (list) – List of M x 5 arrays containing coordinates and atomic numbers of atoms within a cutoff from the central one
  • forces (array) – Array containing the vector forces on the central atoms of the training configurations
  • ncores (int) – number of CPUs to use for the gram matrix evaluation
fit_energy(glob_confs, energies, ncores=1)

Fit the GP to a set of training energies using a 2-body single species energy-energy kernel

Parameters:
  • glob_confs (list of lists) – List of configurations arranged so that grouped configurations belong to the same snapshot
  • energies (array) – Array containing the total energy of each snapshot
  • ncores (int) – number of CPUs to use for the gram matrix evaluation
fit_force_and_energy(confs, forces, glob_confs, energies, ncores=1)

Fit the GP to a set of training forces and energies using 2-body single species force-force, energy-force and energy-energy kernels

Parameters:
  • confs (list) – List of M x 5 arrays containing coordinates and atomic numbers of atoms within a cutoff from the central one
  • forces (array) – Array containing the vector forces on the central atoms of the training configurations
  • glob_confs (list of lists) – List of configurations arranged so that grouped configurations belong to the same snapshot
  • energies (array) – Array containing the total energy of each snapshot
  • ncores (int) – number of CPUs to use for the gram matrix evaluation
classmethod from_json(path)

Load the models. Loads the model, the associated GP and the mapped potential, if available.

Parameters:path (str) – path to the .json model file
Returns:the model object
Return type:model (obj)
load_gp(filename)

Loads the GP object, now obsolete

predict(confs, return_std=False, ncores=1)

Predict the forces acting on the central atoms of confs using a GP

Parameters:
  • confs (list) – List of M x 5 arrays containing coordinates and atomic numbers of atoms within a cutoff from the central one
  • return_std (bool) – if True, returns the standard deviation associated to predictions according to the GP framework
Returns:

array of force vectors predicted by the GP forces_errors (array): errors associated to the force predictions,

returned only if return_std is True

Return type:

forces (array)

predict_energy(glob_confs, return_std=False, ncores=1)

Predict the global energies of the central atoms of confs using a GP

Parameters:
  • glob_confs (list of lists) – List of configurations arranged so that grouped configurations belong to the same snapshot
  • return_std (bool) – if True, returns the standard deviation associated to predictions according to the GP framework
Returns:

Array containing the total energy of each snapshot energies_errors (array): errors associated to the energies predictions,

returned only if return_std is True

Return type:

energies (array)

save(path)

Save the model. This creates a .json file containing the parameters of the model and the paths to the GP objects and the mapped potentials, which are saved as separate .gpy and .gpz files, respectively.

Parameters:path (str) – path to the file
save_gp(filename)

Saves the GP object, now obsolete

class mff.models.twobody.TwoBodySingleSpeciesModel(element, r_cut, sigma, theta, noise, rep_sig=1, **kwargs)

2-body single species model class Class managing the Gaussian process and its mapped counterpart

Parameters:
  • element (int) – The atomic number of the element considered
  • r_cut (foat) – The cutoff radius used to carve the atomic environments
  • sigma (foat) – Lengthscale parameter of the Gaussian process
  • theta (float) – decay ratio of the cutoff function in the Gaussian Process
  • noise (float) – noise value associated with the training output data
gp

The 2-body single species Gaussian Process

Type:method
grid

The 2-body single species tabulated potential

Type:method
grid_start

Minimum atomic distance for which the grid is defined (cannot be 0.0)

Type:float
grid_num

number of points used to create the 2-body grid

Type:int
build_grid(start, num, ncores=1)

Build the mapped 2-body potential. Calculates the energy predicted by the GP for two atoms at distances that range from start to r_cut, for a total of num points. These energies are stored and a 1D spline interpolation is created, which can be used to predict the energy and, through its analytic derivative, the force associated to any couple of atoms. The total force or local energy can then be calculated for any atom by summing the pairwise contributions of every other atom within a cutoff distance r_cut. The prediction is done by the calculator module which is built to work within the ase python package.

Parameters:
  • start (float) – smallest interatomic distance for which the energy is predicted by the GP and stored inn the 2-body mapped potential
  • num (int) – number of points to use in the grid of the mapped potential
fit(confs, forces, ncores=1)

Fit the GP to a set of training forces using a 2-body single species force-force kernel

Parameters:
  • confs (list) – List of M x 5 arrays containing coordinates and atomic numbers of atoms within a cutoff from the central one
  • forces (array) – Array containing the vector forces on the central atoms of the training configurations
  • ncores (int) – number of CPUs to use for the gram matrix evaluation
fit_energy(glob_confs, energies, ncores=1)

Fit the GP to a set of training energies using a 2-body single species energy-energy kernel

Parameters:
  • glob_confs (list of lists) – List of configurations arranged so that grouped configurations belong to the same snapshot
  • energies (array) – Array containing the total energy of each snapshot
  • ncores (int) – number of CPUs to use for the gram matrix evaluation
fit_force_and_energy(confs, forces, glob_confs, energies, ncores=1)

Fit the GP to a set of training forces and energies using 2-body single species force-force, energy-force and energy-energy kernels

Parameters:
  • confs (list) – List of M x 5 arrays containing coordinates and atomic numbers of atoms within a cutoff from the central one
  • forces (array) – Array containing the vector forces on the central atoms of the training configurations
  • glob_confs (list of lists) – List of configurations arranged so that grouped configurations belong to the same snapshot
  • energies (array) – Array containing the total energy of each snapshot
  • ncores (int) – number of CPUs to use for the gram matrix evaluation
classmethod from_json(path)

Load the model. Loads the model, the associated GP and the mapped potential, if available.

Parameters:path (str) – path to the .json model file
Returns:the model object
Return type:model (obj)
load_gp(filename)

Loads the GP object, now obsolete

predict(confs, return_std=False, ncores=1)

Predict the forces acting on the central atoms of confs using a GP

Parameters:
  • confs (list) – List of M x 5 arrays containing coordinates and atomic numbers of atoms within a cutoff from the central one
  • return_std (bool) – if True, returns the standard deviation associated to predictions according to the GP framework
Returns:

array of force vectors predicted by the GP forces_errors (array): errors associated to the force predictions,

returned only if return_std is True

Return type:

forces (array)

predict_energy(glob_confs, return_std=False, ncores=1)

Predict the global energies of the central atoms of confs using a GP

Parameters:
  • glob_confs (list of lists) – List of configurations arranged so that grouped configurations belong to the same snapshot
  • return_std (bool) – if True, returns the standard deviation associated to predictions according to the GP framework
Returns:

Array containing the total energy of each snapshot energies_errors (array): errors associated to the energies predictions,

returned only if return_std is True

Return type:

energies (array)

save(path)

Save the model. This creates a .json file containing the parameters of the model and the paths to the GP objects and the mapped potential, which are saved as separate .gpy and .gpz files, respectively.

Parameters:path (str) – path to the file
save_gp(filename)

Saves the GP object, now obsolete

Three Body Model

Module containing the ThreeBodySingleSpecies and ThreeBodyTwoSpecies classes, which are used to handle the Gaussian process regression and the mapping algorithm used to build M-FFs. The model has to be first defined, then the Gaussian process must be trained using training configurations and forces (and/or local energies). Once a model has been trained, it can be used to predict forces (and/or energies) on unknonwn atomic configurations. A trained Gaussian process can then be mapped onto a tabulated 3-body potential via the build grid function call. A mapped model can be then saved, loaded and used to run molecular dynamics simulations via the calculator module. These mapped potentials retain the accuracy of the GP used to build them, while speeding up the calculations by a factor of 10^4 in typical scenarios.

Example:

from mff import models
mymodel = models.ThreeBodySingleSpecies(atomic_number, cutoff_radius, sigma, theta, noise)
mymodel.fit(training_confs, training_forces)
forces = mymodel.predict(test_configurations)
mymodel.build_grid(grid_start, num_3b)
mymodel.save("thismodel.json")
mymodel = models.CombinedSingleSpecies.from_json("thismodel.json")

Combined Model

Module that uses 2- and 3-body kernels to do Guassian process regression, and to build 2- and 3-body mapped potentials. The model has to be first defined, then the Gaussian processes must be trained using training configurations and forces (and/or energies). Once a model has been trained, it can be used to predict forces (and/or energies) on unknonwn atomic configurations. A trained Gaussian process can then be mapped onto a tabulated 2-body potential and a tabultaed 3-body potential via the build grid function call. A mapped model can be thensaved, loaded and used to run molecular dynamics simulations via the calculator module. These mapped potentials retain the accuracy of the GP used to build them, while speeding up the calculations by a factor of 10^4 in typical scenarios.

Example:

from mff import models
mymodel = models.CombinedSingleSpecies(atomic_number, cutoff_radius,
                       sigma_2b, sigma_3b, sigma_2b, theta_3b, noise)
mymodel.fit(training_confs, training_forces)
forces = mymodel.predict(test_configurations)
mymodel.build_grid(grid_start, num_2b)
mymodel.save("thismodel.json")
mymodel = models.CombinedSingleSpecies.from_json("thismodel.json")