Advanced Sampling
This module contains functions for subsampling very large datasets.
Theory/Introduction
Example
Assuming the configurations, forces (and possibly local energies) have already been extracted from a .xyz file, we can apply one of the methods in advanced_sampling to subsample a meaningful and representative training set.
We first load the configurations and forces previously extracted from the .xyz file:
import numpy as np
confs = np.load(configurations_file)
forces = np.load(forces_file)  # forces_file: placeholder for the path to the saved forces array
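As a minimal sketch of the loading step, the snippet below saves two toy arrays to temporary .npy files and loads them back, then checks that there is one force vector per configuration. The array shapes, file names and fixed number of neighbours are assumptions made only for illustration; real extracted data may be laid out differently.

```python
import os
import tempfile

import numpy as np

# Toy stand-ins (assumed shapes): 100 configurations of 5 neighbours x 3 coordinates,
# and one 3-component force vector per configuration.
rng = np.random.default_rng(0)
toy_confs = rng.normal(size=(100, 5, 3))
toy_forces = rng.normal(size=(100, 3))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, used only for this round trip.
    confs_path = os.path.join(tmp, "confs.npy")
    forces_path = os.path.join(tmp, "forces.npy")
    np.save(confs_path, toy_confs)
    np.save(forces_path, toy_forces)

    # Configurations and forces come from two separate files.
    confs = np.load(confs_path)
    forces = np.load(forces_path)

# A quick sanity check before sampling: the two arrays must be aligned.
assert len(confs) == len(forces)
print(confs.shape, forces.shape)
```

A length check like this catches the common mistake of loading the same file twice.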
We then initialize the sampling class and separate ntest configurations for the test set:
s = Sampling(confs=confs, forces=forces, sigma_2b=0.05, sigma_3b=0.1, sigma_mb=0.2, noise=0.001, r_cut=8.5, theta=0.5)
s.train_test_split(confs=confs, forces=forces, ntest=200)
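To illustrate what setting aside ntest configurations means, here is a plain NumPy sketch of a random hold-out split. It is not the package's implementation: split_train_test, the seed argument and the toy array shapes are hypothetical and chosen only for this example.

```python
import numpy as np

def split_train_test(confs, forces, ntest, seed=0):
    """Hold out ntest randomly chosen entries; the rest form the training pool."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(confs))
    test_idx, pool_idx = order[:ntest], order[ntest:]
    return confs[pool_idx], forces[pool_idx], confs[test_idx], forces[test_idx]

# Toy data (assumed shapes): 1000 configurations with one force vector each.
rng = np.random.default_rng(1)
toy_confs = rng.normal(size=(1000, 5, 3))
toy_forces = rng.normal(size=(1000, 3))

x_pool, y_pool, x_test, y_test = split_train_test(toy_confs, toy_forces, ntest=200)
print(x_pool.shape, x_test.shape)  # (800, 5, 3) (200, 5, 3)
```

The sampling methods then draw training points from the remaining pool only, so the test set stays untouched when errors are measured.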
Now we can subsample a training set of ntr points using our preferred method, for example importance vector machine sampling on the variance of the force prediction:
MAE, STD, RMSE, index, time = s.ivm_f(method='2b', ntrain=ntr, batchsize=1000)
or importance vector machine sampling on the measured error of the force prediction for a 3-body kernel:
MAE, STD, RMSE, index, time = s.ivm_f(method='3b', ntrain=ntr, batchsize=1000, use_pred_error=False)
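The idea behind both variants can be sketched with a small Gaussian process on 1D toy data: start from a random batch, then repeatedly add the pool points where the current model is either most uncertain (predicted variance) or most wrong (measured error). This is an assumption-laden illustration, not the package's algorithm: rbf, gp_predict, greedy_select and the criterion argument are hypothetical names, and a squared-exponential kernel on scalar targets stands in for the 2-body and 3-body force kernels.

```python
import numpy as np

def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two sets of 1D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_predict(x_tr, y_tr, x_new, noise=1e-3):
    """GP posterior mean and variance at x_new (prior variance is 1)."""
    K = rbf(x_tr, x_tr) + noise * np.eye(len(x_tr))
    Ks = rbf(x_new, x_tr)
    mean = Ks @ np.linalg.solve(K, y_tr)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.maximum(var, 0.0)

def greedy_select(x_pool, y_pool, ntrain, batchsize, criterion="variance", seed=0):
    """Grow a training set in batches, scoring pool points with the current GP."""
    rng = np.random.default_rng(seed)
    first = min(batchsize, ntrain)
    chosen = [int(i) for i in rng.choice(len(x_pool), size=first, replace=False)]
    while len(chosen) < ntrain:
        mean, var = gp_predict(x_pool[chosen], y_pool[chosen], x_pool)
        # "variance": predicted uncertainty; "error": measured absolute error.
        score = var if criterion == "variance" else np.abs(mean - y_pool)
        score[chosen] = -np.inf  # never pick the same point twice
        n_new = min(batchsize, ntrain - len(chosen))
        chosen.extend(int(i) for i in np.argsort(score)[-n_new:])
    return np.array(chosen)

# Toy pool: 400 scalar inputs with a smooth target.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=400)
y = np.sin(2 * x)

idx = greedy_select(x, y, ntrain=40, batchsize=10, criterion="variance")
mean, _ = gp_predict(x[idx], y[idx], x)
print(len(idx), float(np.mean(np.abs(mean - y))))
```

Scoring by variance needs no reference values for the pool, while scoring by measured error requires the true targets for every candidate; larger batches mean fewer model refits but less adaptive choices.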
Other methods include sampling based on the interatomic distance values present in every configuration:
MAE, STD, RMSE, index, time = s.grid(method='2b', nbins=1000)
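One way to picture distance-based sampling is to bin the interatomic distances on a grid and keep a configuration only if it contributes a distance to a bin that is still empty. The sketch below follows that idea under stated assumptions: grid_select is a hypothetical helper, each configuration is taken to be an array of neighbour positions relative to the central atom, and the package's own grid method may select points differently.

```python
import numpy as np

def grid_select(confs, nbins, r_cut):
    """Keep configurations that add at least one not-yet-covered distance bin."""
    edges = np.linspace(0.0, r_cut, nbins + 1)
    covered = np.zeros(nbins, dtype=bool)
    chosen = []
    for i, conf in enumerate(confs):
        dist = np.linalg.norm(conf, axis=1)  # distances from the central atom
        dist = dist[dist < r_cut]            # ignore neighbours beyond the cutoff
        bins = np.unique(np.digitize(dist, edges) - 1)
        if np.any(~covered[bins]):
            covered[bins] = True
            chosen.append(i)
    return np.array(chosen, dtype=int), covered

# Toy data (assumed layout): 500 configurations of 6 neighbours x 3 coordinates.
rng = np.random.default_rng(0)
toy_confs = rng.uniform(-5.0, 5.0, size=(500, 6, 3))

idx, covered = grid_select(toy_confs, nbins=50, r_cut=8.5)
print(len(idx), int(covered.sum()))
```

Because every kept configuration fills at least one new bin, the selected set can never be larger than nbins, so nbins acts as an upper bound on the training set size in this sketch.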