MomentTensorPotentialTraining¶

class MomentTensorPotentialTraining(filename, object_id, training_sets, calculator=None, correction_calculator=None, calculate_stress=None, train_on_energy_only=None, ignore_non_converged_configurations=None, fitting_parameters_list=None, train_test_split=None, random_seed=None, scaled_spins=None, number_of_processes_per_task=None, log_filename_prefix=None, configurations_per_task=None, committee_size=None, committee_mtp_filename_prefix=None)¶

Study for training and testing a moment tensor potential.

Parameters:

filename (str) – The full or relative filename path the Study object should be saved to. See nlsave().
object_id (str) – The name of the study that the Study object should be saved to within the file. This needs to be a unique name in this file. See nlsave().
training_sets (RandomDisplacementsParameters | CrystalInterfaceTrainingParameters | AlloyTrainingParameters | MolecularDynamicsSnapshotsParameters | TrainingSet | sequence of [RandomDisplacementsParameters | CrystalInterfaceTrainingParameters | AlloyTrainingParameters | MolecularDynamicsSnapshotsParameters] | TrainingSet | Table) – The list of training sets to generate as part of the complete dataset.
calculator (Calculator) – The calculator to fit the moment tensor potential to.
Default: LCAOCalculator().
correction_calculator (Calculator) – The calculator to correct energy, force, and stress data with. The MTP is fit to the reference calculator minus the contribution of the correction calculator.
Default: None.
calculate_stress (bool) – Whether the stress will be calculated and added to the output. Only applies to bulk configurations.
Default: True.
train_on_energy_only (bool) – Flag to state whether the MTP should be trained solely on energy or not.
Default: False.
ignore_non_converged_configurations (bool) – Flag to state whether or not non-converged configurations are ignored in fitting.
Default: True.
fitting_parameters_list (MomentTensorPotentialFittingParameters | sequence of MomentTensorPotentialFittingParameters | Table) – The list of parameter sets for fitting the moment tensor potential. A separate moment tensor potential will be fitted for each parameter set in the list.
Default: No fitting is performed.
train_test_split (float) – The proportion of the complete dataset to use for training; the rest will be used for testing. Must be in the range (0, 1].
Default: 0.8.
random_seed (int) – The random seed used for generating the train/test split.
Default: Generated automatically.
scaled_spins (A list of tuples each of a PeriodicTableElement object and a scaled spin value. For non-collinear spin systems the tuple has four numbers, atom index, scaled spin, theta, phi, where the latter two are spherical coordinates as PhysicalQuantity object of type Degree or Radians) – The initial scaled spins for each type of element.
Default: None.
number_of_processes_per_task (int | None | ProcessesPerNode) – The number of processes that will be used to execute each task. If the total number of processes does not divide evenly into the tasks, some tasks may have fewer than this number of processes. If None, all available processes execute each task collaboratively.
Default: None.
log_filename_prefix (str | LogToStdOut) – General filename prefix for the logging output of the study tasks. If LogToStdOut, all logging will instead be sent to standard output.
Default: 'mtp_training_'.
configurations_per_task (int) – The number of configurations packed into each bundle task. For most applications, the default value is recommended to ensure a good balance between parallelization performance, restarting, and disk reading during task execution. This parameter is ignored for training sets of type CrystalInterface.
Default: Determined automatically with a linear bundle scaling based on the average number of atoms per configuration.
committee_size (int | None) – The number of committee models to train and store for query-by-committee MTP MD uncertainty predictions. If None, no committee model will be trained and added to the main MTP model.
Default: None
committee_mtp_filename_prefix (str | None) – Sets the prefix for the name of the mtp file containing the combined MTP parameters for the trained base MTP and the committee MTPs. If no prefix is set, the combined parameters are saved to the mtp_filename set in MomentTensorPotentialFittingParameters.
Default: None

availableDataTags()¶

Returns:: The available tags that have been set to select different datasets.
Return type:: list

bestMTPParameters()¶

Ranks the fits and returns the best one.

Returns:: The best fit
Return type:: MTPParameters or None

calculateStress()¶

Returns:: Whether the stress will be calculated and added to the output. Only applies to bulk configurations.
Return type:: bool

calculator()¶

Returns:: The calculator to fit the moment tensor potential to.
Return type:: Calculator

committeeMTPFilenamePrefix()¶

Returns:: The prefix set on the mtp filename that the base mtp and committee mtp models are saved to.
Return type:: str | None

committeeSize()¶

Returns:: The number of committee models specified by the user, or None if no committee models are trained.
Return type:: int | None

configurationsPerTask()¶

Returns:: The number of configurations per task specified by the user, or None if assigned automatically.
Return type:: int | None

continueTrainingSets(log_filename_prefix=None, recalculate_training_data=None)¶

Return a list with 2 items: The first is a training set that includes all the converged configurations, the second is a training set with all the non-converged configurations.

Parameters:

log_filename_prefix (str) – Prefix for the log file for completed and ignored data.
recalculate_training_data (bool) – Whether or not the energy, forces and stresses are recalculated for the converged configurations. Note: The TrainingSet containing the non-converged configurations is always recalculated.
Default: False

Returns:

List of training sets containing available converged and non-converged configurations.

Return type:

list

correctionCalculator()¶

Returns:: The calculator to correct energy, force, and stress data with.
Return type:: Calculator

dependentStudies()¶

Returns:: The list of dependent studies.
Return type:: list of Study

filename()¶

Returns:: The filename where the study object is stored.
Return type:: str

fittingParametersList()¶

Returns:: The list of parameter sets for fitting the moment tensor potential, or None if no fitting is performed.
Return type:: list of MomentTensorPotentialFittingParameters | None

fittingReport(stream, data_tags=None, sort_best_fit=False)¶

Print a string containing an ASCII table summarizing the available results for the study.

Parameters:

stream (file-like) – The stream to write to. This should be an object that supports strings being written to it using a write method.
data_tags (str | list | None) – List of tags indicating the datasets to calculate the statistics for.
sort_best_fit (bool) – Print the fits sorted by quality.
Default: False

ignoreNonConvergedConfigurations()¶

Returns:: Whether or not configurations are ignored if the calculation of energy does not converge. Only applies to calculators that use an SCF method.
Return type:: bool

ignoredConfigurations()¶

Return the configurations that are ignored because the calculation of energy, force and stress did not converge.

Returns:: The training configurations that did not converge.
Return type:: ConfigurationDataContainer | None

logFilenamePrefix()¶

Returns:: The filename prefix for the logging output of the study.
Return type:: str | LogToStdOut

momentTensorPotentialCalculators(fit_index=None, include_correction=False)¶

The output moment tensor potential calculators, generated with the given fitting parameter sets. This result will only be available after the study has been updated.

Parameters:

fit_index (int) – The index of the fitting parameter set.
include_correction (bool) – Whether or not the returned MTP includes the correction calculator.
Default: False.

Returns:

The fitted moment tensor potential calculators. If not available, returns None.

Return type:

TremoloXCalculator | list of TremoloXCalculator | None
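The role of the correction calculator can be illustrated with plain numbers. The sketch below is not the QuantumATK implementation: it assumes the correction is purely additive, and the function names and energy values are hypothetical. The MTP is fitted to the reference data minus the correction, and the correction is added back when include_correction=True.

```python
def fitting_targets(reference_energies, correction_energies):
    """Return the values the MTP would be fitted to (reference minus correction)."""
    if len(reference_energies) != len(correction_energies):
        raise ValueError("Both sequences must have the same length.")
    return [ref - corr for ref, corr in zip(reference_energies, correction_energies)]


def corrected_prediction(mtp_energy, correction_energy):
    """Return the prediction of an MTP that includes the correction (assumed additive)."""
    return mtp_energy + correction_energy


# Hypothetical energies in eV for two configurations.
reference = [-10.0, -12.5]
correction = [-1.0, -1.5]

targets = fitting_targets(reference, correction)
print(targets)
print(corrected_prediction(targets[0], correction[0]))
```

A perfectly fitted MTP combined with the correction then reproduces the reference energy of each configuration.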

nlprint(stream=None)¶

Print a string containing an ASCII table useful for plotting the Study object.

Parameters:: stream (python stream) – The stream the table should be written to.
Default: NLPrintLogger()

nonLinearCoefficients()¶

The output non-linear coefficients for the moment tensor potential fitting, generated with the given fitting parameter sets. If the optimization of the coefficients is switched off, they will be equivalent to the input coefficients. This result will only be available after the study has been updated.

Returns:: The optimized non-linear coefficients. If not available, returns None.
Return type:: dict | None

numberOfProcessesPerTask()¶

Returns:: The number of processes to be used to execute each task. If None, all available processes execute each task collaboratively.
Return type:: int | None | ProcessesPerNode

numberOfProcessesPerTaskResolved()¶

Returns:: The number of processes to be used to execute each task. Default values are resolved based on the current execution settings.
Return type:: int

objectId()¶

Returns:: The name of the study object in the file.
Return type:: str

randomSeed()¶

Returns:: The random seed used for generating the train/test split, or None if there is no split.
Return type:: int | None

rankFits(data_tags=None, weights=None, statistical_measure=None)¶

Ranks the fits based on some (to be specified) quality measure.

Parameters:

data_tags (str | list | None) – List of tags indicating the datasets to calculate the statistics for.
weights (list(2) of lists(3) of float) – Weights for the energy, forces, and stress errors of the training and testing sets, given as [training[energy, forces, stress], testing[energy, forces, stress]]. The weights are always normalized.
Default: [[1, 1, 1], [1, 1, 1]].
statistical_measure (RMSE | MAE | R2Score) – The statistical measure used as the quality metric.
Default: R2Score.

Returns:

Two lists: 1. An ordered list of fit indices ranked by the quality measure (best first). 2. A list of the quality measures in the original fit order.

Return type:

list(2) of lists
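How normalized weights might combine the six error values of each fit into a single score can be sketched in plain Python. This is only an illustration under stated assumptions: the weighted sum, the function name, and the error values are hypothetical and not taken from QuantumATK, and the sketch assumes an error-type measure such as RMSE or MAE, where lower is better (for R2Score the ordering would be reversed).

```python
def rank_by_weighted_error(errors_per_fit, weights=((1, 1, 1), (1, 1, 1))):
    """Rank fits by a weighted sum of their errors.

    errors_per_fit holds one entry per fit, laid out as
    [training[energy, forces, stress], testing[energy, forces, stress]].
    Returns (fit indices ordered best first, scores in the original fit order).
    """
    total = sum(sum(row) for row in weights)
    if total <= 0:
        raise ValueError("The weights must sum to a positive number.")
    # Normalize so that all six weights sum to one.
    normalized = [[w / total for w in row] for row in weights]
    scores = [
        sum(normalized[i][j] * errors[i][j] for i in range(2) for j in range(3))
        for errors in errors_per_fit
    ]
    ranking = sorted(range(len(scores)), key=lambda index: scores[index])
    return ranking, scores


# Hypothetical errors for three fits.
errors = [
    [[3.0, 3.0, 3.0], [3.0, 3.0, 3.0]],
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
    [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]],
]
ranking, scores = rank_by_weighted_error(errors)
print(ranking)
```

The two returned lists mirror the documented return value: an ordered list of fit indices and the quality measures in the original fit order.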

saveToFileAfterUpdate()¶

Returns:: Whether the study is automatically saved after it is updated.
Return type:: bool

scaledSpins()¶

Returns:: The scaled spins of the atoms.
Return type:: list

statisticsData(fit_indices=None, data_tags=None, plot_cohesive_energy=False)¶

Gets the training statistics data from the placeholder task, or generates it if it is not available.

Returns:: The training statistics data.

trainOnEnergyOnly()¶

Returns:: Whether the training should be based solely on energy (True) or on energy, forces and stress (False).
Return type:: bool

trainTestSplit()¶

Returns:: The proportion of the complete dataset to use for training; the rest will be used for testing.
Return type:: float

trainingSets()¶

Returns:: The list of training sets to use or generate based on the reference configurations.
Return type:: list of [RandomDisplacementsParameters | CrystalInterfaceTrainingParameters | AlloyTrainingParameters | MolecularConfigurationsParameters | MolecularDynamicsSnapshotsParameters | TrainingSet]

trainingTestingDatasets(train_test_split=None, data_tags=None, efs_only=False)¶

The complete output training and testing datasets with the calculated E/F/S data. This result will only be available after the study has been updated.

Parameters:

train_test_split (float) – The proportion of the complete dataset to use for training; the rest will be used for testing. Must be in the range (0, 1].
Default: The value given when constructing the object.
data_tags (str | list | None) – One or more tags used to identify which configurations are returned.
Default: None, which returns all available configurations.
efs_only (bool) – Flag to disable reading of validation and correction data. This significantly speeds up construction of the trajectory.

Returns:

The complete training and testing datasets. If not available, returns None.

Return type:

tuple (size 2) of Trajectory | None

uniqueString()¶: Return a unique string representing the state of the object.

update()¶: Run the calculations for the study object and ensure that the MTP parameters are written to the given filenames.

writeMTPFiles(fit_index=None, mtp_filename=None)¶

Write the MTP parameters to file. Works only if the mtp file content is available from the fitting task.

Parameters:

fit_index (int | None) – Write only the content for this fit index. If None, write the files for all fits.
mtp_filename (str | None) – Write the content to this filename. If None, the filename specified in the corresponding fitting parameters is used.

Usage Examples¶

Setup of MomentTensorPotentialTraining of quartz.

# Set up lattice
lattice = Hexagonal(4.916*Angstrom, 5.4054*Angstrom)

# Define elements
elements = [Silicon, Silicon, Silicon, Oxygen, Oxygen, Oxygen, Oxygen, Oxygen,
           Oxygen]

# Define coordinates
fractional_coordinates = [[ 0.4697        ,  0.            ,  0.            ],
                         [ 0.            ,  0.4697        ,  0.666666666667],
                         [ 0.5303        ,  0.5303        ,  0.333333333333],
                         [ 0.4135        ,  0.2669        ,  0.1191        ],
                         [ 0.2669        ,  0.4135        ,  0.547567      ],
                         [ 0.7331        ,  0.1466        ,  0.785767      ],
                         [ 0.5865        ,  0.8534        ,  0.214233      ],
                         [ 0.8534        ,  0.5865        ,  0.452433      ],
                         [ 0.1466        ,  0.7331        ,  0.8809        ]]

# Set up configuration
reference_configuration = BulkConfiguration(
    bravais_lattice=lattice,
    elements=elements,
    fractional_coordinates=fractional_coordinates
)

# Define calculator for E/F/S data calculations.
calculator = LCAOCalculator()

# Set up non-linear coefficients with optimization.
non_linear_coefficients_parameters = NonLinearCoefficientsParameters(
    perform_optimization=True,
)

# Set up parameters to use in the MTP fitting.
fitting_parameters = MomentTensorPotentialFittingParameters(
    basis_size=1000,
    outer_cutoff_radii=3.0*Angstrom,
    mtp_filename='mtp_study.mtp',
    non_linear_coefficients_parameters=non_linear_coefficients_parameters,
)

# In this specific example, the default displacement protocol for crystals is used
training_sets = crystalTrainingRandomDisplacements(
    reference_configuration,
    supercell_repetitions_list=[(2, 2, 2), (3, 3, 3)],
    sample_size_per_stage=10,
)

# Set up MTP training.
moment_tensor_potential_training = MomentTensorPotentialTraining(
    filename='mtp_study',
    object_id='training',
    training_sets=training_sets,
    calculator=calculator,
    calculate_stress=True,
    fitting_parameters_list=fitting_parameters,
)
moment_tensor_potential_training.update()
nlprint(moment_tensor_potential_training)

# The MTP calculator can now be extracted from the MomentTensorPotentialTraining object.
mtp_calculator = moment_tensor_potential_training.momentTensorPotentialCalculators()[0]

MomentTensorPotentialTraining_example1.py

Several MomentTensorPotentialTraining objects with pre-calculated training data can be loaded and then passed into another MomentTensorPotentialTraining object which does the fitting:

training_sets = []

# Load another MomentTensorPotentialTraining object with precalculated data.
mtp_training_data_input = nlread('mtp_training_data.hdf5', MomentTensorPotentialTraining)[0]
training_sets.append(
    TrainingSet(mtp_training_data_input, recalculate_training_data=False)
)

# Load a Trajectory object with precalculated data.
trajectory_training_data_input = nlread('trajectory_training_data.hdf5', Trajectory)[0]
training_sets.append(
    TrainingSet(trajectory_training_data_input, recalculate_training_data=False)
)

# Set up MTP training. The fitting_parameters object is assumed to be defined
# as in the previous example.
moment_tensor_potential_training = MomentTensorPotentialTraining(
    filename='mtp_training.hdf5',
    object_id='training',
    training_sets=training_sets,
    calculator=LCAOCalculator(),
    calculate_stress=True,
    fitting_parameters_list=fitting_parameters,
)
moment_tensor_potential_training.update()
nlprint(moment_tensor_potential_training)

MomentTensorPotentialTraining_example2.py

Notes¶

Note

Study objects behave differently from analysis objects. See the Study object overview for more details.

This class implements the moment tensor potential (MTP) framework [1]. It combines three stages of the MTP training workflow:

The generation and calculation of training data
The actual training of the MTP parameters
The validation of the trained MTP

To generate and calculate training data, several options are provided:

RandomDisplacementsParameters or crystalTrainingRandomDisplacements: A series of random atomic displacements and strain to sample the phase space around an equilibrium configuration.
MolecularDynamicsSnapshotsParameters: Snapshots from molecular dynamics simulations using either the final reference calculator, or a force field or fast high-throughput DFT calculator. In the latter case energies, forces, and stress are recalculated using the reference calculator for a subset of the snapshots from the MD trajectory.
TrainingSet: A set of available training configurations with or without pre-calculated reference energy, forces, and stress. If no reference data is provided, it will automatically be recalculated using the given reference calculator.
CrystalInterfaceTrainingParameters: Used to generate training configurations by building interfaces from two crystalline bulk materials.
MolecularConfigurationsParameters: Class for generating a set of training configurations using a combination of sampling different torsion angles and atomic displacements.
AlloyTrainingParameters: Class for generating a set of alloy training configurations.

The calculation of reference data in the MomentTensorPotentialTraining can be parallelized efficiently over different process groups via the keyword number_of_processes_per_task.
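The effect of number_of_processes_per_task on the process groups can be sketched as follows. This is a minimal illustration, not the scheduler used by QuantumATK: the function name is hypothetical, and placing the leftover processes in one smaller group is an assumption.

```python
def split_into_process_groups(total_processes, processes_per_task=None):
    """Return the sizes of the process groups that execute tasks concurrently."""
    if total_processes < 1:
        raise ValueError("total_processes must be at least 1.")
    if processes_per_task is None:
        # All available processes execute each task collaboratively.
        return [total_processes]
    if processes_per_task < 1:
        raise ValueError("processes_per_task must be at least 1.")
    full_groups, remainder = divmod(total_processes, processes_per_task)
    groups = [processes_per_task] * full_groups
    if remainder:
        # Assumption: the leftover processes form one smaller group.
        groups.append(remainder)
    return groups


print(split_into_process_groups(16, 4))
print(split_into_process_groups(10, 4))
print(split_into_process_groups(8))
```

With 16 processes and 4 processes per task, four tasks can run at the same time. With 10 processes, one group ends up with fewer than the requested number of processes.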

A MomentTensorPotentialTraining with calculated reference data can also be used as input to other MomentTensorPotentialTraining objects by passing it via a TrainingSet object. This can be used to separate the calculation of training data and fitting steps. It also allows easily combining many different pre-calculated training data sets into a single fit. This can become more practical for complex materials where the training data contains many different separate components.

The second (optional) stage is the training of the MTP parameters. Here, the MomentTensorPotentialFittingParameters object can be passed to specify the MTP hyper-parameters that should be used for the fit. To optimize these parameters, a list of these objects with different parameters can be given. A fit is carried out for each of them, and the best combination of parameters can be identified from the training and test errors reported via nlprint().
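A hyper-parameter scan of this kind can be prepared as a small grid. The sketch below uses plain Python so that it runs outside QuantumATK. The keyword names basis_size, outer_cutoff_radii, and mtp_filename are taken from the usage example above, while the scanned values and the filename pattern are hypothetical.

```python
from itertools import product

# Hypothetical values to scan.
basis_sizes = [300, 600, 1000]
cutoff_radii_angstrom = [3.0, 4.0]

# One dictionary of keyword arguments per fit.
parameter_grid = [
    {
        "basis_size": basis_size,
        # In a QuantumATK script this value would be multiplied by Angstrom.
        "outer_cutoff_radii": radius,
        # A separate file per fit avoids overwriting earlier results.
        "mtp_filename": f"mtp_b{basis_size}_r{radius:.1f}.mtp",
    }
    for basis_size, radius in product(basis_sizes, cutoff_radii_angstrom)
]

for parameters in parameter_grid:
    print(parameters["mtp_filename"])
```

In a QuantumATK script, each dictionary could be used to construct one MomentTensorPotentialFittingParameters object, and the resulting list passed as fitting_parameters_list.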

The nlprint() report prints a summary of the training root-mean-square errors in energy, forces, and stress with respect to the reference data for each set of fitting parameters. An independent test error, calculated for configurations that were held out of the training data, is also reported. The fraction of configurations used for training, with the remainder used for testing, is specified by the train_test_split parameter. The two sets are split randomly from the complete dataset.
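The idea of a seeded random split and a root-mean-square error can be sketched in plain Python. This is an illustration only, not the code used by the study object: the function names are hypothetical, and the rounding of the training set size is an assumption.

```python
import math
import random


def split_train_test(items, train_fraction=0.8, seed=None):
    """Randomly split items into (train, test) lists, reproducibly for a given seed."""
    if not 0.0 < train_fraction <= 1.0:
        raise ValueError("train_fraction must be in the range (0, 1].")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Assumption: the training set size is rounded to the nearest integer.
    n_train = max(1, round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def rmse(predicted, reference):
    """Root-mean-square error between two equally long, non-empty sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("Sequences must be non-empty and of equal length.")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


configurations = list(range(10))  # stand-ins for configuration indices
train, test = split_train_test(configurations, train_fraction=0.8, seed=42)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, which is the purpose of the random_seed parameter.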

The training and test errors both give indications of the quality of the potential and possible ways to improve the fit. If the training and test errors are both similarly high, this is an indication that the model is not complex enough to properly represent the training set. In this case the potential may be improved by increasing the basis set size. If the test error is significantly larger than the training error, this is an indication that the training set is too small, and that the potential is being over-fitted. In this case the potential may be improved by generating more training data, either by specifying new training sets or by performing active learning.
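The reasoning above can be written down as a simple rule of thumb. The sketch below is a hypothetical helper, not part of QuantumATK, and the target error and the ratio used to decide that the test error is "significantly larger" are assumptions that should be adapted to the material and property of interest.

```python
def diagnose_fit(train_error, test_error, target_error, overfit_ratio=1.5):
    """Classify a fit as 'overfit', 'underfit', or 'ok' from its errors.

    target_error is the largest training error considered acceptable.
    overfit_ratio is the test/train ratio above which the fit is treated
    as over-fitted. Both thresholds are assumptions.
    """
    if train_error < 0 or test_error < 0:
        raise ValueError("Errors must be non-negative.")
    if test_error > overfit_ratio * train_error:
        # Test error much larger than training error: more training data may help.
        return "overfit"
    if train_error > target_error:
        # Both errors similarly high: a larger basis set may help.
        return "underfit"
    return "ok"


# Hypothetical energy errors in meV/atom.
print(diagnose_fit(10.0, 11.0, target_error=5.0))
print(diagnose_fit(2.0, 8.0, target_error=5.0))
print(diagnose_fit(2.0, 2.5, target_error=5.0))
```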

The list with the final calculators with the trained MTP parameters, one calculator for each fit, can be queried via momentTensorPotentialCalculators(). Alternatively, the MTP calculator may be imported as an .mtp file using the MTPPotential class.

For the parallelization of the MomentTensorPotentialTraining it is recommended to use many MPI processes, as both the DFT calculations and the MTP fitting benefit more from MPI parallelization than from threading.