MachineLearnedForceFieldTrainer
Included in QATK.MLFF
- class MachineLearnedForceFieldTrainer(fitting_parameters, training_sets=None, calculator=None, train_test_split=None, random_seed=None, save_model_evaluator=None)
Class for training a machine learned force field.
- Parameters:
fitting_parameters (BaseMLFFFittingParameters) – The parameters for the training.
training_sets (TrainingSet | Table | sequence of [TrainingSet] | None) – The list of training sets to use for training. Default: None
calculator (Calculator | None) – The calculator to use for calculating the isolated atom energies if applicable for the model. If None, the calculator of the training set is used. Default: None
train_test_split (float) – The fraction of the training set to use for training. The rest is used for testing. Must be a float between 0 and 1. If set to 1, the entire training set is used for training. Default: 0.9
random_seed (int) – The random seed used for splitting the data into training and testing data. Default: Generated automatically.
save_model_evaluator (bool) – Whether to save the model evaluator after training. If set to True, a model evaluator object generated after training is saved to a file corresponding to the file/experiment name supplied in the fitting parameters object. If set to False, the model evaluator is not saved. The latter is useful for iteratively adding evaluator objects to a MachineLearnedModelCollection object. Default: False
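The interplay of train_test_split and random_seed can be illustrated with a minimal, standalone sketch. It is not the implementation used by MachineLearnedForceFieldTrainer. The function name split_indices, the use of Python's random module, and the rounding of the training-set size are assumptions made only for illustration.

```python
import random


def split_indices(n_configurations, train_test_split=0.9, random_seed=None):
    """Split configuration indices into training and testing subsets.

    Hypothetical helper: it only illustrates the documented meaning of
    train_test_split (fraction used for training) and random_seed.
    """
    if not 0.0 <= train_test_split <= 1.0:
        raise ValueError('train_test_split must be a float between 0 and 1.')
    indices = list(range(n_configurations))
    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    random.Random(random_seed).shuffle(indices)
    # Assumption: the training-set size is rounded to the nearest integer.
    n_train = round(train_test_split * n_configurations)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train, test = split_indices(10, train_test_split=0.8, random_seed=1234)
print(len(train), len(test))  # prints "8 2"
```

Calling the function twice with the same seed gives the same split, which is why fixing random_seed is useful when comparing models trained on the same data. With train_test_split=1 the testing subset is empty.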
- fittedCalculator()
The calculator with the fitted potential after the training is completed.
- Returns:
The fitted calculator.
- Return type:
Calculator
- isolatedAtomEnergies()
Get the isolated atom energies after training.
- Returns:
The isolated atom energies.
- Return type:
dict
- modelEvaluator()
Get the evaluator for the trained model.
- Returns:
The evaluator for the trained model.
- Return type:
MachineLearnedModelEvaluator
- train()
Train the machine learned force field.
- trainAndTestData()
Get the training and testing data after the training has been performed.
- Returns:
A tuple containing the training and testing data.
- Return type:
tuple of (TrainingSet, TrainingSet)
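The accessor methods documented above can be collected in one place after training. The following is a minimal sketch: the helper collect_training_results is hypothetical and not part of QuantumATK. It relies only on the method names listed above and assumes that train() has already been called on the trainer.

```python
def collect_training_results(trainer):
    """Gather the documented post-training outputs of a trainer in one dict.

    `trainer` is expected to be a MachineLearnedForceFieldTrainer on which
    train() has already been called. This helper is a hypothetical convenience
    function, not part of the QuantumATK API.
    """
    # trainAndTestData() returns a tuple of (training data, testing data).
    training_data, testing_data = trainer.trainAndTestData()
    return {
        'fitted_calculator': trainer.fittedCalculator(),
        'isolated_atom_energies': trainer.isolatedAtomEnergies(),
        'model_evaluator': trainer.modelEvaluator(),
        'training_data': training_data,
        'testing_data': testing_data,
    }
```

In a QuantumATK script, the helper would be called with the trainer object once training has finished, for example to pass the fitted calculator on to a later simulation.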
Usage Examples
To train a machine-learned force field model with the MachineLearnedForceFieldTrainer,
the general approach
is to set up a fitting parameters object for the model type to train, load the training data,
and, if required, configure an appropriate calculator for calculating isolated atom energies.
Additionally, non-default train_test_split and random_seed values can be specified to control
how the input data is split. These objects and values are passed to the
MachineLearnedForceFieldTrainer class, and the training process is then started by calling the
train() method.
This example shows how a MACE model is trained using the MachineLearnedForceFieldTrainer class.
# TrainingSet with precomputed energy and force data (and stress data, if present and desired).
# One or more TrainingSets can be used.
training_set = nlread(
    'training_data.hdf5', TrainingSet
)[0]
# Either fetch the calculator from the training set or, if it is not present,
# set up a calculator analogous to the one used in the training set generator
# to calculate isolated atom energies during training.
calculator = LCAOCalculator()
# Set up the model-specific fitting parameters object for the training
training_parameters = TrainingParameters(
    experiment_name='mace_experiment',
    # Other training parameters can be set as desired ...
)
model_parameters = MACEModelParameters(
    max_l_equivariance=0,
    # Other model parameters can be set as desired ...
)
dataset_parameters = ForceFieldDatasetParameters(
    validation_fraction=0.2,
    # Other dataset parameters can be set as desired ...
)
fitting_parameters = MACEFittingParameters(
    model_parameters=model_parameters,
    dataset_parameters=dataset_parameters,
    training_parameters=training_parameters,
)
# Set up the ML model training object
machine_learned_force_field_trainer = MachineLearnedForceFieldTrainer(
    fitting_parameters=fitting_parameters,
    training_sets=training_set,
    calculator=calculator,
    # Optional parameters can be set as desired
    train_test_split=0.8,
    random_seed=1234,
)
# Run the training
machine_learned_force_field_trainer.train()
# After training, the trained model evaluator can be retrieved for validation/analysis
model_evaluator = machine_learned_force_field_trainer.modelEvaluator()
nlprint(model_evaluator)
This example shows how an MTP model is trained using the MachineLearnedForceFieldTrainer class.
# TrainingSet with precomputed energy and force data (and stress data, if present and desired).
# One or more TrainingSets can be used.
training_set = nlread(
    'training_data.hdf5', TrainingSet
)[0]
# Either fetch the calculator from the training set or, if it is not present,
# set up a calculator analogous to the one used in the training set generator
# to calculate isolated atom energies during training.
calculator = LCAOCalculator()
# Set up the model-specific fitting parameters object for the training.
# Set up the non-linear coefficients parameters. Note that optimization of the
# coefficients is disabled here (perform_optimization=False).
non_linear_coefficients_parameters = NonLinearCoefficientsParameters(
    perform_optimization=False,
    initial_coefficients=Random,
    random_seed=1234,
)
# Set up parameters to use in the MTP fitting.
fitting_parameters = MomentTensorPotentialFittingParameters(
    basis_size=PredefinedBasisSmall,
    outer_cutoff_radii=4.0*Angstrom,
    mtp_filename='mtp_model.mtp',
    non_linear_coefficients_parameters=non_linear_coefficients_parameters,
)
# Set up the ML model training object
machine_learned_force_field_trainer = MachineLearnedForceFieldTrainer(
    fitting_parameters=fitting_parameters,
    training_sets=training_set,
    calculator=calculator,
    # Optional parameters can be set as desired
    train_test_split=0.8,
    random_seed=1234,
)
# Run the training
machine_learned_force_field_trainer.train()
# After training, the trained model evaluator can be retrieved for validation/analysis
model_evaluator = machine_learned_force_field_trainer.modelEvaluator()
nlprint(model_evaluator)
Notes
The MachineLearnedForceFieldTrainer class is used to train machine-learned force fields (MLFFs) in QuantumATK. It is intended as a model-agnostic trainer object that can train various kinds of models, where the type of the fitting_parameters object determines how the training is conducted. It can train MLFF models whose FittingParameters objects fulfill certain setup requirements. This currently includes MLFFs based on MomentTensorPotentialFittingParameters and MACEFittingParameters.
The MachineLearnedForceFieldTrainer class is designed to accept training data only as TrainingSet objects. The ForceFieldTrainingSetGenerator class can generate and combine training data, which can be passed directly to the MachineLearnedForceFieldTrainer via the generatedTrainingSet() method. Training data in other formats has to be converted into TrainingSet objects, either by using the conversion/"export as" utility in the Data Tool in Nanolab or by directly converting/wrapping other data storage types.
After a training has been set up with the MachineLearnedForceFieldTrainer class, it is run by calling the train() method. This executes the training process according to the specified fitting parameters and training data. After training is complete, the trained MLFF model is saved to file as specified in the fitting parameters. As an automatic post-training step, the trained model is validated on the test set (and the training set) with the MachineLearnedModelEvaluator class. The resulting object is retrieved from the MachineLearnedForceFieldTrainer with the modelEvaluator() method. The results of this validation are available for visual inspection via the MLFFAnalyzer in Nanolab or by calling nlprint on the MachineLearnedModelEvaluator object.
When training MTPs (MomentTensorPotentialFittingParameters) with the MachineLearnedForceFieldTrainer, it is recommended to use many MPI processes, as the MTP fitting benefits more from MPI parallelization than from threading. When using MACEFittingParameters for training MACE models, it is currently recommended to use only a single MPI process.
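The post-training validation mentioned in these notes compares model predictions with reference data on the test and training sets. Which quantities the MachineLearnedModelEvaluator reports is not stated here. The sketch below only illustrates two error measures commonly used for such comparisons, the mean absolute error (MAE) and the root-mean-square error (RMSE). The function error_metrics and its plain-list inputs are assumptions for illustration and are not part of QuantumATK.

```python
import math


def error_metrics(reference, predicted):
    """Return MAE and RMSE between reference and predicted values.

    Hypothetical helper for illustration: both inputs are plain sequences of
    floats of equal length, e.g. per-configuration energies in the same unit.
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError('Inputs must be non-empty and of equal length.')
    differences = [p - r for r, p in zip(reference, predicted)]
    mae = sum(abs(d) for d in differences) / len(differences)
    rmse = math.sqrt(sum(d * d for d in differences) / len(differences))
    return {'mae': mae, 'rmse': rmse}
```

A noticeably larger error on the test data than on the training data is a common sign of overfitting, which is one reason to hold out part of the data via train_test_split.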