MultipleMachineLearnedForceFieldTrainers
Included in QATK.MLFF
- class MultipleMachineLearnedForceFieldTrainers(fitting_parameters_list, training_sets=None, calculator=None, train_test_split=None, random_seed=None, save_model_evaluator=None, statistical_measure=None, dataset_type=None)
Class for training multiple machine learned force fields. A MachineLearnedModelCollection object is created after training containing all the model evaluators.
- Parameters:
fitting_parameters_list (sequence of BaseMLFFFittingParameters | Table) – The list of fitting parameters for the training.
training_sets (TrainingSet | Table | sequence of [TrainingSet] | None) – The list of training sets to use for training. Default: None
calculator (Calculator | None) – The calculator to use for calculating the isolated atom energies if applicable for the model. If None, the calculator of the training set is used. Default: None
train_test_split (float) – The fraction of the training set to use for training. The rest is used for testing. Must be a float between 0 and 1. If set to 1, the entire training set is used for training. Default: 0.9
random_seed (int) – The random seed used for splitting the data into training and testing data. Default: Generated automatically.
save_model_evaluator (bool) – Whether to save the model evaluators after training. If set to True, model evaluator objects generated after training are saved to a file corresponding to the file/experiment name supplied in the fitting parameters object. If set to False, the model evaluators are not saved. The latter is useful for iteratively adding evaluator objects to a MachineLearnedModelCollection object. Default: False
statistical_measure (RMSE | MAE | R2Score) – The statistical measure to use for ranking the models. It is only used in bestFittedCalculator(). Default: R2Score.
dataset_type (str) – Which dataset to use for ranking the models in bestFittedCalculator(). Possible values are MLParameterOptions.DATASET_TYPE.TRAINING and MLParameterOptions.DATASET_TYPE.TEST. Default: MLParameterOptions.DATASET_TYPE.TEST.
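The interplay of train_test_split and random_seed can be illustrated with a small, QuantumATK-independent sketch. The function split_indices below is hypothetical and is not the splitting routine used by the class. It only shows the general idea of a seeded shuffle followed by a fractional split, and it assumes that a fraction of exactly 0 is not meaningful.

```python
import random


def split_indices(n_configurations, train_test_split=0.9, random_seed=None):
    """Shuffle configuration indices and split them into training and test parts.

    Hypothetical helper for illustration only; not part of QuantumATK.
    """
    # Assumption: a fraction of 0 would leave nothing to train on, so it is rejected.
    if not 0.0 < train_test_split <= 1.0:
        raise ValueError("train_test_split must be larger than 0 and at most 1.")
    indices = list(range(n_configurations))
    # A dedicated Random instance makes the shuffle reproducible for a fixed seed.
    random.Random(random_seed).shuffle(indices)
    n_train = round(train_test_split * n_configurations)
    return indices[:n_train], indices[n_train:]


train, test = split_indices(100, train_test_split=0.8, random_seed=1234)
print(len(train), len(test))
```

With a fixed seed the same configurations end up in the test part on every run, which is what makes scores of different models comparable.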
- bestFittedCalculator(weights=None, use_cohesive_energy=None)
The calculator with the best performing model after the training is completed.
- Parameters:
weights (list of float | None) – Weights for ranking. See rankModels in MachineLearnedModelCollection for details.
use_cohesive_energy (bool | None) – Whether to use cohesive energy calculations.
- Returns:
The best fitted calculator.
- Return type:
Calculator
- Raises:
NLValueError – If no valid models are found.
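To make the ranking criterion concrete, the following sketch computes the three statistical measures named above (RMSE, MAE and the R2 score) in plain Python and picks the better of two sets of predictions. The functions and the example numbers are illustrative assumptions. They are not the QuantumATK implementations and do not reproduce the weights or use_cohesive_energy options.

```python
import math


def rmse(reference, predicted):
    """Root mean squared error (lower is better)."""
    return math.sqrt(
        sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    )


def mae(reference, predicted):
    """Mean absolute error (lower is better)."""
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


def r2_score(reference, predicted):
    """Coefficient of determination (higher is better, 1.0 is a perfect fit)."""
    mean = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def best_model(reference, predictions_by_model, measure=r2_score, higher_is_better=True):
    """Return the name of the best model and the score of every model."""
    scores = {
        name: measure(reference, predicted)
        for name, predicted in predictions_by_model.items()
    }
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get), scores


# Made-up reference values and predictions of two hypothetical models.
reference = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [1.5, 2.5, 2.5, 4.5],
}
print(best_model(reference, predictions))
print(best_model(reference, predictions, measure=rmse, higher_is_better=False))
```

Note that the direction of the comparison depends on the measure: the R2 score is maximized, while RMSE and MAE are minimized.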
- isolatedAtomEnergies()
Get the isolated atom energies after training.
- Returns:
The isolated atom energies.
- Return type:
dict
- modelCollection()
Get the collection of trained models.
- Returns:
The collection of trained models.
- Return type:
MachineLearnedModelCollection
- modelCount()
Get the number of models to be trained.
- Returns:
The number of models.
- Return type:
int
- modelEvaluators()
Get the evaluators for the trained models.
- Returns:
The evaluators for the trained models.
- Return type:
sequence of MachineLearnedModelEvaluator
- train()
Train the machine learned force fields.
Usage Examples
This example shows how multiple MTP models are trained using the MultipleMachineLearnedForceFieldTrainers class with a list of fitting parameters generated by scanOverNonLinearCoefficients.
# TrainingSet with precomputed energy, force (and stress if present and desired) data. This can be
# one or more TrainingSets
training_set = nlread("training_data.hdf5", TrainingSet)[0]
# Either fetch the calculator from the training set or, if it is not present,
# set one up analogously to the one used in the training set generator so that
# isolated atom energies can be calculated during training.
calculator = LCAOCalculator()
# Generate a list of fitting parameters with different initial guesses for non-linear coefficients
# This is particularly useful for finding the best initial non-linear coefficients for MTP models
fitting_parameters_list = scanOverNonLinearCoefficients(
    number_of_initial_guesses=30,
    basis_size=PredefinedBasisSmall,
    outer_cutoff_radii=3.0 * Angstrom,
    mtp_filename_suffix="MTP_fit.mtp",
    random_seed=42,
    perform_optimization=False,
)
# Setup ML model training object for multiple models
multiple_machine_learned_force_field_trainers = (
    MultipleMachineLearnedForceFieldTrainers(
        fitting_parameters_list=fitting_parameters_list,
        training_sets=training_set,
        calculator=calculator,
        # Optional parameters can be set as desired
        train_test_split=0.8,
        random_seed=1234,
    )
)
# Run the training for all models
multiple_machine_learned_force_field_trainers.train()
# After training, retrieve the model collection for validation/analysis
model_collection = multiple_machine_learned_force_field_trainers.modelCollection()
# Print summary of all trained models
nlprint(model_collection)
# Get the best model information based on R2Score on the test dataset
best_score, best_index, best_identifier, best_evaluator = model_collection.getBestModel(
    statistical_measure=R2Score,
    dataset_type=MLParameterOptions.DATASET_TYPE.TEST,
)
# The best model calculator can then be retrieved from the best evaluator for further use
best_calculator = best_evaluator.calculator()
Notes
To train multiple machine-learned force field models with the MultipleMachineLearnedForceFieldTrainers class, the general approach is to set up a list or Table of fitting parameters objects for the model types to train, to load the training data, and to configure an appropriate calculator for calculating isolated atom energies if required. Additionally, non-default train_test_split and random_seed values can be provided to control how the input data is split. These objects and values are passed to the MultipleMachineLearnedForceFieldTrainers class, and the training process is then started by calling the train() method.

The MultipleMachineLearnedForceFieldTrainers class is analogous to the MachineLearnedForceFieldTrainer class and is used to train multiple machine-learned force fields (MLFFs) in QuantumATK. It uses the MachineLearnedForceFieldTrainer iteratively with a list or Table of input fitting parameter objects. The type of each fitting_parameters object determines how the training is conducted for the corresponding model.

The MultipleMachineLearnedForceFieldTrainers class is particularly well suited for use with the list of fitting parameters returned by scanOverNonLinearCoefficients, for training multiple MTP models and finding the fitting parameters object or model with the best initial non-linear coefficients.

The same practices as for the MachineLearnedForceFieldTrainer apply, except that a list or Table of fitting parameters is provided instead of a single object. The TrainingSet class is used for training data input. The ForceFieldTrainingSetGenerator class can generate and combine training data, which can be passed directly to the MultipleMachineLearnedForceFieldTrainers via the .generatedTrainingSet() method.

After a training has been set up with the MultipleMachineLearnedForceFieldTrainers class, it is run by calling the train() method. This executes the training process according to the specified fitting parameters and training data for all fitting parameter objects in the list. After training is complete, the trained MLFF models are saved to file according to the specifications in the fitting parameters used.

For evaluation, a MachineLearnedModelCollection object is used instead of a single MachineLearnedModelEvaluator. This collection is retrieved from the MultipleMachineLearnedForceFieldTrainers object using the modelCollection() method.

The MachineLearnedModelCollection can be visualized in Nanolab via the MLFFAnalyzer. Alternatively, calling nlprint() on the collection object shows a summary of all trained models. The bestFittedCalculator() method can be used to retrieve the calculator corresponding to the best performing model based on the specified statistical measure (R2Score by default).

Individual model evaluators can be accessed using the modelEvaluators() method, which returns a list of all MachineLearnedModelEvaluator objects for detailed inspection of each trained model.
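Why it can pay off to train several models that differ only in their initial non-linear coefficients can be sketched with a toy problem. The example below is not QuantumATK code and does not model an MTP. It assumes a made-up one-dimensional non-convex loss and plain gradient descent, and it shows that different initial guesses can converge to different local minima, so that the best result has to be selected afterwards.

```python
def loss(x):
    # Toy non-convex loss with two local minima (stand-in for a fitting objective).
    return x**4 - 3.0 * x**2 + x


def gradient(x):
    # Analytical derivative of the toy loss.
    return 4.0 * x**3 - 6.0 * x + 1.0


def train_one(initial_guess, learning_rate=0.01, steps=2000):
    """Run plain gradient descent from one initial guess.

    Returns the final parameter value and the final loss.
    """
    x = initial_guess
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x, loss(x)


def train_all(initial_guesses):
    """Train one toy model per initial guess and collect the results in order."""
    return [train_one(guess) for guess in initial_guesses]


initial_guesses = [-2.0, -0.5, 0.5, 2.0]
results = train_all(initial_guesses)

# Select the run with the lowest final loss.
best_index = min(range(len(results)), key=lambda i: results[i][1])
print(best_index, results[best_index])
```

In this toy setting the runs starting at negative values reach the deeper minimum, while the runs starting at positive values get stuck in the shallower one.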