MachineLearnedPropertyTrainer
Included in QATK.MLFF
- class MachineLearnedPropertyTrainer(fitting_parameters, training_sets=None, train_test_split=None, random_seed=None)
Initialize the MachineLearnedPropertyTrainer.
- Parameters:
fitting_parameters (GeneralPropertyFittingParameters) – The parameters for the training.
training_sets (TrainingSet | Table | sequence of [TrainingSet] | None) – The list of training sets to use for training. Default: None
train_test_split (float) – The fraction of the training set to use for training. The rest is used for testing. Must be a float between 0 and 1. If set to 1, the entire training set is used for training. Default: 0.9
random_seed (int) – The random seed used for splitting the data into training and testing data. Default: Generated automatically.
- propertyPredictor()
Return the trained property predictor.
- Returns:
The trained property predictor
- Return type:
PropertyPredictor
- train()
Train the machine-learned property prediction model.
- trainAndTestData()
Get the training and testing data after the training has been performed.
- Returns:
A tuple containing the training and testing data.
- Return type:
tuple of (TrainingSet, TrainingSet)
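The interplay of train_test_split and random_seed can be pictured with a small, library-independent sketch. This is only an illustration of a seeded fractional split, not QuantumATK's internal implementation; the helper name split_indices, the use of Python's random module, and the rounding rule are assumptions made for the example.

```python
import random


def split_indices(n_items, train_fraction, seed):
    """Shuffle the indices 0..n_items-1 with a fixed seed and split them.

    Hypothetical helper for illustration only; not part of QuantumATK.
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    indices = list(range(n_items))
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    # Assumption: the training share is rounded to the nearest whole item.
    n_train = round(train_fraction * n_items)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(n_items=10, train_fraction=0.8, seed=42)
print(len(train_idx), len(test_idx))  # 8 2
```

With a fraction of 1 the second list is empty, which mirrors the documented case where the entire training set is used for training.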
Usage Examples
Train a property prediction model using a TrainingSet with additional properties:
# Load a TrainingSet with bandgap property data saved under the additional property key 'bandgap'
training_set = nlread("path/to/property_dataset.hdf5", TrainingSet)[0]
# Define model parameters
# Fine-tune the property model from a pre-trained force field MACE foundation model
model_params = MACEModelParameters(
    foundation_model_path="path/to/mace-mp-0b3-medium.model",
)
# Define training parameters
training_params = TrainingParameters(
    experiment_name="bandgap_model",
    random_seed=42,
)
# Define dataset parameters for band gap property
# Specify that we are training a configuration-level property (one scalar value per configuration)
dataset_params = GeneralPropertyDatasetParameters(
    property_key="bandgap",
    task_type=MLParameterOptions.TASK_TYPE.GENERAL,  # Configuration-level property
    validation_fraction=0.1,
)
# Define the fitting parameters
fitting_params = GeneralPropertyFittingParameters(
    model_parameters=model_params,
    dataset_parameters=dataset_params,
    training_parameters=training_params,
)
# Create the MachineLearnedPropertyTrainer
trainer = MachineLearnedPropertyTrainer(
    fitting_parameters=fitting_params,
    training_sets=training_set,
    train_test_split=0.8,
)
# Train the model
trainer.train()
# Get the trained PropertyPredictor from the trainer
predictor = trainer.propertyPredictor()
# Alternatively, load a PropertyPredictor directly from the saved model file
# The trained model is automatically saved as 'bandgap_model_<random_seed>.qatkpt'
predictor = PropertyPredictor("bandgap_model_42.qatkpt")
# Use the predictor to predict properties for new configurations
new_configuration = ...  # Load or define your atomic configuration here
predicted_property = predictor.predict(new_configuration)
print("Predicted band gap:", predicted_property)
The training data must include the property you want to predict as an additional property in the TrainingSet. See the TrainingSet documentation for details on how to create training sets with additional properties.
After training, the model is saved as a .qatkpt file that can be loaded with
PropertyPredictor for making predictions on new configurations.
Notes
The MachineLearnedPropertyTrainer trains machine learning models that
predict properties from atomic configurations. Unlike force field training, which is limited to
learning the energy and its derivatives (i.e., forces and stresses) of configurations,
property prediction can learn any configuration-level or atom-wise property, such
as band gaps, formation energies, or atomic charges.
Training Data Requirements
Training data must be provided as TrainingSet objects containing:
- Atomic configurations (BulkConfiguration or MoleculeConfiguration)
- Additional properties stored with the property key specified in GeneralPropertyDatasetParameters
See the TrainingSet documentation section on Additional Properties for details on how to create training sets with property data.
Property Types
The trainer supports two types of properties:
- Configuration-level properties: A single scalar value per configuration (e.g., band gap, total energy), specified with MLParameterOptions.TASK_TYPE.GENERAL
- Atom-wise properties: One value per atom (e.g., atomic charges, magnetic moments), specified with MLParameterOptions.TASK_TYPE.ATOM_WISE
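The difference between the two property types comes down to the shape of the target data. The following plain-Python sketch makes that concrete; it does not use the QuantumATK API, the numeric values are made-up toy numbers, and the helper functions are hypothetical names introduced only for this illustration.

```python
# Made-up toy data: two configurations containing 2 and 3 atoms.
atoms_per_configuration = [2, 3]

# Configuration-level target: one scalar per configuration (e.g. a band gap).
band_gaps = [1.1, 3.4]

# Atom-wise target: one value per atom in each configuration (e.g. charges).
atomic_charges = [[0.4, -0.4], [0.8, -0.4, -0.4]]


def is_configuration_level(targets, atom_counts):
    """True if there is exactly one scalar target per configuration."""
    return len(targets) == len(atom_counts) and all(
        isinstance(t, (int, float)) for t in targets
    )


def is_atom_wise(targets, atom_counts):
    """True if each configuration has one target value per atom."""
    return len(targets) == len(atom_counts) and all(
        isinstance(t, (list, tuple)) and len(t) == n
        for t, n in zip(targets, atom_counts)
    )


print(is_configuration_level(band_gaps, atoms_per_configuration))
print(is_atom_wise(atomic_charges, atoms_per_configuration))
```

A shape check of this kind, run before training, could help catch a mismatch between the stored property data and the selected task type.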
Model Output
After training, the model is automatically saved with a filename following the pattern
<experiment_name>_<random_seed>.qatkpt. The trained model can be loaded using
PropertyPredictor for making predictions. See the PropertyPredictor documentation
for details on how to use the predictor.
The trainer also provides:
- propertyPredictor(): Returns the trained PropertyPredictor instance
- trainAndTestData(): Returns the training and test sets used during training
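Because the saved filename follows the pattern <experiment_name>_<random_seed>.qatkpt, a script can reconstruct it from the values passed to TrainingParameters. The helper below is a hypothetical convenience function, not part of QuantumATK, and it assumes the file is written to the current working directory.

```python
from pathlib import Path


def expected_model_filename(experiment_name, random_seed):
    """Build the filename from the documented pattern.

    Hypothetical helper for illustration only; not part of QuantumATK.
    """
    return f"{experiment_name}_{random_seed}.qatkpt"


# Same values as in the usage example: experiment_name="bandgap_model", random_seed=42.
model_file = Path(expected_model_filename("bandgap_model", 42))

# Assumption: the model is saved in the current working directory.
if model_file.exists():
    print("Found trained model:", model_file)
else:
    print("No trained model at:", model_file)
```

Setting random_seed explicitly, as in the usage example, keeps the filename predictable; when the seed is generated automatically, the filename is not known in advance.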
Fine-Tuning from MACE Foundation Models
Property prediction models can be fine-tuned from pre-trained foundation models by specifying
foundation_model_path in MACEModelParameters. This approach is generally recommended
as the foundation model has already learned useful representations of atomic environments, which can
be reused for the property prediction task. Fine-tuning typically improves accuracy and reduces
the number of training epochs needed compared to training from scratch.
Pre-trained MACE foundation models can be downloaded from: https://github.com/ACEsuit/mace-foundations/tree/main