MachineLearnedPropertyTrainer

Included in QATK.MLFF

class MachineLearnedPropertyTrainer(fitting_parameters, training_sets=None, train_test_split=None, random_seed=None)

Initialize the MachineLearnedPropertyTrainer.

Parameters:

fitting_parameters (GeneralPropertyFittingParameters) – The parameters for the training.
training_sets (TrainingSet | Table | sequence of [TrainingSet] | None) – The training set, or sequence of training sets, to use for training.
Default: None
train_test_split (float) – The fraction of the training set to use for training. The rest is used for testing. Must be a float between 0 and 1. If set to 1, the entire training set is used for training.
Default: 0.9
random_seed (int) – The random seed used for splitting the data into training and testing data.
Default: Generated automatically.

propertyPredictor()

Return the trained property predictor.

Returns: The trained property predictor.
Return type: PropertyPredictor

train()

Train the machine-learned property prediction model.

trainAndTestData()

Get the training and testing data after the training has been performed.

Returns: A tuple containing the training and testing data.
Return type: tuple of (TrainingSet, TrainingSet)

Usage Examples

Train a property prediction model using a TrainingSet with additional properties:

# Load a TrainingSet with bandgap property data saved under the additional property key 'bandgap'
training_set = nlread("path/to/property_dataset.hdf5", TrainingSet)[0]

# Define model parameters
# Fine-tune the property model from a pre-trained force field MACE foundation model
model_params = MACEModelParameters(
    foundation_model_path="path/to/mace-mp-0b3-medium.model",
)

# Define training parameters
training_params = TrainingParameters(
    experiment_name="bandgap_model",
    random_seed=42,
)

# Define dataset parameters for band gap property
# Specify that we are training a configuration-level property (one scalar value per configuration)
dataset_params = GeneralPropertyDatasetParameters(
    property_key="bandgap",
    task_type=MLParameterOptions.TASK_TYPE.GENERAL,  # Configuration-level property
    validation_fraction=0.1,
)

# Define the fitting parameters
fitting_params = GeneralPropertyFittingParameters(
    model_parameters=model_params,
    dataset_parameters=dataset_params,
    training_parameters=training_params,
)

# Create the MachineLearnedPropertyTrainer
trainer = MachineLearnedPropertyTrainer(
    fitting_parameters=fitting_params,
    training_sets=training_set,
    train_test_split=0.8,
)

# Train the model
trainer.train()

# Get the trained PropertyPredictor from the trainer
predictor = trainer.propertyPredictor()

# Alternatively, load a PropertyPredictor directly from the saved model file
# The trained model is automatically saved as 'bandgap_model_<random_seed>.qatkpt'
predictor = PropertyPredictor("bandgap_model_42.qatkpt")

# Use the predictor to predict properties for new configurations
new_configuration = ...  # Load or define your atomic configuration here
predicted_property = predictor.predict(new_configuration)
print("Predicted bandgap:", predicted_property)

property_trainer_example.py

The training data must include the property you want to predict as an additional property in the TrainingSet. See the TrainingSet documentation for details on how to create training sets with additional properties.

After training, the model is saved as a .qatkpt file that can be loaded with PropertyPredictor for making predictions on new configurations.

Notes

The MachineLearnedPropertyTrainer trains machine learning models that predict properties from atomic configurations. Unlike force field training, which is limited to learning the energy and its derivatives (forces and stresses), property prediction can learn any configuration-level or atom-wise property, such as band gaps, formation energies, or atomic charges.

Training Data Requirements

Training data must be provided as TrainingSet objects containing:

Atomic configurations (BulkConfiguration or MoleculeConfiguration)
Additional properties stored with the property key specified in GeneralPropertyDatasetParameters

See the TrainingSet documentation section on Additional Properties for details on how to create training sets with property data.

Property Types

The trainer supports two types of properties:

Configuration-level properties: A single scalar value per configuration (e.g., band gap, total energy), specified with MLParameterOptions.TASK_TYPE.GENERAL
Atom-wise properties: One value per atom (e.g., atomic charges, magnetic moments), specified with MLParameterOptions.TASK_TYPE.ATOM_WISE
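The difference between the two property types is mainly a difference in the shape of the target data. The following is a minimal sketch in plain Python, not QATK code: the variable names, the helper functions, and the numerical values are all hypothetical and only illustrate "one scalar per configuration" versus "one value per atom".

```python
# Hypothetical illustration only: plain Python data, not QATK objects.
# Two configurations containing 2 and 3 atoms, respectively.
atoms_per_configuration = [2, 3]

# Configuration-level property (made-up band gaps): one scalar per configuration.
band_gaps = [1.12, 0.67]

# Atom-wise property (made-up atomic charges): one value per atom.
atomic_charges = [
    [0.35, -0.35],
    [0.20, -0.10, -0.10],
]


def is_configuration_level(values, atom_counts):
    """Return True if there is exactly one scalar per configuration."""
    return len(values) == len(atom_counts) and all(
        isinstance(v, (int, float)) for v in values
    )


def is_atom_wise(values, atom_counts):
    """Return True if every configuration has exactly one value per atom."""
    return len(values) == len(atom_counts) and all(
        isinstance(v, (list, tuple)) and len(v) == n
        for v, n in zip(values, atom_counts)
    )


print(is_configuration_level(band_gaps, atoms_per_configuration))
print(is_atom_wise(atomic_charges, atoms_per_configuration))
```

A check of this kind can be useful before building a training set, since choosing a task type that does not match the shape of the stored property is an easy mistake to make.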

Model Output

After training, the model is automatically saved with a filename following the pattern <experiment_name>_<random_seed>.qatkpt. The trained model can be loaded using PropertyPredictor for making predictions. See the PropertyPredictor documentation for details on how to use the predictor.
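As a small illustration of the naming pattern, the sketch below builds the expected filename from an experiment name and a random seed. The helper expected_model_filename is hypothetical and not part of QATK; it only reproduces the documented pattern as a string.

```python
def expected_model_filename(experiment_name: str, random_seed: int) -> str:
    """Build the filename following <experiment_name>_<random_seed>.qatkpt.

    Hypothetical helper for illustration; QATK writes the file itself.
    """
    return f"{experiment_name}_{random_seed}.qatkpt"


# Matches the filename used in the usage example above.
print(expected_model_filename("bandgap_model", 42))  # bandgap_model_42.qatkpt
```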

The trainer also provides:

propertyPredictor(): Returns the trained PropertyPredictor instance
trainAndTestData(): Returns the training and test sets used during training
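To make the roles of train_test_split and random_seed concrete, here is a minimal sketch of a seeded fractional split using only the Python standard library. It is not the trainer's actual implementation, which is not described here; the function split_indices, the rounding rule, and the assumption that the fraction lies in the interval (0, 1] are all illustrative choices.

```python
import random


def split_indices(n_items, train_fraction, seed):
    """Split range(n_items) into shuffled train and test index lists.

    Illustrative only: assumes 0 < train_fraction <= 1 and rounds the
    training-set size to the nearest integer.
    """
    if not 0.0 < train_fraction <= 1.0:
        raise ValueError("train_fraction must be in the interval (0, 1]")
    indices = list(range(n_items))
    # A dedicated Random instance makes the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(10, 0.8, seed=42)
print(len(train_idx), len(test_idx))  # 8 2
```

With a fixed seed the same split is produced on every run, which is what makes results comparable between training runs. With a fraction of 1 the test list is empty, consistent with the description of train_test_split above.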

Fine-Tuning from MACE Foundation Models

Property prediction models can be fine-tuned from pre-trained foundation models by specifying foundation_model_path in MACEModelParameters. This approach is generally recommended as the foundation model has already learned useful representations of atomic environments, which can be reused for the property prediction task. Fine-tuning typically improves accuracy and reduces the number of training epochs needed compared to training from scratch.

Pre-trained MACE foundation models can be downloaded from: https://github.com/ACEsuit/mace-foundations/tree/main