MLParameterOptions

Included in QATK.MLFF

class MLParameterOptions

Framework-agnostic container for ML parameters that take one of a fixed set of options.

class DATASET_TYPE

Dataset type for model evaluation and training.

class DEVICE

Device options for ML training and inference.

class DTYPE

Data type precision options.

class MODEL_CONVERSION

Format for converting MACE models to QATK format.

class POOLING

Pooling methods for aggregation.

class REPLAY_FILTERING

Filtering type for replay configurations based on elements.

class REPLAY_SUBSELECT

Subselection method for replay data sampling.

class TASK_TYPE

Task type for property prediction.

Notes

The MLParameterOptions class provides predefined options for various MLFF training parameters. These options are organized into nested classes, each representing a specific parameter category.

Available Option Groups For MLFF Model Training

DEVICE

Controls which hardware device is used for training:

  • MLParameterOptions.DEVICE.AUTOMATIC - Automatically selects GPU if available, otherwise CPU

  • MLParameterOptions.DEVICE.CPU - Use CPU for training

  • MLParameterOptions.DEVICE.GPU - Use GPU (CUDA) for training

DTYPE

Specifies the floating-point precision for model parameters:

  • MLParameterOptions.DTYPE.FLOAT32 - 32-bit floating point (faster, less memory)

  • MLParameterOptions.DTYPE.FLOAT64 - 64-bit floating point (more accurate)
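The precision trade-off between the two options can be illustrated with the standard library alone. The following is a minimal sketch, not QuantumATK code: the helper `to_float32` is a hypothetical name introduced here to round a Python float (64-bit) to the nearest 32-bit value.

```python
import struct


def to_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", value))[0]


# A perturbation below the float32 resolution near 1.0 (about 1.2e-7).
small = 1e-8

sum64 = 1.0 + small              # kept in 64-bit precision
sum32 = to_float32(1.0 + small)  # rounded to 32-bit precision

# Storage per value in bytes: 4 for float32, 8 for float64.
bytes32 = struct.calcsize("f")
bytes64 = struct.calcsize("d")

print(sum64 == 1.0)  # False: float64 still resolves the perturbation
print(sum32 == 1.0)  # True: float32 rounds it away
print(bytes32, bytes64)
```

Halving the storage per value is what makes the 32-bit option lighter on memory, at the cost of losing small differences such as the one above.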

TASK_TYPE

Defines the type of prediction task:

  • MLParameterOptions.TASK_TYPE.GENERAL - Global, per-configuration property prediction

  • MLParameterOptions.TASK_TYPE.ATOM_WISE - Atom-wise property prediction

POOLING

Specifies the pooling method for aggregating atomic contributions:

  • MLParameterOptions.POOLING.SUM - Sum atomic contributions

  • MLParameterOptions.POOLING.MEAN - Average atomic contributions

  • MLParameterOptions.POOLING.MAX - Take maximum of atomic contributions
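The three pooling methods differ only in how per-atom values are reduced to one value per configuration. The sketch below shows this in plain Python under stated assumptions: the function `pool` and the strings `'sum'`, `'mean'` and `'max'` are hypothetical stand-ins for illustration, not the library's implementation.

```python
from statistics import fmean


def pool(atomic_contributions, method):
    """Reduce per-atom contributions to a single value (illustrative only)."""
    if method == "sum":
        return sum(atomic_contributions)
    if method == "mean":
        return fmean(atomic_contributions)
    if method == "max":
        return max(atomic_contributions)
    raise ValueError(f"Unknown pooling method: {method!r}")


# Made-up per-atom contributions for a three-atom configuration.
contributions = [-1.5, -2.0, -0.5]

pooled_sum = pool(contributions, "sum")    # -4.0, grows with the number of atoms
pooled_mean = pool(contributions, "mean")  # independent of the number of atoms
pooled_max = pool(contributions, "max")    # -0.5, set by a single atom
```

Sum pooling suits extensive properties that scale with system size, while mean and max pooling give values that do not grow with the number of atoms.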

DATASET_TYPE

Defines the type of dataset used for model evaluation and training. This distinguishes training data, which is used to fit the model, from test data, which is used to evaluate model performance.

  • MLParameterOptions.DATASET_TYPE.TRAINING - Indicates the dataset is used for training the model

  • MLParameterOptions.DATASET_TYPE.TEST - Indicates the dataset is used for testing/evaluation purposes

MODEL_CONVERSION

Defines the format for converting MACE models to QuantumATK-compatible format. Different formats provide different precision and performance characteristics for model deployment.

  • MLParameterOptions.MODEL_CONVERSION.E3NN - Convert using the E3NN (Euclidean neural networks) format; this is the standard format

  • MLParameterOptions.MODEL_CONVERSION.CUE_FP32 - Convert to CuEquivariance format with 32-bit floating-point precision (optimized CUDA kernels)

  • MLParameterOptions.MODEL_CONVERSION.CUE_FP64 - Convert to CuEquivariance format with 64-bit floating-point precision (higher numerical accuracy)

REPLAY_FILTERING

Defines filtering options for replay configurations based on chemical elements when performing replay finetuning of MACE models. Controls which configurations from the foundation model’s original training data are included during finetuning.

  • MLParameterOptions.REPLAY_FILTERING.NONE - No filtering applied

  • MLParameterOptions.REPLAY_FILTERING.COMBINATIONS - Only include configurations with combinations of elements present in new training data (default)

  • MLParameterOptions.REPLAY_FILTERING.EXCLUSIVE - Only include configurations with exclusively the elements in new training data (more restrictive)

  • MLParameterOptions.REPLAY_FILTERING.INCLUSIVE - Include configurations with at least one element from new training data (more permissive)
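One way to read the four filtering options is as set comparisons between the elements of a replay configuration and the elements in the new training data. The sketch below follows a literal reading of the descriptions above and is an assumption, not the library's implementation: the function `keep_configuration` and the lowercase strings are hypothetical names used only for illustration.

```python
def keep_configuration(config_elements, new_elements, filtering):
    """Decide whether a replay configuration passes the filter (illustrative only)."""
    config = set(config_elements)
    new = set(new_elements)
    if filtering == "none":
        return True                # keep everything
    if filtering == "combinations":
        return config <= new       # assumed: only elements from the new data
    if filtering == "exclusive":
        return config == new       # assumed: exactly the elements of the new data
    if filtering == "inclusive":
        return bool(config & new)  # assumed: at least one shared element
    raise ValueError(f"Unknown filtering type: {filtering!r}")


# Made-up example: the new training data contains Si and O.
new_elements = ["Si", "O"]
replay_configs = [["Si"], ["Si", "O"], ["Si", "O", "H"], ["Fe"]]

results = {
    filtering: [keep_configuration(c, new_elements, filtering) for c in replay_configs]
    for filtering in ("none", "combinations", "exclusive", "inclusive")
}
```

Under this reading, the exclusive filter keeps the fewest configurations and the inclusive filter the most, which matches the "more restrictive" and "more permissive" labels above.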

REPLAY_SUBSELECT

Defines methods for subselecting configurations from the replay dataset when performing replay finetuning of MACE models. Controls how configurations are sampled from the foundation model’s original training data.

  • MLParameterOptions.REPLAY_SUBSELECT.RANDOM - Randomly select configurations (default, fastest)

  • MLParameterOptions.REPLAY_SUBSELECT.FPS - Use Farthest Point Sampling for maximal diversity (slower, requires foundation model)
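Farthest Point Sampling greedily picks, at each step, the point that lies farthest from everything selected so far. The following is a minimal sketch of that general algorithm in plain Python. It is not the library's implementation: the function name is hypothetical, and the 2D points are made-up stand-ins for whatever feature vectors the real method compares.

```python
import math


def farthest_point_sampling(points, number_of_samples, start_index=0):
    """Return indices of greedily selected, mutually distant points."""
    if not 0 < number_of_samples <= len(points):
        raise ValueError("number_of_samples must be between 1 and len(points)")

    selected = [start_index]
    # Distance from every point to its nearest already-selected point.
    min_distance = [math.dist(p, points[start_index]) for p in points]

    while len(selected) < number_of_samples:
        # Pick the point farthest from the current selection.
        next_index = max(range(len(points)), key=min_distance.__getitem__)
        selected.append(next_index)
        # Update nearest-selected distances with the new point.
        for i, p in enumerate(points):
            d = math.dist(p, points[next_index])
            if d < min_distance[i]:
                min_distance[i] = d
    return selected


# Made-up data: points 0 and 1 are near-duplicates.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
selected = farthest_point_sampling(points, number_of_samples=3)
```

In this example the near-duplicate point at index 1 is never chosen, which shows why the method favors diversity. Each step compares against all points, which is why it is slower than random selection.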

Usage Examples

Training parameters

training_parameters = TrainingParameters(
    experiment_name='my_experiment',
    default_dtype=MLParameterOptions.DTYPE.FLOAT64,
    device=MLParameterOptions.DEVICE.AUTOMATIC,
)

Model evaluation

best_score, best_index, best_identifier, best_evaluator = model_collection.getBestModel(
    statistical_measure=R2Score,
    dataset_type=MLParameterOptions.DATASET_TYPE.TEST,
)

Model conversion

convertMACEModelToQATKFormat(
    'mace_model.model',
    conversion_format=MLParameterOptions.MODEL_CONVERSION.CUE_FP32,
)

Replay finetuning

replay_settings = MACEReplayFinetuningSettings(
    replay_data_filepath='/path/to/mp_traj_combined.xyz',
    number_of_samples=15000,
    replay_filtering_type=MLParameterOptions.REPLAY_FILTERING.COMBINATIONS,
    replay_subselect_method=MLParameterOptions.REPLAY_SUBSELECT.RANDOM,
)

String equivalents

Every option can also be passed as its string equivalent:

best_score, best_index, best_identifier, best_evaluator = model_collection.getBestModel(
    statistical_measure=R2Score,
    dataset_type='test',  # or 'training'
)