MLParameterOptions

Included in QATK.MLFF

class MLParameterOptions

Framework-agnostic container for ML parameters with selectable options.

class DATASET_TYPE: Dataset type for model evaluation and training.

class DEVICE: Device options for ML training and inference.

class DTYPE: Data type precision options.

class MODEL_CONVERSION: Format for converting MACE models to QATK format.

class POOLING: Pooling methods for aggregation.

class REPLAY_FILTERING: Filtering type for replay configurations based on elements.

class REPLAY_SUBSELECT: Subselection method for replay data sampling.

class TASK_TYPE: Task type for property prediction.

Notes

The MLParameterOptions class provides predefined options for various MLFF training parameters. These options are organized into nested classes, each representing a specific parameter category.

Available Option Groups For MLFF Model Training

DEVICE

Controls which hardware device is used for training:

MLParameterOptions.DEVICE.AUTOMATIC - Automatically selects GPU if available, otherwise CPU
MLParameterOptions.DEVICE.CPU - Use CPU for training
MLParameterOptions.DEVICE.GPU - Use GPU (CUDA) for training
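
The fallback behaviour of AUTOMATIC can be sketched in a few lines. This is an illustration only, not QuantumATK code: `resolve_device` is a hypothetical helper, the lowercase strings are placeholder names, and raising an error when a GPU is requested but unavailable is an assumption that the source does not state.

```python
def resolve_device(choice: str, gpu_available: bool) -> str:
    """Map a device option to a concrete device name.

    Hypothetical helper for illustration; not part of QuantumATK.
    """
    if choice == "automatic":
        # Prefer the GPU when one is usable, otherwise fall back to the CPU.
        return "gpu" if gpu_available else "cpu"
    if choice == "gpu" and not gpu_available:
        # Assumption: an explicit GPU request without a GPU is an error.
        raise RuntimeError("GPU requested but no usable GPU was detected")
    if choice in ("cpu", "gpu"):
        return choice
    raise ValueError(f"unknown device option: {choice!r}")


print(resolve_device("automatic", gpu_available=True))   # gpu
print(resolve_device("automatic", gpu_available=False))  # cpu
```

With AUTOMATIC the same script can run unchanged on a workstation with a GPU and on a CPU-only node.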

DTYPE

Specifies the floating-point precision for model parameters:

MLParameterOptions.DTYPE.FLOAT32 - 32-bit floating point (faster, less memory)
MLParameterOptions.DTYPE.FLOAT64 - 64-bit floating point (more accurate)
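
The precision and memory trade-off between the two options can be checked with the Python standard library alone. The sketch below does not use QuantumATK; `round_trip_float32` is a hypothetical helper that stores a value as an IEEE 754 32-bit float and reads it back.

```python
import struct


def round_trip_float32(x: float) -> float:
    """Store x as a 32-bit float and read it back as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 0.1  # Python floats are 64-bit
error32 = abs(round_trip_float32(value) - value)

# Storage per number: 4 bytes for 32-bit, 8 bytes for 64-bit.
bytes32 = struct.calcsize("<f")
bytes64 = struct.calcsize("<d")

print(f"float32 rounding error for 0.1: {error32:.2e}")
print(f"bytes per value: float32={bytes32}, float64={bytes64}")
```

The 32-bit representation halves the memory per value but carries only about 7 significant decimal digits, compared with about 16 for 64-bit.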

TASK_TYPE

Defines the type of prediction task:

MLParameterOptions.TASK_TYPE.GENERAL - Global, per-configuration property prediction
MLParameterOptions.TASK_TYPE.ATOM_WISE - Atom-wise property prediction

POOLING

Specifies the pooling method for aggregating atomic contributions:

MLParameterOptions.POOLING.SUM - Sum atomic contributions
MLParameterOptions.POOLING.MEAN - Average atomic contributions
MLParameterOptions.POOLING.MAX - Take maximum of atomic contributions
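
As a minimal sketch of what the three pooling methods compute, the function below reduces a list of per-atom contributions to a single value. The function name `pool` and the lowercase method strings are assumptions made for this illustration and are not part of the QuantumATK API.

```python
def pool(atomic_contributions, method):
    """Aggregate per-atom values into one value per configuration."""
    values = list(atomic_contributions)
    if not values:
        raise ValueError("at least one atomic contribution is required")
    if method == "sum":
        return sum(values)
    if method == "mean":
        return sum(values) / len(values)
    if method == "max":
        return max(values)
    raise ValueError(f"unknown pooling method: {method!r}")


contributions = [1.0, 2.0, 6.0]  # one value per atom
print(pool(contributions, "sum"))   # 9.0
print(pool(contributions, "mean"))  # 3.0
print(pool(contributions, "max"))   # 6.0
```

Sum pooling gives a result that grows with the number of atoms, whereas mean and max pooling give results that do not scale with system size.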

DATASET_TYPE

Defines the type of dataset used for model evaluation and training. It distinguishes the training data used to fit the model from the test data used to evaluate its performance.

MLParameterOptions.DATASET_TYPE.TRAINING - Indicates the dataset is used for training the model
MLParameterOptions.DATASET_TYPE.TEST - Indicates the dataset is used for testing/evaluation purposes

MODEL_CONVERSION

Defines the target format when converting MACE models to a QuantumATK-compatible format. The formats differ in numerical precision and performance.

MLParameterOptions.MODEL_CONVERSION.E3NN - Convert using the E3NN (Euclidean neural networks) format (the standard format)
MLParameterOptions.MODEL_CONVERSION.CUE_FP32 - Convert to CuEquivariance format with 32-bit floating-point precision (optimized CUDA kernels)
MLParameterOptions.MODEL_CONVERSION.CUE_FP64 - Convert to CuEquivariance format with 64-bit floating-point precision (higher numerical accuracy)

REPLAY_FILTERING

Defines filtering options for replay configurations based on chemical elements when performing replay finetuning of MACE models. Controls which configurations from the foundation model’s original training data are included during finetuning.

MLParameterOptions.REPLAY_FILTERING.NONE - No filtering applied
MLParameterOptions.REPLAY_FILTERING.COMBINATIONS - Only include configurations with combinations of elements present in the new training data (default)
MLParameterOptions.REPLAY_FILTERING.EXCLUSIVE - Only include configurations with exclusively the elements in the new training data (more restrictive)
MLParameterOptions.REPLAY_FILTERING.INCLUSIVE - Include configurations with at least one element from the new training data (more permissive)
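
The four filters can be expressed as set operations on element symbols. The sketch below is one possible reading of the descriptions above, not the QuantumATK implementation: `keep_configuration` is a hypothetical helper, and in particular it assumes that EXCLUSIVE means the configuration contains exactly the element set of the new training data. Check the actual behaviour before relying on these semantics.

```python
def keep_configuration(config_elements, new_elements, filtering):
    """Decide whether a replay configuration passes the element filter."""
    config = set(config_elements)
    new = set(new_elements)
    if filtering == "none":
        return True
    if filtering == "combinations":
        # Every element of the configuration occurs in the new data.
        return config <= new
    if filtering == "exclusive":
        # Assumption: exactly the same element set as the new data.
        return config == new
    if filtering == "inclusive":
        # At least one element is shared with the new data.
        return bool(config & new)
    raise ValueError(f"unknown filtering type: {filtering!r}")


new_data_elements = {"Si", "O"}
replay = [("Si",), ("Si", "O"), ("Si", "O", "H"), ("Fe",)]

for filtering in ("none", "combinations", "exclusive", "inclusive"):
    kept = [c for c in replay if keep_configuration(c, new_data_elements, filtering)]
    print(filtering, kept)
```

Under this reading, EXCLUSIVE keeps the fewest configurations and INCLUSIVE the most of the three element-based filters, which matches the "more restrictive" and "more permissive" labels in the list.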

REPLAY_SUBSELECT

Defines methods for subselecting configurations from the replay dataset when performing replay finetuning of MACE models. Controls how configurations are sampled from the foundation model’s original training data.

MLParameterOptions.REPLAY_SUBSELECT.RANDOM - Randomly select configurations (default, fastest)
MLParameterOptions.REPLAY_SUBSELECT.FPS - Use Farthest Point Sampling for maximal diversity (slower, requires foundation model)
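
For readers unfamiliar with Farthest Point Sampling, the greedy algorithm can be sketched in plain Python. This is a generic illustration and not the QuantumATK or MACE implementation: it assumes each configuration is already represented by a descriptor vector (how those vectors are obtained is outside this sketch), and the function name and `start_index` parameter are hypothetical.

```python
import math


def farthest_point_sampling(points, n_samples, start_index=0):
    """Greedily pick n_samples indices that are mutually far apart."""
    if not 0 < n_samples <= len(points):
        raise ValueError("n_samples must be between 1 and len(points)")
    selected = [start_index]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start_index]) for p in points]
    while len(selected) < n_samples:
        # The next sample is the point farthest from everything chosen so far.
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[next_index]))
    return selected


descriptors = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.9, 0.0), (0.5, 0.0)]
print(farthest_point_sampling(descriptors, 3))  # [0, 2, 4]
```

Each selection step scans all points, so the cost grows with the product of dataset size and sample count. Random selection has no such overhead.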

Usage Examples

Training parameters

training_parameters = TrainingParameters(
    experiment_name='my_experiment',
    default_dtype=MLParameterOptions.DTYPE.FLOAT64,
    device=MLParameterOptions.DEVICE.AUTOMATIC,
)

Model evaluation

best_score, best_index, best_identifier, best_evaluator = model_collection.getBestModel(
    statistical_measure=R2Score,
    dataset_type=MLParameterOptions.DATASET_TYPE.TEST,
)

Model conversion

convertMACEModelToQATKFormat(
    "mace_model.model",
    conversion_format=MLParameterOptions.MODEL_CONVERSION.CUE_FP32,
)

Replay finetuning

replay_settings = MACEReplayFinetuningSettings(
    replay_data_filepath='/path/to/mp_traj_combined.xyz',
    number_of_samples=15000,
    replay_filtering_type=MLParameterOptions.REPLAY_FILTERING.COMBINATIONS,
    replay_subselect_method=MLParameterOptions.REPLAY_SUBSELECT.RANDOM,
)

String equivalents

Every option can also be passed as its string equivalent:

best_score, best_index, best_identifier, best_evaluator = model_collection.getBestModel(
    statistical_measure=R2Score,
    dataset_type='test',  # or 'training'
)
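
One common way for an API to accept both forms is to normalize the argument on entry. The sketch below shows that pattern with a stand-in enumeration; `DatasetType` and `normalize_dataset_type` are hypothetical names, and whether MLParameterOptions is built on `enum.Enum` is an assumption. Only the strings 'training' and 'test' are taken from the example above.

```python
from enum import Enum


class DatasetType(Enum):
    """Hypothetical stand-in for MLParameterOptions.DATASET_TYPE."""
    TRAINING = "training"
    TEST = "test"


def normalize_dataset_type(value):
    """Accept either an option member or its string equivalent."""
    if isinstance(value, DatasetType):
        return value
    if isinstance(value, str):
        try:
            # Enum lookup by value: 'test' -> DatasetType.TEST
            return DatasetType(value)
        except ValueError:
            valid = ", ".join(repr(member.value) for member in DatasetType)
            raise ValueError(
                f"unknown dataset type {value!r}; expected one of {valid}"
            ) from None
    raise TypeError("dataset type must be a DatasetType member or a string")


print(normalize_dataset_type("test") is DatasetType.TEST)            # True
print(normalize_dataset_type(DatasetType.TRAINING).value)            # training
```

Using the named options rather than raw strings lets typos surface as attribute errors at the call site instead of at lookup time.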