MLParameterOptions
Included in QATK.MLFF
- class MLParameterOptions
  Framework-agnostic container for ML parameters with choosable options.
  - class DATASET_TYPE
    Dataset type for model evaluation and training.
  - class DEVICE
    Device options for ML training and inference.
  - class DTYPE
    Data type precision options.
  - class MODEL_CONVERSION
    Format for converting MACE models to QATK format.
  - class POOLING
    Pooling methods for aggregation.
  - class REPLAY_FILTERING
    Filtering type for replay configurations based on elements.
  - class REPLAY_SUBSELECT
    Subselection method for replay data sampling.
  - class TASK_TYPE
    Task type for property prediction.
Notes
The MLParameterOptions class provides predefined options for various MLFF training parameters. These options are organized into nested classes, each representing a specific parameter category.
Available Option Groups For MLFF Model Training
DEVICE
Controls which hardware device is used for training:
- MLParameterOptions.DEVICE.AUTOMATIC - Automatically selects the GPU if available, otherwise the CPU
- MLParameterOptions.DEVICE.CPU - Use the CPU for training
- MLParameterOptions.DEVICE.GPU - Use the GPU (CUDA) for training
DTYPE
Specifies the floating-point precision for model parameters:
- MLParameterOptions.DTYPE.FLOAT32 - 32-bit floating point (faster, less memory)
- MLParameterOptions.DTYPE.FLOAT64 - 64-bit floating point (more accurate)
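The accuracy trade-off between the two precisions can be illustrated without any ML framework. The following is a minimal sketch using only the Python standard library; `round_to_float32` is a hypothetical helper written for this illustration and is not part of QATK.

```python
import struct


def round_to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 0.1                      # stored as a 64-bit float by Python
as_fp32 = round_to_float32(value)
error = abs(as_fp32 - value)     # precision lost by the 32-bit representation

print(f"float64: {value!r}")
print(f"float32: {as_fp32!r}")
print(f"rounding error: {error:.3e}")
```

Values that are exactly representable in 32 bits, such as 0.5, survive the round trip unchanged; most decimal fractions do not.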
TASK_TYPE
Defines the type of prediction task:
- MLParameterOptions.TASK_TYPE.GENERAL - Global, per-configuration property prediction
- MLParameterOptions.TASK_TYPE.ATOM_WISE - Atom-wise property prediction
POOLING
Specifies the pooling method for aggregating atomic contributions:
- MLParameterOptions.POOLING.SUM - Sum atomic contributions
- MLParameterOptions.POOLING.MEAN - Average atomic contributions
- MLParameterOptions.POOLING.MAX - Take the maximum of atomic contributions
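As a rough illustration of what the three pooling methods compute, the sketch below aggregates a list of per-atom values into one per-configuration value. The function `pool` and the method strings `"sum"`, `"mean"`, and `"max"` are stand-ins chosen for this example; they are assumptions, not QATK identifiers.

```python
from statistics import fmean


def pool(atomic_contributions, method):
    """Aggregate per-atom values into a single per-configuration value."""
    if method == "sum":
        return sum(atomic_contributions)
    if method == "mean":
        return fmean(atomic_contributions)
    if method == "max":
        return max(atomic_contributions)
    raise ValueError(f"unknown pooling method: {method!r}")


# Hypothetical per-atom contributions for a three-atom configuration.
contributions = [-1.5, -2.0, -0.5]

for method in ("sum", "mean", "max"):
    print(method, pool(contributions, method))
```

Sum pooling grows with the number of atoms, which suits extensive properties, while mean and max pooling do not scale with system size.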
DATASET_TYPE
Defines the type of dataset used for model evaluation and training. This allows distinguishing between training data used to fit the model and test data used to evaluate model performance.
- MLParameterOptions.DATASET_TYPE.TRAINING - Indicates the dataset is used for training the model
- MLParameterOptions.DATASET_TYPE.TEST - Indicates the dataset is used for testing/evaluation purposes
MODEL_CONVERSION
Defines the format for converting MACE models to QuantumATK-compatible format. Different formats provide different precision and performance characteristics for model deployment.
- MLParameterOptions.MODEL_CONVERSION.E3NN - Convert using the E3NN (Euclidean neural networks) format. This is the standard format.
- MLParameterOptions.MODEL_CONVERSION.CUE_FP32 - Convert to CuEquivariance format with 32-bit floating-point precision (optimized CUDA kernels)
- MLParameterOptions.MODEL_CONVERSION.CUE_FP64 - Convert to CuEquivariance format with 64-bit floating-point precision (higher numerical accuracy)
REPLAY_FILTERING
Defines filtering options for replay configurations based on chemical elements when performing replay finetuning of MACE models. Controls which configurations from the foundation model’s original training data are included during finetuning.
- MLParameterOptions.REPLAY_FILTERING.NONE - No filtering applied
- MLParameterOptions.REPLAY_FILTERING.COMBINATIONS - Only include configurations with combinations of elements present in new training data (default)
- MLParameterOptions.REPLAY_FILTERING.EXCLUSIVE - Only include configurations with exclusively the elements in new training data (more restrictive)
- MLParameterOptions.REPLAY_FILTERING.INCLUSIVE - Include configurations with at least one element from new training data (more permissive)
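The four filters can be read as set relations between the elements of a replay configuration and the elements of the new training data. The sketch below encodes one literal reading of the descriptions above: COMBINATIONS as "subset of the new elements", EXCLUSIVE as "exactly the new elements", and INCLUSIVE as "shares at least one element". This interpretation, the function `keep_configuration`, and the lowercase filter strings are assumptions made for illustration; the actual QuantumATK behavior may differ.

```python
def keep_configuration(config_elements, new_elements, filtering):
    """Decide whether a replay configuration passes the element filter."""
    config = set(config_elements)
    new = set(new_elements)
    if filtering == "none":
        return True                  # no filtering applied
    if filtering == "combinations":
        return config <= new         # built only from new-data elements
    if filtering == "exclusive":
        return config == new         # exactly the new-data elements
    if filtering == "inclusive":
        return bool(config & new)    # shares at least one element
    raise ValueError(f"unknown filtering type: {filtering!r}")


new_elements = {"Si", "O"}           # hypothetical new training data
replay = [["Si"], ["Si", "O"], ["Si", "O", "H"], ["Fe"]]

for filtering in ("none", "combinations", "exclusive", "inclusive"):
    kept = [c for c in replay if keep_configuration(c, new_elements, filtering)]
    print(filtering, kept)
```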
REPLAY_SUBSELECT
Defines methods for subselecting configurations from the replay dataset when performing replay finetuning of MACE models. Controls how configurations are sampled from the foundation model’s original training data.
- MLParameterOptions.REPLAY_SUBSELECT.RANDOM - Randomly select configurations (default, fastest)
- MLParameterOptions.REPLAY_SUBSELECT.FPS - Use Farthest Point Sampling for maximal diversity (slower, requires foundation model)
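For readers unfamiliar with Farthest Point Sampling, the following is a minimal sketch of the generic greedy algorithm on plain coordinate tuples. In replay finetuning the points would presumably be descriptors produced by the foundation model; here they are made-up 2D points, and `farthest_point_sampling` is a hypothetical helper, not the QuantumATK implementation.

```python
import math


def farthest_point_sampling(points, n_samples, start_index=0):
    """Greedy FPS: repeatedly pick the point farthest from the selected set."""
    if not 0 < n_samples <= len(points):
        raise ValueError("n_samples must be between 1 and len(points)")
    selected = [start_index]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[start_index]) for p in points]
    while len(selected) < n_samples:
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[next_index]))
    return selected


# Made-up 2D "descriptors"; points 0 and 1 are nearly identical.
descriptors = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
picked = farthest_point_sampling(descriptors, 3)
print(picked)
```

Each step costs a pass over all points, which is consistent with FPS being slower than random selection; in exchange, near-duplicate points such as index 1 are picked last.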
Usage Examples
Training parameters
training_parameters = TrainingParameters(
    experiment_name='my_experiment',
    default_dtype=MLParameterOptions.DTYPE.FLOAT64,
    device=MLParameterOptions.DEVICE.AUTOMATIC,
)
Model evaluation
best_score, best_index, best_identifier, best_evaluator = model_collection.getBestModel(
    statistical_measure=R2Score,
    dataset_type=MLParameterOptions.DATASET_TYPE.TEST,
)
Model conversion
convertMACEModelToQATKFormat(
    "mace_model.model",
    conversion_format=MLParameterOptions.MODEL_CONVERSION.CUE_FP32,
)
Replay finetuning
replay_settings = MACEReplayFinetuningSettings(
    replay_data_filepath='/path/to/mp_traj_combined.xyz',
    number_of_samples=15000,
    replay_filtering_type=MLParameterOptions.REPLAY_FILTERING.COMBINATIONS,
    replay_subselect_method=MLParameterOptions.REPLAY_SUBSELECT.RANDOM,
)
String equivalents
String equivalents also work for all options:
best_score, best_index, best_identifier, best_evaluator = model_collection.getBestModel(
    statistical_measure=R2Score,
    dataset_type='test',  # or 'training'
)
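One common Python pattern for making enum-like options interchangeable with strings is a `str`-based `Enum`. The sketch below illustrates that pattern only; `DatasetType` and `normalize_dataset_type` are hypothetical names, and nothing here describes how `MLParameterOptions` is actually implemented. Only the strings `'test'` and `'training'` are taken from the example above.

```python
from enum import Enum


class DatasetType(str, Enum):
    """Hypothetical stand-in for an option group with string equivalents."""
    TRAINING = "training"
    TEST = "test"


def normalize_dataset_type(value):
    """Accept either an enum member or its string equivalent."""
    # Enum lookup by value returns the member for both forms and raises
    # ValueError for anything that is not a known option.
    return DatasetType(value)


print(normalize_dataset_type("test"))
print(normalize_dataset_type(DatasetType.TRAINING))
```

With this pattern a misspelled string fails immediately with a `ValueError` instead of silently selecting a default.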