MachineLearnedModelCollection
Included in QATK.MLFF
- class MachineLearnedModelCollection(model_evaluators=None)
Initialize the MachineLearnedModelCollection.
- Parameters:
model_evaluators (list of MachineLearnedModelEvaluator | None) – Initial list of MachineLearnedModelEvaluator objects to add. All must have cached statistical data available. Default: [] (empty list).
- addModel(model_evaluator)
Add a single MachineLearnedModelEvaluator to the collection.
- Parameters:
model_evaluator (MachineLearnedModelEvaluator) – The model evaluator to add. Must have cached statistical data.
- Raises:
NLValueError – If the evaluator is not a MachineLearnedModelEvaluator instance or if it does not have cached statistical data.
- addModels(model_evaluators)
Add multiple MachineLearnedModelEvaluator objects to the collection.
- Parameters:
model_evaluators (list of MachineLearnedModelEvaluator) – List of model evaluators to add.
- Raises:
NLValueError – If inputs are invalid or lengths don't match.
- getAllModelIdentifiers()¶
Get all model identifiers in the collection.
- Returns:
List of all model identifiers.
- Return type:
list of str
- getBestModel(statistical_measure, dataset_type='test', weights=None, use_cohesive_energy=None)
Get the best performing model using the ranking method.
- Parameters:
statistical_measure (RMSE | MAE | R2Score) – The statistical measure to use for ranking.
dataset_type (str) – Which dataset to use (MLParameterOptions.DATASET_TYPE.TRAINING or MLParameterOptions.DATASET_TYPE.TEST). Default: MLParameterOptions.DATASET_TYPE.TEST.
weights (list of float | None) – Weights for ranking. See rankModels for details.
use_cohesive_energy (bool | None) – Whether to use cohesive energy calculations.
- Returns:
Tuple containing (score, index, identifier, evaluator) for the best model.
- Return type:
tuple
- Raises:
NLValueError – If no valid models are found.
- getModelCount()
Get the number of models in the collection.
- Returns:
Number of models in the collection.
- Return type:
int
- getModelEvaluator(index)
Get a model evaluator by index.
- Parameters:
index (int) – Index of the model evaluator to retrieve.
- Returns:
The model evaluator at the specified index.
- Return type:
MachineLearnedModelEvaluator
- Raises:
NLValueError – If index is out of range.
- getModelEvaluatorByIdentifier(model_identifier)
Get a model evaluator by its identifier.
- Parameters:
model_identifier (str) – The identifier of the model evaluator to retrieve.
- Returns:
The model evaluator with the specified identifier.
- Return type:
MachineLearnedModelEvaluator
- Raises:
NLValueError – If the identifier is not found in the collection.
- getModelIdentifier(index)
Get a model identifier by index.
- Parameters:
index (int) – Index of the model identifier to retrieve.
- Returns:
The model identifier at the specified index.
- Return type:
str
- Raises:
NLValueError – If index is out of range.
- getModelIndexByIdentifier(model_identifier)
Get the index of a model by its identifier.
- Parameters:
model_identifier (str) – The identifier of the model to find.
- Returns:
The index of the model with the specified identifier.
- Return type:
int
- Raises:
NLValueError – If the identifier is not found in the collection.
- nlprint(stream=None, statistical_measure=None, dataset_type='test', weights=None, use_cohesive_energy=None)
Print a summary using the ranking method.
- Parameters:
stream (file-like | None) – The stream to write to. Default: NLPrintLogger().
statistical_measure (RMSE | MAE | R2Score | None) – The statistical measure to use. Default: R2Score.
dataset_type (str) – Which dataset to use. Possible values are MLParameterOptions.DATASET_TYPE.TRAINING and MLParameterOptions.DATASET_TYPE.TEST. Default: MLParameterOptions.DATASET_TYPE.TEST.
weights (list of float | None) – Weights for ranking. See rankModels for details.
use_cohesive_energy (bool | None) – Whether to use cohesive energy calculations.
- Returns:
The summary report.
- Return type:
str
- rankModels(statistical_measure, dataset_type='test', weights=None, use_cohesive_energy=None)
Rank models based on statistical measures and weights.
- Parameters:
statistical_measure (RMSE | MAE | R2Score) – The statistical measure to use for ranking.
dataset_type (str) – Which dataset to use (MLParameterOptions.DATASET_TYPE.TRAINING or MLParameterOptions.DATASET_TYPE.TEST). Default: MLParameterOptions.DATASET_TYPE.TEST.
weights (list of float | None) – Weights for [energy, forces, stress]. Weights are normalized automatically. Default: [1, 1, 1] (equal weights).
use_cohesive_energy (bool | None) – Whether to use cohesive energy calculations. If None, determined automatically based on the availability of isolated_atom_energies in the Evaluator objects. Default: None.
- Returns:
List of tuples (score, index, identifier) sorted by best performance first.
- Return type:
list of tuples
- Raises:
NLValueError – If statistical_measure is not supported, collection is empty, or other validation errors occur.
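The ranking scheme described above can be sketched in plain Python. This is an illustrative stand-in, not the QATK implementation: the per-model RMSE values and identifiers below are hypothetical, and real usage goes through rankModels() on the collection object.

```python
# Sketch of a weighted [energy, forces, stress] ranking, mirroring the
# behavior documented for rankModels(). All data here is made up.

def rank_models(per_model_errors, weights=None):
    """Rank models by a weighted average of [energy, forces, stress] errors.

    per_model_errors: dict mapping identifier -> (energy, forces, stress) error.
    weights: optional [energy, forces, stress] weights; normalized
             automatically, defaulting to equal weights as in rankModels().
    """
    if weights is None:
        weights = [1.0, 1.0, 1.0]
    total = sum(weights)
    weights = [w / total for w in weights]  # automatic normalization

    ranking = []
    for index, (identifier, errors) in enumerate(per_model_errors.items()):
        score = sum(w * e for w, e in zip(weights, errors))
        ranking.append((score, index, identifier))
    # For error measures such as RMSE or MAE, a lower score is better;
    # the real method would invert the ordering for R^2.
    return sorted(ranking)

# Hypothetical test-set RMSE values for three trained models.
errors = {
    'mtp_small.qatkpt':  (0.012, 0.150, 0.020),
    'mtp_medium.qatkpt': (0.008, 0.110, 0.015),
    'mtp_large.qatkpt':  (0.009, 0.180, 0.030),
}
for score, index, identifier in rank_models(errors):
    print(f'{identifier}: {score:.4f}')
```

The returned list of (score, index, identifier) tuples matches the Returns description above, with the best model first.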
- summaryReport(stream, statistical_measure=None, dataset_type='test', weights=None, use_cohesive_energy=None)
Print a detailed summary report using the ranking method.
- Parameters:
stream (file-like) – The stream to write to.
statistical_measure (RMSE | MAE | R2Score | None) – The statistical measure to use. Default: R2Score.
dataset_type (str) – Which dataset to use. Possible values are MLParameterOptions.DATASET_TYPE.TRAINING and MLParameterOptions.DATASET_TYPE.TEST. Default: MLParameterOptions.DATASET_TYPE.TEST.
weights (list of float | None) – Weights for ranking. See rankModels for details.
use_cohesive_energy (bool | None) – Whether to use cohesive energy calculations.
- uniqueString()
Return a unique string representing the state of the object.
Notes
The MachineLearnedModelCollection object is a container that holds multiple MachineLearnedModelEvaluator objects from training multiple machine-learned force field models. It provides tools for comparing and ranking the trained models.

The collection object is retrieved from a MultipleMachineLearnedForceFieldTrainers instance using the modelCollection() method after calling train().

Each model in the collection is assigned a unique model_identifier that is automatically generated from the corresponding fitting parameters object in MultipleMachineLearnedForceFieldTrainers. For Moment Tensor Potential models, the identifier is derived from the mtp_filename parameter, while for MACE models it combines the experiment_name and random_seed from TrainingParameters as <experiment_name>_<random_seed>.qatkpt. These identifiers are used by the MLFFAnalyzer in NanoLab to link individual model statistics to their corresponding model files, enabling easy identification and comparison of different models in the visualization interface.

The collection can be visually inspected in NanoLab using the MLFFAnalyzer tool, which provides comparative visualizations and statistical summaries across all trained models.

Calling nlprint() on the collection object displays a summary table showing performance metrics for all models, making it easy to compare the different training setups.

The getBestModel() method returns the best performing model based on a specified statistical measure (RMSE, MAE, or R2Score) and dataset type (MLParameterOptions.DATASET_TYPE.TRAINING or MLParameterOptions.DATASET_TYPE.TEST, or the string equivalents 'training' or 'test'). It returns a tuple containing the best score, model index, identifier, and evaluator object.

Individual model evaluators can be accessed from the collection using the modelEvaluators() method, which returns a list of all MachineLearnedModelEvaluator objects. The calculator() method can then be called on any evaluator to retrieve its fitted calculator.
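The access pattern described in these notes can be illustrated with small self-contained stand-ins. The real objects come from QATK (via trainers.modelCollection() after train()); the classes, identifiers, and R² values below are mocks invented for illustration, and the mock getBestModel() simplifies the real signature (it omits the statistical_measure argument and always ranks by R²Score, where higher is better).

```python
# Mock classes mirroring only the method names documented on this page.
# Everything here is a hypothetical stand-in for the real QATK objects.

class MockEvaluator:
    def __init__(self, identifier, r2_score):
        self._identifier = identifier
        self._r2 = r2_score

    def calculator(self):
        # The real evaluator would return its fitted calculator object.
        return f'calculator({self._identifier})'

class MockCollection:
    def __init__(self, evaluators):
        self._evaluators = list(evaluators)

    def getModelCount(self):
        return len(self._evaluators)

    def getAllModelIdentifiers(self):
        return [e._identifier for e in self._evaluators]

    def getBestModel(self, dataset_type='test'):
        # Returns (score, index, identifier, evaluator), as documented;
        # for R^2, higher is better.
        index, evaluator = max(
            enumerate(self._evaluators), key=lambda ie: ie[1]._r2)
        return evaluator._r2, index, evaluator._identifier, evaluator

# Identifiers follow the <experiment_name>_<random_seed>.qatkpt pattern
# described above; the names and scores are hypothetical.
collection = MockCollection([
    MockEvaluator('experiment_a_1234.qatkpt', 0.991),
    MockEvaluator('experiment_a_5678.qatkpt', 0.987),
])
score, index, identifier, evaluator = collection.getBestModel()
print(identifier, evaluator.calculator())
```

In a real QATK script the same unpacking applies to the tuple returned by collection.getBestModel(R2Score), after which evaluator.calculator() yields the fitted calculator for production calculations.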