MachineLearnedModelCollection

Included in QATK.MLFF

class MachineLearnedModelCollection(model_evaluators=None)

Initialize the MachineLearnedModelCollection.

Parameters:

model_evaluators (list of MachineLearnedModelEvaluator | None) – Initial list of MachineLearnedModelEvaluator objects to add. All must have cached statistical data available.
Default: [] (empty list).

addModel(model_evaluator)

Add a single MachineLearnedModelEvaluator to the collection.

Parameters:

model_evaluator (MachineLearnedModelEvaluator) – The model evaluator to add. Must have cached statistical data.

Raises:

NLValueError – If the evaluator is not a MachineLearnedModelEvaluator instance or if it doesn’t have cached statistical data.

addModels(model_evaluators)

Add multiple MachineLearnedModelEvaluator objects to the collection.

Parameters:

model_evaluators (list of MachineLearnedModelEvaluator) – List of model evaluators to add.

Raises:

NLValueError – If the inputs are invalid, for example if an entry is not a MachineLearnedModelEvaluator or lacks cached statistical data.
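The construction and validation behavior above can be sketched with plain-Python stand-ins. The class and evaluator names below are hypothetical, and ValueError substitutes for QATK's NLValueError; the real classes live in QATK.MLFF:

```python
class HypotheticalEvaluator:
    """Stand-in for MachineLearnedModelEvaluator (illustration only)."""
    def __init__(self, identifier, has_cached_statistics=True):
        self.identifier = identifier
        self.has_cached_statistics = has_cached_statistics

class HypotheticalCollection:
    """Stand-in sketching the add/validate pattern of the collection."""
    def __init__(self, model_evaluators=None):
        self._evaluators = []
        self.addModels(model_evaluators or [])  # default: empty list

    def addModel(self, evaluator):
        # Reject evaluators without cached statistical data,
        # mirroring the NLValueError described above.
        if not getattr(evaluator, "has_cached_statistics", False):
            raise ValueError("Evaluator has no cached statistical data.")
        self._evaluators.append(evaluator)

    def addModels(self, evaluators):
        for evaluator in evaluators:
            self.addModel(evaluator)

collection = HypotheticalCollection([HypotheticalEvaluator("mtp_a")])
collection.addModel(HypotheticalEvaluator("mtp_b"))
print(len(collection._evaluators))  # 2
```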

getAllModelIdentifiers()

Get all model identifiers in the collection.

Returns:

List of all model identifiers.

Return type:

list of str

getBestModel(statistical_measure, dataset_type='test', weights=None, use_cohesive_energy=None)

Get the best-performing model according to the rankModels() ranking.

Parameters:
  • statistical_measure (RMSE | MAE | R2Score) – The statistical measure to use for ranking.

  • dataset_type (str) – Which dataset to use (MLParameterOptions.DATASET_TYPE.TRAINING or MLParameterOptions.DATASET_TYPE.TEST).
    Default: MLParameterOptions.DATASET_TYPE.TEST.

  • weights (list of float | None) – Weights for ranking. See rankModels for details.

  • use_cohesive_energy (bool | None) – Whether to use cohesive energy calculations.

Returns:

Tuple containing (score, index, identifier, evaluator) for the best model.

Return type:

tuple

Raises:

NLValueError – If no valid models are found.

getModelCount()

Get the number of models in the collection.

Returns:

Number of models in the collection.

Return type:

int

getModelEvaluator(index)

Get a model evaluator by index.

Parameters:

index (int) – Index of the model evaluator to retrieve.

Returns:

The model evaluator at the specified index.

Return type:

MachineLearnedModelEvaluator

Raises:

NLValueError – If index is out of range.

getModelEvaluatorByIdentifier(model_identifier)

Get a model evaluator by its identifier.

Parameters:

model_identifier (str) – The identifier of the model evaluator to retrieve.

Returns:

The model evaluator with the specified identifier.

Return type:

MachineLearnedModelEvaluator

Raises:

NLValueError – If the identifier is not found in the collection.

getModelIdentifier(index)

Get a model identifier by index.

Parameters:

index (int) – Index of the model identifier to retrieve.

Returns:

The model identifier at the specified index.

Return type:

str

Raises:

NLValueError – If index is out of range.

getModelIndexByIdentifier(model_identifier)

Get the index of a model by its identifier.

Parameters:

model_identifier (str) – The identifier of the model to find.

Returns:

The index of the model with the specified identifier.

Return type:

int

Raises:

NLValueError – If the identifier is not found in the collection.
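The identifier and index accessors above are simple inverses of one another. A plain-Python sketch of the identifier-to-index lookup (identifiers here are hypothetical, and ValueError stands in for NLValueError):

```python
# As would be returned by getAllModelIdentifiers() (hypothetical values).
identifiers = ["mtp_a.qatkpt", "mtp_b.qatkpt"]

def index_by_identifier(model_identifier):
    # Mirrors getModelIndexByIdentifier(): raise if the identifier
    # is not present in the collection.
    if model_identifier not in identifiers:
        raise ValueError(f"Unknown model identifier: {model_identifier!r}")
    return identifiers.index(model_identifier)

print(index_by_identifier("mtp_b.qatkpt"))  # 1
```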

nlprint(stream=None, statistical_measure=None, dataset_type='test', weights=None, use_cohesive_energy=None)

Print a summary of all models, ranked using rankModels().

Parameters:
  • stream (file-like | None) – The stream to write to.
    Default: NLPrintLogger()

  • statistical_measure (RMSE | MAE | R2Score | None) – The statistical measure to use.
    Default: R2Score.

  • dataset_type (str) – Which dataset to use. Possible values are MLParameterOptions.DATASET_TYPE.TRAINING and MLParameterOptions.DATASET_TYPE.TEST.
    Default: MLParameterOptions.DATASET_TYPE.TEST.

  • weights (list of float | None) – Weights for ranking. See rankModels for details.

  • use_cohesive_energy (bool | None) – Whether to use cohesive energy calculations.

Returns:

The summary report.

Return type:

str

rankModels(statistical_measure, dataset_type='test', weights=None, use_cohesive_energy=None)

Rank models based on statistical measures and weights.

Parameters:
  • statistical_measure (RMSE | MAE | R2Score) – The statistical measure to use for ranking.

  • dataset_type (str) – Which dataset to use (MLParameterOptions.DATASET_TYPE.TRAINING or MLParameterOptions.DATASET_TYPE.TEST).
    Default: MLParameterOptions.DATASET_TYPE.TEST.

  • weights (list of float | None) – Weights for [energy, forces, stress]. Weights are normalized automatically.
    Default: [1, 1, 1] (equal weights).

  • use_cohesive_energy (bool | None) – Whether to use cohesive energy calculations. If None, determined automatically based on the availability of isolated_atom_energies in the Evaluator objects.
    Default: None.

Returns:

List of tuples (score, index, identifier) sorted by best performance first.

Return type:

list of tuples

Raises:

NLValueError – If statistical_measure is not supported, collection is empty, or other validation errors occur.
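One plausible implementation of the weighted ranking described above, as a self-contained sketch (the scoring inside QATK's rankModels() may differ in detail): the [energy, forces, stress] weights are normalized to sum to one, each model's per-component measures are combined into a weighted score, and models are sorted best-first, which means ascending for error measures such as RMSE/MAE and descending for R2Score:

```python
def rank_models(per_model_measures, identifiers, weights=None,
                higher_is_better=False):
    """Rank models from per-component statistics (illustrative sketch).

    per_model_measures: one [energy, forces, stress] triple per model.
    Returns (score, index, identifier) tuples, best first.
    """
    weights = weights or [1.0, 1.0, 1.0]   # default: equal weights
    total = sum(weights)
    weights = [w / total for w in weights]  # normalized automatically
    ranked = []
    for index, measures in enumerate(per_model_measures):
        score = sum(w * m for w, m in zip(weights, measures))
        ranked.append((score, index, identifiers[index]))
    # Lower is better for RMSE/MAE; higher is better for R2Score.
    ranked.sort(key=lambda item: item[0], reverse=higher_is_better)
    return ranked

# Hypothetical RMSE values for two models: [energy, forces, stress].
rmse = [[0.02, 0.15, 0.30], [0.01, 0.20, 0.25]]
print(rank_models(rmse, ["model_a", "model_b"]))
```

With equal weights, model_b has the lower mean RMSE and therefore ranks first.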

summaryReport(stream, statistical_measure=None, dataset_type='test', weights=None, use_cohesive_energy=None)

Print a detailed summary report using the ranking method.

Parameters:
  • stream (file-like) – The stream to write to.

  • statistical_measure (RMSE | MAE | R2Score | None) – The statistical measure to use.
    Default: R2Score.

  • dataset_type (str) – Which dataset to use. Possible values are MLParameterOptions.DATASET_TYPE.TRAINING and MLParameterOptions.DATASET_TYPE.TEST.
    Default: MLParameterOptions.DATASET_TYPE.TEST.

  • weights (list of float | None) – Weights for ranking. See rankModels for details.

  • use_cohesive_energy (bool | None) – Whether to use cohesive energy calculations.

uniqueString()

Return a unique string representing the state of the object.

Notes

  • The MachineLearnedModelCollection object is a container that holds multiple MachineLearnedModelEvaluator objects from training multiple machine-learned force field models. It provides tools for comparing and ranking the trained models.

  • The collection object is retrieved from a MultipleMachineLearnedForceFieldTrainers instance using the modelCollection() method after calling train().

  • Each model in the collection is assigned a unique model_identifier that is automatically generated from the corresponding fitting parameters object in MultipleMachineLearnedForceFieldTrainers. For Moment Tensor Potential models, the identifier is derived from the mtp_filename parameter, while for MACE models it combines the experiment_name and random_seed from TrainingParameters as <experiment_name>_<random_seed>.qatkpt. These identifiers are used by the MLFFAnalyzer in NanoLab to link individual model statistics to their corresponding model files, enabling easy identification and comparison of different models in the visualization interface.

  • The collection can be visually inspected in NanoLab using the MLFFAnalyzer tool, which provides comparative visualizations and statistical summaries across all trained models.

  • Calling nlprint() on the collection object will display a summary table showing performance metrics for all models, making it easy to compare the different training setups.

  • The getBestModel() method returns the best-performing model based on a specified statistical measure (RMSE, MAE, or R2Score) and dataset type (MLParameterOptions.DATASET_TYPE.TRAINING or MLParameterOptions.DATASET_TYPE.TEST, or string equivalents 'training' or 'test'). It returns a tuple containing the best score, model index, identifier, and evaluator object.

  • Individual model evaluators can be accessed from the collection using the modelEvaluators() method, which returns a list of all MachineLearnedModelEvaluator objects. The calculator() method can then be called on any evaluator to retrieve its fitted calculator.
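As an illustration of the MACE identifier convention described in the notes, the <experiment_name>_<random_seed>.qatkpt pattern can be reproduced with a simple f-string. The experiment name and seed below are hypothetical values, not defaults:

```python
# Hypothetical values; in practice these come from the TrainingParameters
# objects held by MultipleMachineLearnedForceFieldTrainers.
experiment_name = "mace_run"
random_seed = 42

# MACE identifier pattern: <experiment_name>_<random_seed>.qatkpt
mace_identifier = f"{experiment_name}_{random_seed}.qatkpt"
print(mace_identifier)  # mace_run_42.qatkpt
```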