TrainingSet
- class TrainingSet(configurations, sample_size=None, calculator=None, recalculate_training_data=None, log_filename_prefix=None, data_tag=None)
Class for storing a set of training configurations.
- Parameters:
configurations (MoleculeConfiguration | BulkConfiguration | sequence of [MoleculeConfiguration | BulkConfiguration] | Trajectory | MDTrajectory | ConfigurationDataContainer | MomentTensorPotentialTraining | SurfaceProcessSimulation | NudgedElasticBand | Table) – The training set, either as a sequence of individual configurations or stored within a trajectory object. When using a trajectory object, energy, force and stress data may also be included.
sample_size (int) – The number of configurations to use in the list provided, spaced out evenly. Default: All configurations.
calculator (Calculator) – The calculator that was used to calculate the energy, force and stress data on the trajectory object, if present. Default: energy, force and stress data is ignored.
recalculate_training_data (bool | None) – Flag to enforce or avoid recalculation of training data. If set, this flag will take precedence over the calculator. If not set, the data will be automatically recalculated if the specified calculator is different from the reference calculator in the MomentTensorPotentialTraining object. Default: None
log_filename_prefix (str) – Filename prefix for the logging output of the tasks associated with this set. Default: Defined by the MomentTensorPotentialTraining object.
data_tag (str) – Label for this training set to enable selection of different data in MTP fitting.
- calculator()
- Returns:
The calculator that was used to calculate the energy, force and stress data on the trajectory object, or None if energy, force and stress data is not present or should be ignored.
- Return type:
Calculator | None
- configurations()
- Returns:
The configurations in the training set. If the configurations have the same elements they are returned as an MDTrajectory, otherwise the configurations are returned as a ConfigurationDataContainer.
- Return type:
MDTrajectory | ConfigurationDataContainer
- dataTag()
- Returns:
The selection tag added to the data in the training set.
- Return type:
str
- logFilenameIdentifier()
- Returns:
Filename identifier for the logging output of the tasks associated with this set, or None if it hasn’t been set yet.
- Return type:
str | None
- logFilenamePrefix()
- Returns:
Filename prefix for the logging output of the tasks associated with this set, or None if it is to be defined by the MomentTensorPotentialTraining object.
- Return type:
str | LogToStdOut | None
- nlinfo()
- Returns:
The training set information.
- Return type:
dict
- recalculateTrainingData()
- Returns:
A flag signaling whether the training data should be recalculated or not.
- Return type:
bool
- referenceConfigurations()
- Returns:
The list of reference configurations that identify the training set.
- Return type:
list of [MoleculeConfiguration | BulkConfiguration]
- sampleSize()
- Returns:
The number of training configurations for each combination of list parameters.
- Return type:
int
- classmethod supportedConfigurationTypes()
Return a list of supported configurations.
- Returns:
List of supported configurations.
- Return type:
list
- uniqueString()
Return a unique string representing the state of the object.
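The sample_size parameter of TrainingSet is described above only as selecting configurations "spaced out evenly". The exact indexing rule is not documented here, so the following is a minimal pure-Python sketch of one plausible rule, assuming the first and last configurations are always included. The helper name evenly_spaced_sample is hypothetical and not part of QuantumATK.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def evenly_spaced_sample(items: Sequence[T], sample_size: Optional[int] = None) -> List[T]:
    """Pick sample_size items spread evenly over items (hypothetical helper)."""
    n = len(items)
    # Default (None) or an oversized request: use all configurations.
    if sample_size is None or sample_size >= n:
        return list(items)
    if sample_size <= 0:
        raise ValueError("sample_size must be a positive integer")
    if sample_size == 1:
        return [items[0]]
    # Assumption: indices run from the first to the last item, inclusive.
    step = (n - 1) / (sample_size - 1)
    return [items[round(i * step)] for i in range(sample_size)]


frames = list(range(10))  # integers standing in for configurations
print(evenly_spaced_sample(frames, 4))  # → [0, 3, 6, 9]
```

Because the step between indices is always larger than one when fewer items are requested than are available, the selected indices never repeat.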
Usage Examples
Setup of an MTP training by reading pre-calculated training data and passing it as a TrainingSet:
# Import a pre-calculated training dataset.
training_data = nlread('training_data_precalculated.hdf5')[-1]
training_set_precalc = TrainingSet(training_data, recalculate_training_data=False)
# Import a training dataset with a calculator to check for calculation consistency.
# In this case the calculator is different to the one given in the MomentTensorPotentialTraining
# object, and so the training data will be re-calculated.
calculator = LCAOCalculator(exchange_correlation=GGA.PBE)
training_data = nlread('training_data_recalculate.hdf5')[-1]
training_set_recalc = TrainingSet(training_data, calculator=calculator)
# Import a training dataset with a calculator to check for calculation consistency.
# In this case the calculator is the same as the one given in the MomentTensorPotentialTraining
# object, and so the training data will not be re-calculated.
calculator = LCAOCalculator(exchange_correlation=HybridGGA.HSE06)
training_data = nlread('training_data_same_calculator.hdf5')[-1]
training_set_same_calc = TrainingSet(training_data, calculator=calculator)
# Set up MTP training and run the training data calculation and MTP training.
moment_tensor_potential_training = MomentTensorPotentialTraining(
    filename='mtp_study.hdf5',
    object_id='training',
    training_sets=[training_set_precalc, training_set_recalc, training_set_same_calc],
    calculator=LCAOCalculator(exchange_correlation=HybridGGA.HSE06),
    calculate_stress=True,
    fitting_parameters_list=fitting_parameters,
)
moment_tensor_potential_training.update()
Here, the first training dataset is included as is, without recalculating any of the energy, force or stress values. For the second training dataset, a calculator is given. As this calculator is different to the one given in the MomentTensorPotentialTraining, the training data will be recalculated. In the third training set the same calculator is given as in the MomentTensorPotentialTraining object. Here, if training data is given for each configuration, the training data will not be re-calculated.
The following example shows how to save the training data from a completed MomentTensorPotentialTraining object in a TrainingSet and read it again to be used in another training.
# Run the training calculations using the MomentTensorPotentialTraining
moment_tensor_potential_training = MomentTensorPotentialTraining(
    filename='mtp_study',
    object_id='training',
    training_sets=training_sets,
    calculator=calculator,
    calculate_stress=True,
)
moment_tensor_potential_training.update()
# Pack the completed MomentTensorPotentialTraining object in a TrainingSet and save it.
training_set = TrainingSet(
    moment_tensor_potential_training,
    calculator=calculator,
    recalculate_training_data=False,
)
nlsave('mtp_training_set1.hdf5', training_set)
# Read it again and combine with other training sets from disc to run a new fit.
training_sets = [
    nlread('mtp_training_set1.hdf5', TrainingSet)[0],
    nlread('mtp_training_set2.hdf5', TrainingSet)[0],
    nlread('mtp_training_set3.hdf5', TrainingSet)[0],
]
new_mtp_training = MomentTensorPotentialTraining(
    filename='mtp_study_from_training_sets.hdf5',
    object_id='training',
    training_sets=training_sets,
    calculator=calculator,
    calculate_stress=True,
    fitting_parameters=MomentTensorPotentialFittingParameters(),
    train_test_split=0.95,
)
new_mtp_training.update()
nlprint(new_mtp_training)
Notes
The TrainingSet class can be used to include existing configurations, MD, optimization, ConfigurationDataContainer, or SurfaceProcessSimulation trajectories, as well as pre-calculated MomentTensorPotentialTraining objects in the training data of another MomentTensorPotentialTraining.
This can be useful when combining existing training data from different sources or projects to fit a new Moment Tensor Potential. It can also be used to efficiently re-calculate DFT data for existing un-labelled configurations, i.e. configurations without DFT energy, forces, and stress.
By default training data is re-calculated using the calculator in the MomentTensorPotentialTraining object. To keep the original data in the training set, the argument recalculate_training_data can be set to False. This stops re-calculation of data regardless of calculator settings. In this case if any training data is missing, an error will be raised in the MomentTensorPotentialTraining object. It is also possible to re-calculate data based on the consistency of calculator settings. The argument calculator takes the calculator used to generate the energy, force and stress data. If this calculator is the same as the calculator given in the MomentTensorPotentialTraining object, the original data is kept and not re-calculated. If the calculators differ or there is missing training data then the dataset is re-calculated using the MomentTensorPotentialTraining calculator.
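The recalculation rules described above can be summarised as a small decision function. This is only an illustrative pure-Python sketch of the logic as stated in these notes, not QuantumATK code: the function name should_recalculate, the use of == to compare calculators, and the ValueError exception type are all assumptions made for the example.

```python
from typing import Optional


def should_recalculate(
    recalculate_training_data: Optional[bool],
    set_calculator: Optional[object],
    training_calculator: object,
    has_complete_data: bool,
) -> bool:
    """Illustrative decision rule; not part of the QuantumATK API."""
    # An explicit flag takes precedence over any calculator comparison.
    if recalculate_training_data is not None:
        if not recalculate_training_data and not has_complete_data:
            # The notes say an error is raised here; the type is an assumption.
            raise ValueError("Training data is missing and recalculation is disabled.")
        return recalculate_training_data
    # No calculator given: existing energy, force and stress data is ignored.
    if set_calculator is None:
        return True
    # Keep the data only if the calculators match and nothing is missing.
    return set_calculator != training_calculator or not has_complete_data


# Strings stand in for calculator objects in this sketch.
print(should_recalculate(None, "PBE", "HSE06", True))    # → True
print(should_recalculate(None, "HSE06", "HSE06", True))  # → False
print(should_recalculate(False, "PBE", "HSE06", True))   # → False
```

The three calls mirror the three training sets in the first usage example: a differing calculator triggers recalculation, a matching calculator with complete data does not, and an explicit False flag keeps the data regardless of the calculator.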
It is generally recommended to save training data as TrainingSet objects, which can easily be read and combined with other TrainingSet objects in a list or Table to be re-used in other training scenarios (as shown in the example above).