ActiveLearningSimulation¶
- class ActiveLearningSimulation(fitting_parameters, initial_training_data, mtp_study_filename, mtp_study_object_id, reference_calculator, correction_calculator=None, d3_dispersion_calculator=None, candidate_threshold=None, retrain_threshold=None, check_interval=None, limit_candidates=None, max_forces_check=None, use_stress=None, candidate_trajectory_filename=None, candidate_trajectory_object_id=None, max_iterations=None, processes_per_calculation=None, extrapolation_selection_parameters=None, use_linearized_coefficient_matrix=None, minimum_bond_length_percent=None, log_filename_prefix=None, restart_simulation=None)¶
Set up an object that can be used to run active learning simulations using the Moment Tensor Potential.
- Parameters:
fitting_parameters (MomentTensorPotentialFittingParameters) – The parameters for MTP fitting.
initial_training_data (TrainingSet | Trajectory | ConfigurationDataContainer | MDTrajectory | MomentTensorPotentialTraining | sequence of [TrainingSet | Trajectory | ConfigurationDataContainer | MDTrajectory | MomentTensorPotentialTraining] | Table) – The initial training data. All configurations in the initial training data must have precalculated reference data (energy, forces, and possibly stress); otherwise the configuration is not included in the active learning training.
mtp_study_filename (str) – The name of the file that contains the MTP study object used for training the moment tensor potential.
mtp_study_object_id (str) – The object_id of the MTP study object used to train the moment tensor potential.
reference_calculator (Calculator) – The reference calculator.
correction_calculator (Calculator) – Calculator used to correct reference values. The MTP is fit to the reference calculator minus the contribution of the correction calculator.
d3_dispersion_calculator (TremoloXCalculator) – A D3 dispersion calculator that, when added, is used in the MD simulation during the active learning cycle as a correction to the MTP output. Default: None
candidate_threshold (float) – The threshold above which configurations will be added as candidates for the next training iteration. Default: 1
retrain_threshold (float) – The extrapolation threshold at which the simulation is stopped and trained again. Default: 3
check_interval (int) – The interval at which the extrapolation grade is calculated and checked in MD simulations. In active learning optimization the extrapolation grade is always checked at every step. Default: 10
limit_candidates (int) – Upper limit for the number of candidates collected each iteration that trigger retraining. This can be used to limit the number of reference calculator calls when many configurations fall between the candidate and retrain thresholds. Default: No limit
max_forces_check (PhysicalQuantity of type energy / length) – If the max. force on an atom exceeds this value the extrapolation grade is always checked. Only available with MD active learning; has no effect in other cases. Default: 10*eV / Angstrom
use_stress (bool) – Whether or not stress is used in training the MTP potential. Default: True
candidate_trajectory_filename (str) – The filename for the trajectory to store the candidate configurations that have been added to the training set.
candidate_trajectory_object_id (str) – The object ID for the trajectory to store the candidate configurations that have been added to the training set.
max_iterations (int) – The max number of retraining iterations. Default: 20
processes_per_calculation (int | None) – The number of processes used for each calculation with the reference calculator. Default: All processes, corresponding to None.
extrapolation_selection_parameters (ExtrapolationSelectionParameters) – The parameters that specify the details of the extrapolation grade calculation and candidate selection.
use_linearized_coefficient_matrix (bool) – Deprecated: Use extrapolation_grade_algorithm=MaxvolLinearized in ExtrapolationSelectionParameters instead.
minimum_bond_length_percent (float) – The minimum bond length, as a fraction of the covalent radii, that is allowed in a training configuration. If any bond is shorter than that value, the configuration is discarded from the training data. Default: 0.35
log_filename_prefix (str | LogToStdOut | None) – The prefix used in the log file names generated by the training data calculation and fitting parts of the simulation.
restart_simulation (bool) – Restart the simulation with any configurations added during previous active learning simulations. Default: False
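The interplay of candidate_threshold, retrain_threshold, check_interval and limit_candidates can be illustrated with a small toy model. This is a sketch under stated assumptions and not the QuantumATK implementation: the function name scan_extrapolation_grades and the list of per-step grades are hypothetical, the retrain threshold is assumed to be inclusive, and reaching limit_candidates is assumed to end the iteration.

```python
def scan_extrapolation_grades(grades, candidate_threshold=1.0, retrain_threshold=3.0,
                              check_interval=10, limit_candidates=None):
    """Toy model of one active learning iteration (hypothetical helper).

    grades holds one extrapolation grade per simulation step.
    Returns (candidate_steps, stop_step); stop_step is None if the run finished.
    """
    candidate_steps = []
    for step, grade in enumerate(grades):
        # The grade is only inspected every check_interval steps.
        if step % check_interval != 0:
            continue
        if grade >= retrain_threshold:
            # Assumption: the offending configuration is kept as a candidate
            # and the simulation stops so that the MTP can be retrained.
            candidate_steps.append(step)
            return candidate_steps, step
        if grade > candidate_threshold:
            candidate_steps.append(step)
            # Assumption: reaching the candidate limit also triggers retraining.
            if limit_candidates is not None and len(candidate_steps) >= limit_candidates:
                return candidate_steps, step
    return candidate_steps, None


grades = [0.5, 1.5, 0.2, 2.0, 3.5, 0.1]
candidates, stop_step = scan_extrapolation_grades(grades, check_interval=1)
```

With a larger check_interval fewer grades are inspected, which lowers the cost of the check but can let the simulation run further into the extrapolation region before it is stopped.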
- additionalTrainingData()¶
- Returns:
The training dataset that has been added during the active learning simulation.
- Return type:
- additionalTrainingSet()¶
- Returns:
The training dataset that has been added during the active learning simulation or None if no training data has been added.
- Return type:
TrainingSet
| None
- candidateThreshold()¶
- Returns:
The candidate threshold.
- Return type:
float
- candidateTrajectoryFilename()¶
- Returns:
The filename to which the candidate trajectory is written.
- Return type:
str
- candidateTrajectoryObjectId()¶
- Returns:
The object ID to which the candidate trajectory is written.
- Return type:
str
- checkInterval()¶
- Returns:
The check interval for the extrapolation grade.
- Return type:
int
- committeeSize()¶
- Returns:
The number of committee members when the query-by-committee method is used for the extrapolation grade.
- Return type:
int
- correctionCalculator()¶
- Returns:
The correction calculator, if defined.
- Return type:
Calculator
- extrapolationGradeAlgorithm()¶
- Returns:
Which extrapolation grade algorithm should be used.
- Return type:
MaxvolStandard | MaxvolLinearized | MaxForce | QueryByCommitteeForces | QueryByCommitteeEnergy
- fittingParameters()¶
- Returns:
The MTP fitting parameters.
- Return type:
- forcesCap()¶
- Returns:
The forces cap.
- Return type:
PhysicalQuantity of type energy / length.
- initialTrainingData()¶
- Returns:
The initial training dataset.
- Return type:
- initialTrainingSet()¶
- Returns:
The initial training dataset for this active learning simulation.
- Return type:
- limitCandidates()¶
- Returns:
Upper limit for the number of candidates that are collected each iteration before retraining.
- Return type:
int
- logFilenamePrefix()¶
- Returns:
The prefix used in MTP training log files. The flag LogToStdOut causes log output to be written to standard out.
- Return type:
str
- maxForcesCheck()¶
- Returns:
The max. forces value at which a check of the extrapolation grade is enforced.
- Return type:
PhysicalQuantity of type energy / length.
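The effect of this value on the check schedule can be sketched as follows. This is an illustration only: should_check_grade is a hypothetical helper, forces are plain floats in eV/Angstrom instead of PhysicalQuantity objects, and the "max. force on an atom" is assumed to be the Euclidean norm of the force vector.

```python
import math


def should_check_grade(step, forces, check_interval=10, max_forces_check=10.0):
    """Return True if the extrapolation grade should be evaluated at this step.

    forces is a list of (fx, fy, fz) tuples, one per atom (hypothetical input).
    """
    # Assumption: the largest force is measured by the vector norm per atom.
    largest_force = max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)
    # A force above the limit enforces a check regardless of the interval.
    if largest_force > max_forces_check:
        return True
    # Otherwise the regular check interval applies.
    return step % check_interval == 0
```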
- maxIterations()¶
- Returns:
The maximum number of retraining iterations.
- Return type:
int
- minBondLengthPercent()¶
- Returns:
The minimum bond length as fraction of the covalent radii.
- Return type:
float
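A minimal sketch of such a bond-length filter is shown below. It is not the QuantumATK implementation: has_too_short_bond is a hypothetical helper, the reference bond length is assumed to be the sum of the two covalent radii, periodic images are ignored, and the radii are illustrative values supplied by the caller.

```python
import itertools
import math


def has_too_short_bond(positions, radii, minimum_bond_length_percent=0.35):
    """Return True if any atom pair is closer than the allowed fraction.

    positions: list of (x, y, z) tuples in Angstrom.
    radii: covalent radius of each atom in Angstrom (caller-supplied values).
    """
    for (i, p), (j, q) in itertools.combinations(enumerate(positions), 2):
        distance = math.dist(p, q)
        # Assumption: the reference length is the sum of the covalent radii.
        if distance < minimum_bond_length_percent * (radii[i] + radii[j]):
            return True
    return False


# Illustrative radii for an Hf-O pair (Angstrom).
radii = [1.75, 0.66]
keep = not has_too_short_bond([(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)], radii)
discard = has_too_short_bond([(0.0, 0.0, 0.0), (0.0, 0.0, 0.5)], radii)
```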
- mtpParameters()¶
- Returns:
The MTPParameters object containing the optimized MTP parameters.
- Return type:
MTPParameters | None
- processesPerCalculation()¶
- Returns:
The number of processes that is used for a configuration when calculating the reference data.
- Return type:
int
- referenceCalculator()¶
- Returns:
The reference calculator
- Return type:
Calculator
- restartSimulation()¶
- Returns:
Whether or not the simulation and training are restarted from a previous run.
- Return type:
bool
- retrainThreshold()¶
- Returns:
The retrain threshold.
- Return type:
float
- runMolecularDynamics(configuration, constraints=None, trajectory_filename=None, steps=None, log_interval=None, method=None, xyz_filename=None, hook_functions=None, pre_step_hook=None, post_step_hook=None, write_velocities=None, write_forces=True, write_stresses=None, domain_decomposition_pattern=None, trajectory_interval=None, measurement_hook=None, trajectory_object_id=None, number_of_independent_runners=None, log_filename_prefix=None)¶
Run an active learning MD simulation.
- Parameters:
configuration (BulkConfiguration | sequence of type BulkConfiguration) – The initial configuration. When using multiple independent runners this can be given as a list with a different item for each runner.
constraints (list of ints | list of BaseConstraint) – The list of atomic indices, denoting fixed atoms, or constraint objects, such as RigidBody. Default: [].
trajectory_filename (str | sequence of str | None) – The filename of the file to be used for storing the trajectory, or None if no trajectory should be written. A trajectory filename should not be given if configuration is a trajectory. When using multiple independent runners this can be given as a list with a different item for each runner. Default: None.
steps (int) – The number of time-steps to take in the simulation. Default: 50.
log_interval (int) – The interval at which information, such as time, energy, temperature, etc. is written to the log output. Default: 1.
method (BaseMDmethod) – The MD method used for the simulation. When using multiple independent runners this can be given as a list with a different item for each runner. Default: NVEVelocityVerlet.
xyz_filename (str) – The name of the file to be used for storing the xyz trajectory, or None if no xyz-trajectory should be written. Default: None.
hook_functions (None) – Currently not supported in active learning.
pre_step_hook (function | list of functions | None) – An optional user-defined function or a list of functions which will be called just before the forces evaluation. The signature of the function requires the arguments (step, time, configuration). The return status is ignored. Unhandled exceptions will terminate the evaluation. If a list is given the functions will be called in the given order. Default: None.
post_step_hook (function | list of functions | None) – An optional user-defined function or a list of functions which will be called just after the forces evaluation. The signature of the function requires the arguments (step, time, configuration). Optional arguments include forces, local_forces (for distributed MD), stress, trajectory (the MD trajectory), temperature, pressure, potential_energy, and/or kinetic_energy. The return status is ignored. Unhandled exceptions will terminate the evaluation. If a list is given the functions will be called in the given order. Default: None.
write_velocities (bool) – Write the velocities to the trajectory file every log_interval steps. If configuration is a trajectory (i.e. this is a restart calculation) this parameter will be the same value as it was in the previous trajectory. Default: True.
write_forces (bool) – Write the forces to the trajectory file every log_interval steps. If configuration is a trajectory (i.e. this is a restart calculation) this parameter will be the same value as it was in the previous trajectory. Default: True.
write_stresses (bool) – Write the stress to the trajectory file every log_interval steps. A value of None means that write_stresses is True by default for NPT methods and False otherwise. This is to avoid the additional work of calculating the stress when it is not needed. If configuration is a trajectory (i.e. this is a restart calculation) this parameter will be the same value as it was in the previous trajectory. Default: None.
domain_decomposition_pattern (list of type int | Automatic | None) – The pattern in which the domains should be arranged in a parallel simulation. E.g. [1, 2, 4] means 1 domain in A-, 2 in B-, and 4 in C-direction. If Automatic domain decomposition is used, then the simulation cell will be divided into domains whose edges are as close together in length as possible. If None is given then domain decomposition will be disabled. Default: Automatic.
trajectory_interval (int) – The resolution used in saving steps to a trajectory file. A value of 1 results in all steps being saved; a value of 2 results in every second step being saved; etc. If configuration is a trajectory (i.e. this is a restart simulation) this parameter will use the same value as was used in the previous trajectory. Default: The same value as log_interval.
measurement_hook (function | list of functions | None) – An optional user-defined function or a list of functions which will be called at the end of each step after all constraints have been applied. The signature of the function requires the arguments (step, time, configuration). Optional arguments include forces, local_forces (for distributed MD), stress, trajectory (the MD trajectory), temperature, pressure, potential_energy, and/or kinetic_energy. The return value should be a dictionary that maps string keys to values. The values may be numbers, numpy arrays, or PhysicalQuantities. These values are stored on the MDTrajectory and may be accessed using the measurement method. Unhandled exceptions will terminate the evaluation. If a list is given the functions will be called in the given order. Default: None.
trajectory_object_id (str | None) – The object id of the trajectory written to trajectory_filename. If a value of None is given, then an object id will be chosen automatically. Default: None.
number_of_independent_runners (int) – The number of independent simulations that should be run. If greater than 1, each runner will run in an independent process group and at the end of each iteration, the candidates of all runners will be gathered, selected, and added to the training data. The simulation completes if all runners complete without exceeding the retraining threshold. Default: Length of configuration.
log_filename_prefix (str | None) – The prefix used in log files containing details of the molecular dynamics simulation.
- Returns:
The final trajectory or, in case of multiple runners, a list containing the trajectories of all runners. In case the active learning simulation does not complete within the max. number of iterations, None is returned.
- Return type:
MDTrajectory
| list | None
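A measurement hook following the signature described above might look like the sketch below. The function name and the dictionary keys are hypothetical, and it is assumed that optional quantities such as temperature and potential_energy are requested simply by naming them as arguments.

```python
def energy_measurement_hook(step, time, configuration, temperature, potential_energy):
    """Collect a few observables at the end of an MD step.

    The required arguments are (step, time, configuration); temperature and
    potential_energy are two of the documented optional arguments.
    """
    # The return value maps string keys to the values that should be stored.
    return {
        'step_index': step,
        'temperature': temperature,
        'potential_energy': potential_energy,
    }


# Stand-alone call with placeholder values, outside of any simulation.
measurement = energy_measurement_hook(5, 5.0, None, 300.0, -1.25)
```

Such a function would be passed to runMolecularDynamics through the measurement_hook argument; a list of functions is called in the given order.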
- runOptimizeGeometry(configuration, max_forces=None, max_stress=None, max_steps=None, max_step_length=None, constraints=None, trajectory_filename=None, trajectory_object_id=None, optimize_cell=None, disable_stress=None, optimizer_method=None, target_stress=None, constrain_bravais_lattice=None, trajectory_interval=None, remove_drift=None, enable_optimization_stop_file=None, restart_strategy=NoRestart, number_of_independent_runners=None, log_filename_prefix=None, continue_on_non_convergence=None)¶
Run an active learning geometry optimization.
- Parameters:
configuration (BulkConfiguration | sequence of type BulkConfiguration) – The configuration to be optimized. When using multiple independent runners this can be given as a list with a different item for each runner.
max_forces (PhysicalQuantity of type force) – The convergence criterion for the atomic forces. Default: 0.05*eV/Angstrom.
max_stress (PhysicalQuantity of type pressure) – The convergence criterion for the maximum difference between the internal stress and the target stress. Default: 0.1*GPa.
max_steps (int) – The maximum number of optimization steps. Default: 200.
max_step_length (PhysicalQuantity of type length) – The maximum step length the optimizer may take. Default: 0.2*Ang.
constraints (list of integers and Constraints objects) – A list of indices of atoms with fixed positions and of Constraints objects. Default: [].
trajectory_filename (str | sequence of str | None) – The filename used to store the trajectory. If the value is None then no trajectory file will be written. When using multiple independent runners this can be given as a list with a different item for each runner. Default: None.
trajectory_object_id (str | None) – The object id of the trajectory written to trajectory_filename. If a value of None is given, then an object id will be chosen automatically. Default: None.
optimize_cell (bool) – Whether the lattice vectors of a bulk configuration are allowed to change during the optimization. This enables the stress calculation for bulk configurations. Default: True for BulkConfigurations otherwise False.
disable_stress (bool) – Deprecated: from v2022.03, use the optimize_cell parameter instead.
optimizer_method (FIRE | LBFGS) – The optimization algorithm to use. Default: LBFGS.
target_stress (PhysicalQuantity of type pressure) – The target internal stress (tensor) of the system. Can be given as a single value in case of isotropic pressure, or as an internal stress vector in Voigt notation or as a 3x3-matrix. Default: 0*GPa.
constrain_bravais_lattice (bool) – Enable preserving the Bravais lattice symmetry of the configuration. Default: True if the target_stress is commensurate with the lattice symmetries.
trajectory_interval (int | PhysicalQuantity of type time) – The resolution used in saving steps to a trajectory file. This can either be given as an integer (a value of 1 results in all steps being saved; a value of 2 results in every second step being saved; etc.) or as a time interval. Default: 1.
remove_drift (bool) – In ab-initio calculations, the sum of the forces along each Cartesian direction does not necessarily sum to zero due to numerical inaccuracies. This option controls if the “drift” in the forces should be removed by subtracting the average force along each Cartesian direction from all atoms. Default: True.
enable_optimization_stop_file (bool) – Determines whether to enable a file for stopping the geometry optimization. If True, creation of the stop file will stop the optimization at the next step. The name of the stop file will be shown in the log output; it will be stop-geometry-optimization-uniqueID, where uniqueID is a randomly generated identifier for this optimization. The file must be created in the current working directory. Default: True.
restart_strategy (NoRestart) – The restart mechanism has to be set to NoRestart for ActiveLearningSimulation. Default: NoRestart.
number_of_independent_runners (int) – The number of independent geometry optimizations that should be run. If greater than 1, each runner will run in an independent process group and at the end of each iteration, the candidates of all runners will be gathered, selected, and added to the training data. The simulation completes if all runners complete without exceeding the retraining threshold. Default: Length of configuration.
log_filename_prefix (str | None) – The prefix used in log files containing details of the geometry optimization.
continue_on_non_convergence (bool) – Continue with the converged configurations when using multiple runners and only some optimizations converge in the given number of active learning cycles. If False is given the calculation will raise an exception if any optimization does not converge. Default: False
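The drift removal described for remove_drift amounts to subtracting the average force along each Cartesian direction from every atom. The sketch below illustrates that arithmetic with plain floats; remove_force_drift is a hypothetical helper and not part of the QuantumATK API.

```python
def remove_force_drift(forces):
    """Subtract the mean force along x, y and z from every atom.

    forces is a list of (fx, fy, fz) tuples; a new list is returned.
    """
    n_atoms = len(forces)
    # Average force along each Cartesian direction.
    mean = [sum(force[axis] for force in forces) / n_atoms for axis in range(3)]
    # After the subtraction the forces sum to zero along each direction.
    return [tuple(force[axis] - mean[axis] for axis in range(3)) for force in forces]


corrected = remove_force_drift([(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
```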
- runOptimizeNudgedElasticBand(neb, max_forces=None, max_stress=None, max_steps=None, max_step_length=None, constraints=None, trajectory_filename=None, spring_constant=None, climbing_image=None, preoptimization=None, optimizer_method=None, log_filename_prefix=Automatic, optimize_cell=None, target_stress=None, trajectory_interval=None, restart_strategy=NoRestart)¶
Run an active learning nudged elastic band optimization.
- Parameters:
neb (NudgedElasticBand) – The nudged elastic band configuration to optimize.
max_forces (PhysicalQuantity of type force) – The convergence criterion for the atomic forces. Default: 0.05*eV/Angstrom.
max_stress (PhysicalQuantity of type pressure) – The convergence criterion for the maximum difference between the internal stress and the target stress. Default: 0.1*GPa.
max_steps (int) – The maximum number of optimization steps. Default: 200.
max_step_length (PhysicalQuantity of type length) – The maximum step length the optimizer may take. Default: 0.2*Ang.
constraints (list of ints) – List of atom indices that are kept fixed during optimization. Default: [].
trajectory_filename (str | sequence of str | None) – The filename used to store the trajectory. If the value is None then no trajectory file will be written. Default: None.
spring_constant (PhysicalQuantity of type eV/Ang**2) – The spring constant used for the NEB relaxation. Default: 5.0*(eV/Ang**2).
climbing_image (bool) – Flag indicating if the climbing image algorithm should be used to find a transition state. Default: False.
preoptimization (bool) – Flag indicating if the endpoints should be optimized before the NEB optimization. Default: False.
optimizer_method (FIRE | LBFGS) – The optimizer to use for optimizing the structure. Default: LBFGS.
log_filename_prefix (Automatic | str | None) – The logging output from each image will be written to filenames starting with this value. If it is set to Automatic then the prefix will be the name of the calling python script. If it is set to None, then all output will be written to stdout. Default: Automatic.
optimize_cell (bool) – If it is None, it will be set automatically according to the NEB configuration. If the NEB configuration is composed of configurations of the type BulkConfiguration and the lattice vectors of the configurations are different, it is set to True; otherwise it is False. Default: None.
target_stress (PhysicalQuantity of type pressure) – The target internal stress (tensor) of the system. Can be given as a single value in case of isotropic pressure, or as an internal stress vector in Voigt notation or as a 3x3-matrix. Default: 0*GPa.
trajectory_interval (int | PhysicalQuantity of type time) – The resolution used in saving steps to a trajectory file. This can either be given as an integer (a value of 1 results in all steps being saved; a value of 2 results in every second step being saved; etc.) or as a time interval. Default: 1.
restart_strategy (NoRestart) – The restart mechanism has to be set to NoRestart for ActiveLearningSimulation. Default: NoRestart.
- Returns:
The optimized NEB. In case the active learning simulation does not complete within the max. number of iterations, None is returned.
- Return type:
NudgedElasticBand
| None
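The relation between a stress vector in Voigt notation and the equivalent 3x3 matrix, as accepted by target_stress, can be sketched as follows. The component order (xx, yy, zz, yz, xz, xy) is the common textbook convention and is an assumption here; the order expected by QuantumATK should be checked before use. The helper voigt_to_matrix is hypothetical and works on plain numbers, not PhysicalQuantity objects.

```python
def voigt_to_matrix(voigt):
    """Expand a 6-component Voigt vector into a symmetric 3x3 matrix.

    Assumed component order: (xx, yy, zz, yz, xz, xy).
    """
    xx, yy, zz, yz, xz, xy = voigt
    return [
        [xx, xy, xz],
        [xy, yy, yz],
        [xz, yz, zz],
    ]


matrix = voigt_to_matrix([1.0, 2.0, 3.0, 0.4, 0.5, 0.6])
```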
- runTimeStampedForceBiasMonteCarlo(configuration, constraints=None, trajectory_filename=None, steps=None, log_interval=None, method=None, hook_functions=None, pre_step_hook=None, post_step_hook=None, write_velocities=None, write_forces=None, write_stresses=None, trajectory_interval=None, trajectory_object_id=None, number_of_independent_runners=None, log_filename_prefix=None)¶
Run an active learning TFMC simulation.
- Parameters:
configuration (BulkConfiguration | sequence of type BulkConfiguration) – The initial configuration. When using multiple independent runners this can be given as a list with a different item for each runner.
constraints (list of ints | list of BaseConstraint) – The list of atomic indices, denoting fixed atoms, or constraint objects, such as RigidBody. Default: [].
trajectory_filename (str | sequence of str | None) – The filename of the file to be used for storing the trajectory, or None if no trajectory should be written. A trajectory filename should not be given if configuration is a trajectory. When using multiple independent runners this can be given as a list with a different item for each runner. Default: None.
steps (int) – The number of time-steps to take in the simulation. Default: 50.
log_interval (int) – The interval at which information, such as time, energy, temperature, etc. is written to the log output. Default: 1.
method (ForceBiasMonteCarlo | ForceBiasMonteCarloNPTBerendsen) – The Monte Carlo method used for the simulation. When using multiple independent runners this can be given as a list with a different item for each runner. Default: ForceBiasMonteCarlo.
hook_functions (None) – Currently not supported in active learning.
pre_step_hook (function | list of functions | None) – An optional user-defined function or a list of functions which will be called just before the forces evaluation. The signature of the function requires the arguments (step, time, configuration). The return status is ignored. Unhandled exceptions will terminate the evaluation. If a list is given the functions will be called in the given order. Default: None.
post_step_hook (function | list of functions | None) – An optional user-defined function or a list of functions which will be called just after the forces evaluation. The signature of the function requires the arguments (step, time, configuration, forces, stress). The return status is ignored. Unhandled exceptions will terminate the simulation. If a list is given the functions will be called in the given order. Default: None.
write_velocities (bool) – Write the velocities to the trajectory file every log_interval steps. Since the time-stamped force-bias Monte Carlo algorithm does not use velocities explicitly, zero velocities will be written. If configuration is a trajectory (i.e. this is a restart calculation) this parameter will be the same value as it was in the previous trajectory. Default: False
write_forces (bool) – Write the forces to the trajectory file every log_interval steps. Default: True
write_stresses (bool) – Write the stress to the trajectory file every log_interval steps. Default: False
trajectory_interval (int) – The resolution used in saving steps to a trajectory file. A value of 1 results in all steps being saved; a value of 2 results in every second step being saved; etc. If configuration is a trajectory (i.e. this is a restart MD simulation) this parameter will use the same value as was used in the previous trajectory. Default: The same value as log_interval.
trajectory_object_id (str | None) – The object id of the trajectory written to trajectory_filename. If a value of None is given, then an object id will be chosen automatically. Default: None.
number_of_independent_runners (int) – The number of independent simulations that should be run. If greater than 1, each runner will run in an independent process group and at the end of each iteration, the candidates of all runners will be gathered, selected, and added to the training data. The simulation completes if all runners complete without exceeding the retraining threshold. Default: Length of configuration.
log_filename_prefix (str | None) – The prefix used in log files containing details of the molecular dynamics simulation.
- Returns:
The final trajectory or, in case of multiple runners, a list containing the trajectories of all runners. In case the active learning simulation does not complete within the max. number of iterations, None is returned.
- Return type:
MDTrajectory
| list | None
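Because the return value can be a single trajectory, a list of trajectories, or None, calling scripts may find it convenient to normalize it before further processing. The helper below is a hypothetical sketch of such a normalization and is not part of the QuantumATK API.

```python
def collect_trajectories(result):
    """Normalize the documented return value into a list.

    None (simulation did not complete) becomes an empty list, a single
    trajectory becomes a one-element list, and a list is returned unchanged.
    """
    if result is None:
        return []
    if isinstance(result, list):
        return result
    return [result]
```

The same helper could be applied to the result of runMolecularDynamics, which documents the same return type; an empty list then signals that the maximum number of iterations was reached.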
- static supportedConfigurationTypes()¶
- Returns:
Supported configuration types for initial training data.
- Return type:
tuple
- trainingSetTable()¶
- Returns:
A table containing the initial and additional training data for this active learning simulation.
- Return type:
- uniqueString()¶
Return a unique string representing the state of the object.
- useLinearizedCoefficientMatrix()¶
- Returns:
Whether the matrix used to calculate the extrapolation grade should include only the linear coefficients or additionally the linearized version of the non-linear coefficients.
- Return type:
bool
- static validatorType()¶
- Returns:
The validator type.
- Return type:
Validator
- writeAtomicErrorEstimates()¶
- Returns:
True if the per-atom error estimates should be written to the trajectory.
- Return type:
bool
Usage Examples¶
Run a molecular dynamics ActiveLearningSimulation for amorphous HfO2 using an initial training data
set that is read from file. The method runMolecularDynamics takes the same arguments as the
normal MolecularDynamics function. The extrapolation grade is calculated with the query-by-committee method.
# Load the training data. This can be one or several TrainingSet, MomentTensorPotentialTraining or
# Trajectory objects, which contain energy, forces, stress calculated with the reference
# calculator.
initial_training_data = [nlread('HfO2_crystal_training.hdf5', TrainingSet)[0]]
# Use the predefined small MTP basis.
mtp_basis = PredefinedBasisSmall
# Optimize the non-linear coefficients on the energy only.
nl_parameters = NonLinearCoefficientsParameters(
perform_optimization=True,
energy_only=True,
)
fitting_parameters = MomentTensorPotentialFittingParameters(
basis_size=mtp_basis,
outer_cutoff_radii=4.5*Ang,
mtp_filename='HfO2_active_learning.mtp',
non_linear_coefficients_parameters=nl_parameters,
use_element_specific_coefficients=True,
)
active_learning = ActiveLearningSimulation(
fitting_parameters=fitting_parameters,
initial_training_data=initial_training_data,
mtp_study_filename='HfO2_mtp_study',
mtp_study_object_id='HfO2',
reference_calculator=reference_calculator,
candidate_threshold=1.0,
retrain_threshold=3.0,
check_interval=20,
max_forces_check=10.0*eV/Ang,
use_stress=True,
candidate_trajectory_filename='HfO2_am_active_learning_candidates.hdf5',
restart_simulation=True,
extrapolation_selection_parameters=ExtrapolationSelectionParameters(
extrapolation_grade_algorithm=QueryByCommitteeForces,
descriptor_cutoff=0.1,
),
)
# Set up a high-temperature MD at 3000 K.
initial_velocity = MaxwellBoltzmannDistribution(
temperature=3000.0*Kelvin,
remove_center_of_mass_momentum=True,
random_seed=None,
enforce_temperature=True,
)
method = Langevin(
time_step=1*femtoSecond,
reservoir_temperature=3000*Kelvin,
friction=0.01*femtoSecond**-1,
initial_velocity=initial_velocity,
)
constraints = [FixCenterOfMass()]
# Run the MD simulation through the active learning object.
md_trajectory = active_learning.runMolecularDynamics(
bulk_configuration,
constraints=constraints,
trajectory_filename='HfO2_am_active_learning_3000K.hdf5',
steps=100000,
log_interval=100,
method=method,
domain_decomposition_pattern=[1, 1, 1],
)
# Extract the additional training data that has been added during active learning as a TrainingSet
# object, and save it to a file.
additional_training_data = active_learning.additionalTrainingSet()
nlsave('HfO2_active_learning_additional_training_data.hdf5', additional_training_data)
active_learning_md_query_by_committee.py
Run a molecular dynamics ActiveLearningSimulation for amorphous HfO2 using an initial training data
set that is read from file. The method runMolecularDynamics takes the same arguments as the
normal MolecularDynamics function. The extrapolation grade is calculated with the maxvol method.
# Load the training data. This can be one or several TrainingSet, MomentTensorPotentialTraining or
# Trajectory objects, which contain energy, forces, stress calculated with the reference
# calculator.
initial_training_data = [nlread('HfO2_crystal_training.hdf5', TrainingSet)[0]]
# Use 300 MTP basis functions.
n_basis = 300
# Optimize the non-linear coefficients on the energy only.
nl_parameters = NonLinearCoefficientsParameters(
perform_optimization=True,
energy_only=True,
)
fitting_parameters = MomentTensorPotentialFittingParameters(
basis_size=n_basis,
outer_cutoff_radii=4.5*Ang,
mtp_filename='HfO2_active_learning.mtp',
non_linear_coefficients_parameters=nl_parameters
)
active_learning = ActiveLearningSimulation(
fitting_parameters=fitting_parameters,
initial_training_data=initial_training_data,
mtp_study_filename='HfO2_mtp_study',
mtp_study_object_id='HfO2',
reference_calculator=reference_calculator,
candidate_threshold=1.0,
retrain_threshold=3.0,
check_interval=20,
max_forces_check=10.0*eV/Ang,
use_stress=True,
candidate_trajectory_filename='HfO2_am_active_learning_candidates.hdf5',
restart_simulation=True,
)
# Set up a high-temperature MD at 3000 K.
initial_velocity = MaxwellBoltzmannDistribution(
temperature=3000.0*Kelvin,
remove_center_of_mass_momentum=True,
random_seed=None,
enforce_temperature=True,
)
method = Langevin(
time_step=1*femtoSecond,
reservoir_temperature=3000*Kelvin,
friction=0.01*femtoSecond**-1,
initial_velocity=initial_velocity,
)
constraints = [FixCenterOfMass()]
# Run the MD simulation through the active learning object.
md_trajectory = active_learning.runMolecularDynamics(
bulk_configuration,
constraints=constraints,
trajectory_filename='HfO2_am_active_learning_3000K.hdf5',
steps=100000,
log_interval=100,
method=method,
domain_decomposition_pattern=[1, 1, 1],
)
The following example shows how to set up active learning simulations with multiple runners. In this type of simulation, the MD is run simultaneously from multiple different initial configurations, and the candidates collected from all runs are used together for training.
active_learning = ActiveLearningSimulation(
fitting_parameters=fitting_parameters,
initial_training_data=initial_training_data,
mtp_study_filename='HfO2_mtp_study',
mtp_study_object_id='HfO2',
reference_calculator=reference_calculator,
candidate_threshold=1.0,
retrain_threshold=3.0,
check_interval=20,
max_forces_check=10.0*eV/Ang,
use_stress=True,
candidate_trajectory_filename='HfO2_am_active_learning_candidates.hdf5',
restart_simulation=True,
extrapolation_selection_parameters=ExtrapolationSelectionParameters(
extrapolation_grade_algorithm=QueryByCommitteeForces,
descriptor_cutoff=0.1,
),
)
# Set up a high-temperature MD at 3000 K.
initial_velocity = MaxwellBoltzmannDistribution(
temperature=3000.0*Kelvin,
)
method = Langevin(
time_step=1*femtoSecond,
reservoir_temperature=3000*Kelvin,
friction=0.01*femtoSecond**-1,
initial_velocity=initial_velocity,
)
constraints = [FixCenterOfMass()]
# Use 8 different initial configurations which are loaded from file and collected in a list. The
# trajectory filenames are also set up here, so that each MD trajectory is written to a different
# file.
number_of_initial_configurations = 8
initial_configuration_list = []
trajectory_filename_list = []
for i in range(number_of_initial_configurations):
    configuration = nlread(f'initial_configuration_{i}.hdf5', BulkConfiguration)[0]
    initial_configuration_list.append(configuration)
    trajectory_filename_list.append(f'HfO2_am_active_learning_3000K_{i}.hdf5')
# Run the MD simulation through the active learning object.
md_trajectory = active_learning.runMolecularDynamics(
initial_configuration_list,
constraints=constraints,
trajectory_filename=trajectory_filename_list,
steps=100000,
log_interval=1000,
method=method,
domain_decomposition_pattern=[1, 1, 1],
)
# Extract the additional training data that has been added during active learning, as TrainingSet
# object.
additional_training_data = active_learning.additionalTrainingSet()
active_learning_md_multiple_runners.py
When starting an active learning simulation from scratch it is recommended to scan over randomly generated initial non-linear coefficients and select the best fit for further training.
# This function replaces the setup of NonLinearCoefficientsParameters and
# MomentTensorPotentialFittingParameters without the need of a loop.
fitting_parameters_list = scanOverNonLinearCoefficients(
number_of_initial_guesses=30,
basis_size=n_basis,
outer_cutoff_radii=4.5*Ang,
mtp_filename_suffix='MTP_fit.mtp',
use_element_specific_coefficients=True,
random_seed=42,
)
# Perform an initial MTP training using a list of fitting parameters.
mtp_training = MomentTensorPotentialTraining(
filename='pre_training.hdf5',
object_id='mtp_training',
training_sets=initial_training_data,
calculator=reference_calculator,
calculate_stress=True,
fitting_parameters_list=fitting_parameters_list,
train_test_split=0.9,
random_seed=13345,
log_filename_prefix='pre_training',
)
mtp_training.update()
# Determine the best fit.
best_fit_index = mtp_training.rankFits(
data_tags=None,
weights=[[1, 1, 1], [1, 1, 1]],
statistical_measure=R2Score
)[0][0]
# Get the parameters of the best fit which can be passed to the fitting_parameters keyword of
# ActiveLearningSimulation.
best_fitting_parameters = mtp_training.fittingParametersList()[best_fit_index]
# Set up the ActiveLearningSimulation as usual.
active_learning = ActiveLearningSimulation(
fitting_parameters=best_fitting_parameters,
initial_training_data=initial_training_data,
mtp_study_filename='HfO2_mtp_study',
mtp_study_object_id='HfO2',
reference_calculator=reference_calculator,
candidate_threshold=1.0,
retrain_threshold=3.0,
check_interval=20,
max_forces_check=10.0*eV/Ang,
use_stress=True,
candidate_trajectory_filename='HfO2_am_active_learning_candidates.hdf5',
restart_simulation=True,
)
active_learning_md_with_scan.py
Run an ActiveLearningSimulation geometry optimization.
# Set up non-linear coefficients.
non_linear_coefficients_parameters = NonLinearCoefficientsParameters(
perform_optimization=True,
energy_only=True,
)
# Set up parameters to use in the MTP fitting.
fitting_parameters = MomentTensorPotentialFittingParameters(
basis_size=300,
outer_cutoff_radii=4.5*Angstrom,
mtp_filename='active_learning_mtp.mtp',
non_linear_coefficients_parameters=non_linear_coefficients_parameters,
)
active_learning = ActiveLearningSimulation(
fitting_parameters=fitting_parameters,
initial_training_data=initial_training_data,
mtp_study_filename='mtp_study',
mtp_study_object_id='mtp',
reference_calculator=reference_calculator,
candidate_threshold=0.1,
retrain_threshold=1.0,
check_interval=1,
use_stress=True,
candidate_trajectory_filename='active_learning_candidates.hdf5',
restart_simulation=True,
extrapolation_selection_parameters=ExtrapolationSelectionParameters(
extrapolation_grade_algorithm=QueryByCommitteeForces,
descriptor_cutoff=0.1,
),
)
constraints = [FixStrain(x=True, y=True, z=True)]
# Run the geometry optimization through the active learning object.
optimization_configuration = active_learning.runOptimizeGeometry(
bulk_configuration,
max_steps=1000,
constraints=constraints,
trajectory_filename='active_learning_optimization_trajectory.hdf5',
optimize_cell=False,
trajectory_interval=1,
)
# Extract the additional training data that has been added during active learning, as TrainingSet
# object.
additional_training_data = active_learning.additionalTrainingSet()
active_learning_optimization.py
Run an ActiveLearningSimulation nudged elastic band optimization.
# Set up non-linear coefficients.
non_linear_coefficients_parameters = NonLinearCoefficientsParameters(
perform_optimization=True,
energy_only=True,
)
# Set up parameters to use in the MTP fitting.
fitting_parameters = MomentTensorPotentialFittingParameters(
basis_size=300,
outer_cutoff_radii=4.5*Angstrom,
mtp_filename='active_learning_mtp.mtp',
non_linear_coefficients_parameters=non_linear_coefficients_parameters,
use_element_specific_coefficients=True,
)
active_learning = ActiveLearningSimulation(
fitting_parameters=fitting_parameters,
initial_training_data=initial_training_data,
mtp_study_filename='mtp_study',
mtp_study_object_id='mtp',
reference_calculator=reference_calculator,
candidate_threshold=0.1,
retrain_threshold=1.0,
check_interval=1,
use_stress=True,
candidate_trajectory_filename='active_learning_candidates.hdf5',
restart_simulation=True,
)
# Run the NEB optimization through the active learning object.
optimization_configuration = active_learning.runOptimizeNudgedElasticBand(
neb,
max_steps=1000,
trajectory_filename='active_learning_optimization_trajectory.hdf5',
optimize_cell=False,
trajectory_interval=1,
)
active_learning_optimization_neb.py
Run an active learning MD and extract the training set table for a final fit to test different MTP basis sizes.
# Run the MD simulation through the active learning object.
md_trajectory = active_learning.runMolecularDynamics(
bulk_configuration,
constraints=constraints,
trajectory_filename='HfO2_am_active_learning_3000K.hdf5',
steps=100000,
log_interval=100,
method=method,
domain_decomposition_pattern=[1, 1, 1],
)
# Extract the additional training data that has been added during active learning as a TrainingSet
# object, and save it to a file.
additional_training_data = active_learning.additionalTrainingSet()
nlsave('HfO2_active_learning_additional_training_data.hdf5', additional_training_data)
# Extract a table with the initial and additional training data that has been added
# during active learning. This table can be used as input to a final MTP fit.
training_set_table = active_learning.trainingSetTable()
# Test different basis sizes.
fitting_parameters_list = []
for mtp_basis in [PredefinedBasisSmall, 400, 800]:
    # Optimize the non-linear coefficients on the energy only.
    nl_parameters = NonLinearCoefficientsParameters(
        perform_optimization=True,
        energy_only=True,
    )
    fitting_parameters = MomentTensorPotentialFittingParameters(
        basis_size=mtp_basis,
        outer_cutoff_radii=4.5*Ang,
        mtp_filename=f'HfO2_active_learning_{mtp_basis}.mtp',
        non_linear_coefficients_parameters=nl_parameters,
        use_element_specific_coefficients=True,
    )
    fitting_parameters_list.append(fitting_parameters)
# Perform an MTP training using the list of fitting parameters.
mtp_training = MomentTensorPotentialTraining(
filename='Final_MTP_training_basis_size_scan.hdf5',
object_id='mtp_training',
training_sets=training_set_table,
calculator=reference_calculator,
calculate_stress=True,
fitting_parameters_list=fitting_parameters_list,
train_test_split=0.9,
random_seed=13345,
log_filename_prefix='mtp_basis_size_scan',
)
mtp_training.update()
nlprint(mtp_training)
Notes¶
MTP Active Learning¶
The ActiveLearningSimulation class can be used to continuously extend the training data for a machine-learned Moment Tensor Potential (MTP) during a molecular dynamics simulation, as described in [1] and [2].
A typical active learning simulation is started from an initial training
data set, which can be generated using the
MomentTensorPotentialTraining framework. By calling a method such as
runMolecularDynamics or runOptimizeGeometry, an active learning simulation is
initiated. From the initial training data set a starting MTP is trained which
is used to run the MD simulation. During the simulation the extrapolation
grade is calculated at regular intervals for the current configuration, as
described in [1] and [2]. A value above
zero means that the current configuration extrapolates the potential, i.e. is outside the
space of configurations covered by the training data. If the extrapolation
grade exceeds the first threshold candidate_threshold, a copy of the
configuration is stored as a candidate to add to the extended training data set.
If the extrapolation grade exceeds the retrain_threshold, the
simulation is stopped. The most relevant configurations are selected from the
collected candidates using the maxvol criterion [1].
Energy, forces, and stress are calculated for these configurations using the reference calculator,
and added to the training data set. A new MTP is trained on this extended data set
and the simulation is started from the beginning with the new MTP.
This cycle is repeated until the simulation completes without extrapolating
significantly, i.e. without exceeding the retrain threshold, or the maximum
number of iterations as specified by max_iterations is reached.
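The control flow of this cycle can be illustrated with a minimal, self-contained sketch. It does not use the QuantumATK API: the function names, the one-dimensional "trajectory", and the distance-based stand-in for the extrapolation grade are all hypothetical, and every candidate is added to the training data instead of being filtered by a selection algorithm.

```python
def toy_extrapolation_grade(position, training_points):
    """Hypothetical stand-in for the extrapolation grade: distance from
    `position` to the nearest training point (0 means fully covered)."""
    return min(abs(position - point) for point in training_points)


def run_active_learning(n_steps, training_points, candidate_threshold,
                        retrain_threshold, check_interval, max_iterations):
    """Toy cycle: simulate, collect candidates, extend training data, restart.

    Returns (completed, iterations_used, training_points).
    """
    training_points = list(training_points)
    for iteration in range(1, max_iterations + 1):
        candidates = []
        stopped = False
        for step in range(0, n_steps, check_interval):
            position = float(step)  # toy trajectory drifting away from the data
            grade = toy_extrapolation_grade(position, training_points)
            if grade > candidate_threshold:
                candidates.append(position)
            if grade > retrain_threshold:
                stopped = True  # too much extrapolation: stop and retrain
                break
        if not stopped:
            return True, iteration, training_points
        # Stand-in for candidate selection, reference calculation and refit
        # (assumption): every candidate is simply added to the training data.
        training_points.extend(candidates)
    return False, max_iterations, training_points


completed, iterations, data = run_active_learning(
    n_steps=20, training_points=[0.0], candidate_threshold=1.0,
    retrain_threshold=3.0, check_interval=1, max_iterations=10)
print(completed, iterations, len(data))
```

In this toy setting each restart covers a few more steps of the trajectory, until one pass finishes without exceeding the retrain threshold or the iteration limit is hit.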
All additional configurations, which are added to the original training
data set during an active learning simulation, are stored with their calculated
reference energy, forces, and stress to a ConfigurationDataContainer object
which is saved to candidate_trajectory_filename. The additional training data
can also be accessed via the method additionalTrainingData(). The method
additionalTrainingSet() returns the additional training data as a TrainingSet
object.
Furthermore, ActiveLearningSimulation provides a method
trainingSetTable(), which returns a Table that contains both the initial
training set and the additional training set. This table can directly
be used as input to a MomentTensorPotentialTraining object to
perform a final MTP fitting run and optimize the MTP hyperparameters.
A D3 dispersion calculator can be added in order to account for dispersion during the MD
simulations of the active learning cycle. This is achieved by creating the
ActiveLearningSimulation object with a D3 dispersion calculator attached
via the d3_dispersion_calculator keyword. This calculator is required to be a
TremoloXCalculator wrapped around a given D3 dispersion potential.
To create the calculator, a D3 dispersion potential, e.g. DispersionD3Z(xc='PBE'),
has to be used for creating a TremoloXCalculator object, yielding
the overall input TremoloXCalculator(DispersionD3Z(xc='PBE')). The type of D3 dispersion
potential and the xc functional included in the TremoloXCalculator object
should be explicitly specified according to the specific modeling needs.
Query-by-committee¶
Query-by-committee is an alternative to the maxvol algorithm to calculate the extrapolation grade for multi-element systems. In this case, an ensemble of different MTP models is trained on the same training data (default is 6 different models), each with different initial guesses. The ensemble standard deviation of the forces prediction is used as a measure of extrapolation, which can be interpreted as a prediction of the forces error in the configuration [3]. The query-by-committee algorithm typically provides a numerically more stable way to calculate the extrapolation grade and does not require reduced MTP accuracy settings. In particular it can be used with any MTP basis size and it is therefore the recommended algorithm.
The query-by-committee method can be activated by setting
extrapolation_grade_algorithm=QueryByCommitteeForces in ExtrapolationSelectionParameters.
Unlike maxvol, query-by-committee does not provide a native algorithm for selecting the most
different candidate configurations out of all extrapolating candidates.
Instead, a dedicated separate selection algorithm is used
(MTP-structural-descriptor).
The parameters for this method can be selected through the extrapolation_selection_parameters
keyword, which takes an ExtrapolationSelectionParameters object.
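As an illustration of the committee idea, the following sketch computes an ensemble standard deviation of force predictions in plain Python. It is not the QuantumATK implementation: the function name, the data layout, and in particular the reduction to a single number (the largest per-atom deviation norm) are assumptions made for this example.

```python
import math
import statistics


def committee_force_deviation(committee_forces):
    """Return the largest per-atom force standard deviation across a committee.

    committee_forces[m][a] is the (fx, fy, fz) force on atom `a` predicted by
    committee member `m`. Reducing over atoms with max() is an assumption of
    this sketch, not a documented detail of the library.
    """
    n_atoms = len(committee_forces[0])
    per_atom = []
    for atom in range(n_atoms):
        # Population standard deviation over members, one value per component.
        component_std = [
            statistics.pstdev([member[atom][k] for member in committee_forces])
            for k in range(3)
        ]
        per_atom.append(math.sqrt(sum(s * s for s in component_std)))
    return max(per_atom)


# Two hypothetical committee members that disagree only on atom 1.
forces = [
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
]
print(committee_force_deviation(forces))
```

A large value indicates that the committee members disagree on at least one atom, which is the signal that is compared against the candidate and retrain thresholds.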
Multiple runners¶
Using multiple runners in active learning MD or optimization is an extension to the standard way of running active learning to increase the exploration of different new configurations. Here, the simulation is split up and run on multiple different initial configurations simultaneously, collecting candidates from all simulations. When all simulations have stopped due to exceeding the retraining threshold, all the candidates are gathered, selected, and collectively included in the training before the next iteration is started. This makes it possible to include, e.g., various stoichiometric compositions or interface representations in the same active learning simulation, instead of running multiple simulations sequentially.
Multiple runners can be set up by passing a list of initial configurations as the
configuration parameter in runMolecularDynamics() or
runOptimizeGeometry(). For running this type of simulation efficiently,
one should ideally allocate at least as many MPI processes as the number of
initial configurations.
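The pooling of candidates across runners can be sketched as follows. This is a simplified illustration, not the library's implementation: the function name and the input format (a precomputed list of extrapolation grades per runner) are hypothetical, and the runners are processed one after another here rather than in parallel over MPI processes.

```python
def pool_candidates(runner_grades, candidate_threshold, retrain_threshold):
    """Collect candidates from several runners for one active learning iteration.

    runner_grades[r] is a toy sequence of extrapolation grades seen by runner
    `r`. Each runner stops at the first grade above retrain_threshold.
    Returns a pooled list of (runner_index, step_index) candidates.
    """
    pooled = []
    for runner, grades in enumerate(runner_grades):
        for step, grade in enumerate(grades):
            if grade > candidate_threshold:
                pooled.append((runner, step))
            if grade > retrain_threshold:
                break  # this runner stops; the others still contribute
    return pooled


grades = [
    [0.5, 1.5, 3.5, 9.0],  # runner 0 stops at step 2
    [0.2, 0.4, 0.6],       # runner 1 never extrapolates
    [2.0, 4.0],            # runner 2 stops at step 1
]
print(pool_candidates(grades, candidate_threshold=1.0, retrain_threshold=3.0))
```

The pooled list is what a subsequent selection and retraining step would operate on, so that a single retraining covers the extrapolating regions found by all runners.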
Practical guide¶
Although it can also be used in production simulations, the ActiveLearningSimulation class is primarily designed to efficiently extend existing training data via MD simulations without having to run computationally expensive ab-initio MD simulations for amorphous or high-temperature systems. Apart from the automatically added training data in the candidate trajectory file, it can also be useful to take snapshots from the final MD trajectory and recalculate energy, forces, and stress with the reference calculator (e.g. by passing the snapshots as a TrainingSet to MomentTensorPotentialTraining) to obtain additional training configurations that can be added to a larger training data set.
For consistency with the input of MomentTensorPotentialTraining it
is recommended to use a list or Table of TrainingSet objects to
pass the initial training data.
When using the default maxvol algorithm, it is recommended to reduce the MTP accuracy settings, e.g. MTP basis size and outer cutoff radius, and to include only the most relevant training data in the initial training data set, so that the simulation runs in a robust and efficient way. One can occasionally encounter the error message “No new candidates found in active learning”, often accompanied by very large extrapolation grade values. This is typically caused by an MTP basis set being too large for the given training data, which leads to numerical inaccuracies when calculating the extrapolation grade and selecting the candidates to add to the next training iteration. In this case it often helps to reduce the MTP basis size in the MomentTensorPotentialFittingParameters until the problem disappears. Alternatively, one can try to include more diverse configurations in the initial training data, for example by running DFT-MD.
Choosing the query-by-committee algorithm instead can often avoid this problem from the start and provide a more stable simulation.
When starting an active learning simulation from scratch it is recommended to scan over
randomly generated initial non-linear coefficients and select the best fit for further training,
as shown in the example section.
If an active learning simulation is continued, then it is best to use the non-linear coefficients
from the previous run to keep the training consistent. This can be done by passing the MTP
filename from the previous run as the initial_coefficients parameter in the
NonLinearCoefficientsParameters.
In order to run an active learning MD simulation more efficiently one can increase the
check_interval parameter to reduce the frequency at which the
extrapolation grade is calculated. As a fallback the extrapolation grade is
always checked whenever the largest force on an atom exceeds the value
max_forces_check, which is typically a sign of extrapolation.
This is only supported for MD, whereas in optimization simulations the
extrapolation grade is always checked at every step.
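The interplay between the two parameters can be written as a small decision function. This is a sketch of the rule described above, not code from the library: the function name is hypothetical, forces are plain floats (assumed to be in eV/Ang), and treating max_forces_check=None as "no fallback" is an assumption.

```python
def should_check_extrapolation(step, max_force, check_interval, max_forces_check):
    """Decide whether to evaluate the extrapolation grade at this MD step."""
    # Regular check every `check_interval` steps.
    regular_check = step % check_interval == 0
    # Fallback: an unusually large force triggers a check at any step.
    force_fallback = max_forces_check is not None and max_force > max_forces_check
    return regular_check or force_fallback


# With check_interval=20, only every 20th step is checked ...
print(should_check_extrapolation(40, 1.0, 20, 10.0))
print(should_check_extrapolation(41, 1.0, 20, 10.0))
# ... unless the largest force exceeds max_forces_check.
print(should_check_extrapolation(41, 12.0, 20, 10.0))
```

A larger check_interval reduces the number of grade evaluations proportionally, while the force fallback limits how far the simulation can drift between two regular checks.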
Note that the initial training data must have pre-calculated reference data
(e.g. DFT energy, forces, stress). That means, e.g., that TrainingSet
objects must be set up with recalculate_training_data=False.
ActiveLearningSimulation does not support calculating the reference data for the
initial training data that is given.
The training data generated during an active learning run can be accessed via the methods described in the MTP Active Learning section above.
Active Learning MD¶
A typical workflow for training via active learning MD could be to use displaced bulk crystal or interface configurations, generated via RandomDisplacementsParameters or CrystalInterfaceTrainingParameters respectively, as initial training data. Simulating the crystal at high temperature using active learning MD to melt it can then be used to include liquid and amorphous configurations in the training data. Typical values for the candidate and retrain thresholds are 1.0 and 3.0, respectively. The larger these values are, the more extrapolation is allowed.
MD with active learning can be combined with hook functions. This can be used to include non-equilibrium simulations in the training.
Active Learning geometry optimization¶
This type of optimization can supplement molecular dynamics simulations by bridging e.g. amorphous
and crystalline phases. Since optimization trajectories tend to be shorter than MD trajectories,
and configurations can change more significantly between optimization steps, it is recommended to
set the check_interval parameter to 1 and lower the candidate and retrain thresholds to add
sufficiently many new structures to the training set. Note that the restart_strategy and hook
functions available in OptimizeGeometry are not supported
in ActiveLearningSimulation. This method is used when crystal structure prediction is run
with an ActiveLearningSimulation object (see also the CrystalStructurePrediction
reference manual).
Active Learning nudged elastic band optimization¶
This type of active learning optimization can be used to train potentials for reaction paths and
transition states. Similarly to OptimizeGeometry, hook functions and restart_strategy
arguments are not available when running OptimizeNudgedElasticBand through active
learning. Every image in the NudgedElasticBand is checked individually for its
extrapolation grade, potentially adding multiple candidate structures each optimization iteration.
However, it is still recommended to set the check_interval parameter to 1. A preoptimization of
the endpoints can be performed. In that case the initial MTP should be able to perform the
optimization without unphysical results, as the endpoint optimizations are not part of the
learning process. Note that multiple independent runners are not supported.