ActiveLearningSimulation¶

class ActiveLearningSimulation(fitting_parameters, initial_training_data, mtp_study_filename, mtp_study_object_id, reference_calculator, correction_calculator=None, d3_dispersion_calculator=None, candidate_threshold=None, retrain_threshold=None, check_interval=None, limit_candidates=None, max_forces_check=None, use_stress=None, candidate_trajectory_filename=None, candidate_trajectory_object_id=None, max_iterations=None, processes_per_calculation=None, extrapolation_selection_parameters=None, use_linearized_coefficient_matrix=None, minimum_bond_length_percent=None, log_filename_prefix=None, restart_simulation=None)¶

Set up an object that can be used to run active learning simulations using the Moment Tensor Potential.

Parameters:

fitting_parameters (MomentTensorPotentialFittingParameters) – The parameters for MTP fitting.
initial_training_data (TrainingSet | Trajectory | ConfigurationDataContainer | MDTrajectory | MomentTensorPotentialTraining | sequence of [ TrainingSet | Trajectory | ConfigurationDataContainer | MDTrajectory | MomentTensorPotentialTraining ] | Table) – The initial training data. All configurations in the initial training data must have precalculated reference data (energy, forces, and possibly stress), otherwise the configuration is not included in the active learning training.
mtp_study_filename (str) – The name of the file that contains the MTP study object used for training the moment tensor potential.
mtp_study_object_id (str) – The object_id of the MTP study object used to train the moment tensor potential.
reference_calculator (Calculator) – The reference calculator.
correction_calculator (Calculator) – Calculator used to correct reference values. The MTP is fit to the reference calculator minus the contribution of the correction calculator.
d3_dispersion_calculator (TremoloXCalculator) – A D3 dispersion calculator that, when added, is used in the MD simulation during the active learning cycle as a correction to the MTP output.
Default: None
candidate_threshold (float) – The threshold above which configurations will be added as candidates for the next training iteration.
Default: 1
retrain_threshold (float) – The extrapolation threshold at which the simulation is stopped and trained again.
Default: 3
check_interval (int) – The interval at which the extrapolation grade is calculated and checked in MD simulations. In active learning optimization the extrapolation grade is always checked at every step.
Default: 10
limit_candidates (int) – Upper limit for the number of candidates collected each iteration that trigger retraining. This can be used to limit the number of reference calculator calls when many configurations fall between the candidate and retrain thresholds.
Default: No limit
max_forces_check (PhysicalQuantity of type energy / length) – If the max. force on an atom exceeds this value the extrapolation grade is always checked. Only available with MD active learning; has no effect in other cases.
Default: 10*eV / Angstrom
use_stress (bool) – Whether or not stress is used in training the MTP potential.
Default: True
candidate_trajectory_filename (str) – The filename for the trajectory to store the candidate configurations that have been added to the training set.
candidate_trajectory_object_id (str) – The object ID for the trajectory to store the candidate configurations that have been added to the training set.
max_iterations (int) – The max number of retraining iterations.
Default: 20
processes_per_calculation (int | None) – The number of processes used for each calculation with the reference calculator.
Default: None, which corresponds to all processes.
extrapolation_selection_parameters (ExtrapolationSelectionParameters) – The parameters that specify the details of the extrapolation grade calculation and candidate selection.
use_linearized_coefficient_matrix (bool) –
Deprecated: Use extrapolation_grade_algorithm=MaxvolLinearized in ExtrapolationSelectionParameters instead.
minimum_bond_length_percent (float) – The minimum bond length, given as a fraction of the covalent radii, that is allowed in a training configuration. If any bond is shorter than that value, the configuration is discarded from the training data.
Default: 0.35
log_filename_prefix (str | LogToStdOut | None) – The prefix used in the log file names generated by the training data calculation and fitting parts of the simulation.
restart_simulation (bool) – Restart the simulation with any configurations added during previous active learning simulations.
Default: False
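
The interplay of candidate_threshold, retrain_threshold and limit_candidates can be sketched in plain Python. This is only an illustration of the behaviour described in the parameter list, not the library's implementation: the function name collect_candidates is hypothetical, and the exact comparison operators (strictly above the candidate threshold, at or above the retrain threshold) are assumptions.

```python
def collect_candidates(grades, candidate_threshold=1.0, retrain_threshold=3.0,
                       limit_candidates=None):
    """Scan extrapolation grades, one per checked step (hypothetical helper).

    Returns (candidate_indices, stop_index). stop_index is the index at which
    the run would be interrupted for retraining, or None if it ran to the end.
    """
    candidates = []
    for index, grade in enumerate(grades):
        # Configurations above the candidate threshold are kept as candidates.
        if grade > candidate_threshold:
            candidates.append(index)
        # Assumption: the run stops once the grade reaches the retrain threshold ...
        if grade >= retrain_threshold:
            return candidates, index
        # ... or once the optional candidate limit has been reached.
        if limit_candidates is not None and len(candidates) >= limit_candidates:
            return candidates, index
    return candidates, None


grades = [0.4, 1.2, 0.9, 2.1, 3.5, 0.7]
unlimited = collect_candidates(grades)
limited = collect_candidates(grades, limit_candidates=2)
```

With the default thresholds the run in this sketch stops at the grade 3.5; with limit_candidates=2 it stops earlier, which is how the limit reduces the number of reference calculator calls.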

additionalTrainingData()¶

Returns:: The training dataset that has been added during the active learning simulation.
Return type:: Trajectory

additionalTrainingSet()¶

Returns:: The training dataset that has been added during the active learning simulation or None if no training data has been added.
Return type:: TrainingSet | None

candidateThreshold()¶

Returns:: The candidate threshold.
Return type:: float

candidateTrajectoryFilename()¶

Returns:: The filename to which the candidate trajectory is written.
Return type:: str

candidateTrajectoryObjectId()¶

Returns:: The object ID to which the candidate trajectory is written.
Return type:: str

checkInterval()¶

Returns:: The check interval for the extrapolation grade.
Return type:: int

committeeSize()¶

Returns:: The number of committee members when the query-by-committee method is used for the extrapolation grade.
Return type:: int

correctionCalculator()¶

Returns:: The correction calculator, if defined.
Return type:: Calculator

extrapolationGradeAlgorithm()¶

Returns:: Which extrapolation grade algorithm should be used.
Return type:: MaxvolStandard | MaxvolLinearized | MaxForce | QueryByCommitteeForces | QueryByCommitteeEnergy

fittingParameters()¶

Returns:: The MTP fitting parameters.
Return type:: MomentTensorPotentialFittingParameters

forcesCap()¶

Returns:: The forces cap.
Return type:: PhysicalQuantity of type energy / length.

initialTrainingData()¶

Returns:: The initial training dataset.
Return type:: Trajectory

initialTrainingSet()¶

Returns:: The initial training dataset for this active learning simulation.
Return type:: TrainingSet

limitCandidates()¶

Returns:: Upper limit for the number of candidates that are collected each iteration before retraining.
Return type:: int

logFilenamePrefix()¶

Returns:: The prefix used in MTP training log files. The flag LogToStdOut causes log output to be written to standard out.
Return type:: str

maxForcesCheck()¶

Returns:: The max. forces value at which a check of the extrapolation grade is enforced.
Return type:: PhysicalQuantity of type energy / length.
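
As a rough illustration of how check_interval and max_forces_check combine in MD active learning, the following sketch decides whether the extrapolation grade would be evaluated at a given step. The function should_check_grade is hypothetical, forces are plain floats in eV/Angstrom rather than PhysicalQuantity objects, and counting steps from zero is an assumption.

```python
def should_check_grade(step, max_force, check_interval=10, max_forces_check=10.0):
    """Return True if the extrapolation grade would be checked at this step.

    Hypothetical helper; max_force and max_forces_check are floats in eV/Angstrom.
    """
    # Regular check every check_interval steps (assumed to start at step 0).
    on_schedule = step % check_interval == 0
    # A very large force on any atom enforces a check regardless of the schedule.
    force_too_large = max_force > max_forces_check
    return on_schedule or force_too_large


regular = should_check_grade(20, 1.0)
skipped = should_check_grade(7, 1.0)
enforced = should_check_grade(7, 12.5)
```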

maxIterations()¶

Returns:: The maximum number of retraining iterations.
Return type:: int

minBondLengthPercent()¶

Returns:: The minimum bond length as fraction of the covalent radii.
Return type:: float
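
The bond length criterion can be pictured with a small stand-alone sketch. Everything here is an assumption made for illustration: the helper has_too_short_bond is hypothetical, the reference length is taken as the sum of the two covalent radii, the radii are approximate illustrative values rather than the ones used by the library, and periodic images are ignored.

```python
import math

# Illustrative covalent radii in Angstrom; NOT taken from the library.
COVALENT_RADII = {"Hf": 1.75, "O": 0.66}


def has_too_short_bond(elements, positions, minimum_fraction=0.35):
    """Return True if any pair distance is below minimum_fraction * (r_i + r_j).

    Hypothetical helper; positions are Cartesian coordinates in Angstrom and
    periodic boundary conditions are not considered.
    """
    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):
            distance = math.dist(positions[i], positions[j])
            reference = COVALENT_RADII[elements[i]] + COVALENT_RADII[elements[j]]
            if distance < minimum_fraction * reference:
                return True
    return False


kept = has_too_short_bond(["Hf", "O"], [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)])
discarded = has_too_short_bond(["Hf", "O"], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5)])
```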

mtpParameters()¶

Returns:: The MTPParameters containing the optimized MTP parameters.
Return type:: MTPParameters | None

processesPerCalculation()¶

Returns:: The number of processes that is used for a configuration when calculating the reference data.
Return type:: int

referenceCalculator()¶

Returns:: The reference calculator.
Return type:: Calculator

restartSimulation()¶

Returns:: Whether or not the simulation and training are restarted from a previous run.
Return type:: bool

retrainThreshold()¶

Returns:: The retrain threshold.
Return type:: float

runMolecularDynamics(configuration, constraints=None, trajectory_filename=None, steps=None, log_interval=None, method=None, xyz_filename=None, hook_functions=None, pre_step_hook=None, post_step_hook=None, write_velocities=None, write_forces=True, write_stresses=None, domain_decomposition_pattern=None, trajectory_interval=None, measurement_hook=None, trajectory_object_id=None, number_of_independent_runners=None, log_filename_prefix=None)¶

Run an active learning MD simulation.

Parameters:

configuration (BulkConfiguration | sequence of type BulkConfiguration) – The initial configuration. When using multiple independent runners this can be given as a list with a different item for each runner.
constraints (list of ints | list of BaseConstraint) – The list of atomic indices, denoting fixed atoms, or constraint objects, such as RigidBody.
Default: [].
trajectory_filename (str | sequence of str | None) – The filename of the file to be used for storing the trajectory, or None if no trajectory should be written. A trajectory filename should not be given if configuration is a trajectory. When using multiple independent runners this can be given as a list with a different item for each runner.
Default: None.
steps (int) – The number of time-steps to take in the simulation.
Default: 50.
log_interval (int) – The interval at which information, such as time, energy, temperature, etc. is written to the log output.
Default: 1.
method (BaseMDmethod) – The MD method used for the simulation. When using multiple independent runners this can be given as a list with a different item for each runner.
Default: NVEVelocityVerlet.
xyz_filename (str) – The name of the file to be used for storing the xyz trajectory, or None if no xyz-trajectory should be written.
Default: None.
hook_functions (None) – Currently not supported in active learning.
pre_step_hook (function | list of functions | None) – An optional user-defined function or a list of functions which will be called just before the forces evaluation. The signature of the function requires the arguments (step, time, configuration). The return status is ignored. Unhandled exceptions will terminate the evaluation. If a list is given the functions will be called in the given order.
Default: None.
post_step_hook (function | list of functions | None) – An optional user-defined function or a list of functions which will be called just after the forces evaluation. The signature of the function requires the arguments (step, time, configuration). Optional arguments include forces, local_forces (for distributed MD), stress, trajectory (the MD trajectory), temperature, pressure, potential_energy, and/or kinetic_energy. The return status is ignored. Unhandled exceptions will terminate the evaluation. If a list is given the functions will be called in the given order.
Default: None.
write_velocities (bool) – Write the velocities to the trajectory file every log_interval steps. If configuration is a trajectory (i.e. this is a restart calculation) this parameter will be the same value as it was in the previous trajectory.
Default: True.
write_forces (bool) – Write the forces to the trajectory file every log_interval steps. If configuration is a trajectory (i.e. this is a restart calculation) this parameter will be the same value as it was in the previous trajectory.
Default: True.
write_stresses (bool) – Write the stress to the trajectory file every log_interval steps. A value of None means that write_stresses is True by default for NPT methods and False otherwise. This is to avoid the additional work of calculating the stress when it is not needed. If configuration is a trajectory (i.e. this is a restart calculation) this parameter will be the same value as it was in the previous trajectory.
Default: None.
domain_decomposition_pattern (list of type int | Automatic | None) – The pattern in which the domains should be arranged in a parallel simulation. E.g. [1, 2, 4] means 1 domain in A-, 2 in B-, and 4 in C-direction. If Automatic domain decomposition is used, then the simulation cell will be divided into domains whose edges are as close together in length as possible. If None is given then domain decomposition will be disabled.
Default: Automatic.
trajectory_interval (int) – The resolution used in saving steps to a trajectory file. A value of 1 results in all steps being saved; a value of 2 results in every second step being saved; etc. If configuration is a trajectory (i.e. this is a restart simulation) this parameter will use the same value as was used in the previous trajectory.
Default: The same value as log_interval.
measurement_hook (function | list of functions | None) – An optional user-defined function or a list of functions which will be called at the end of each step after all constraints have been applied. The signature of the function requires the arguments (step, time, configuration). Optional arguments include forces, local_forces (for distributed MD), stress, trajectory (the MD trajectory), temperature, pressure, potential_energy, and/or kinetic_energy. The return value should be a dictionary that maps string keys to values. The values may be numbers, numpy arrays, or PhysicalQuantities. These values are stored on the MDTrajectory and may be accessed using the measurement method. Unhandled exceptions will terminate the evaluation. If a list is given the functions will be called in the given order.
Default: None.
trajectory_object_id (str | None) – The object id of the trajectory written to trajectory_filename. If a value of None is given, then an object id will be chosen automatically.
Default: None.
number_of_independent_runners (int) – The number of independent simulations that should be run. If greater than 1, each runner will run in an independent process group and at the end of each iteration, the candidates of all runners will be gathered, selected, and added to the training data. The simulation completes if all runners complete without exceeding the retraining threshold.
Default: Length of configuration.
log_filename_prefix (str | None) – The prefix used in log files containing details of the molecular dynamics simulation.

Returns:

The final trajectory or, in case of multiple runners, a list containing the trajectories of all runners. In case the active learning simulation does not complete within the max. number of iterations, None is returned.

Return type:

MDTrajectory | list | None
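
A measurement hook following the signature described for measurement_hook could look roughly like the sketch below. The function name max_force_measurement and the key "max_force" are hypothetical. To keep the sketch runnable on its own, plain float tuples stand in for the force array; in an actual run the forces argument would carry units and would need to be handled accordingly.

```python
import math


def max_force_measurement(step, time, configuration, forces=None):
    """Measurement hook sketch: report the largest force norm at this step.

    Required arguments are (step, time, configuration); forces is one of the
    optional arguments. The return value maps string keys to values.
    """
    if forces is None:
        return {}
    largest = max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)
    return {"max_force": largest}


# Stand-alone call with plain floats standing in for the real force array.
result = max_force_measurement(0, 0.0, None, forces=[(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)])
```

Under these assumptions the function would be passed as measurement_hook=max_force_measurement to runMolecularDynamics, and the stored values would then be available on the returned MDTrajectory.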

runOptimizeGeometry(configuration, max_forces=None, max_stress=None, max_steps=None, max_step_length=None, constraints=None, trajectory_filename=None, trajectory_object_id=None, optimize_cell=None, disable_stress=None, optimizer_method=None, target_stress=None, constrain_bravais_lattice=None, trajectory_interval=None, remove_drift=None, enable_optimization_stop_file=None, restart_strategy=NoRestart, number_of_independent_runners=None, log_filename_prefix=None, continue_on_non_convergence=None)¶

Run an active learning geometry optimization.

Parameters:

configuration (BulkConfiguration | sequence of type BulkConfiguration) – The configuration to be optimized. When using multiple independent runners this can be given as a list with a different item for each runner.
max_forces (PhysicalQuantity of type force) – The convergence criterion for the atomic forces.
Default: 0.05*eV/Angstrom.
max_stress (PhysicalQuantity of type pressure) – The convergence criterion for the maximum difference between the internal stress and the target stress.
Default: 0.1*GPa.
max_steps (int) – The maximum number of optimization steps.
Default: 200.
max_step_length (PhysicalQuantity of type length) – The maximum step length the optimizer may take.
Default: 0.2*Ang.
constraints (list of integers and Constraints objects) – A list of indices of atoms with fixed positions and Constraints objects.
Default: [].
trajectory_filename (str | sequence of str | None) – The filename used to store the trajectory. If the value is None then no trajectory file will be written. When using multiple independent runners this can be given as a list with a different item for each runner.
Default: None.
trajectory_object_id (str | None) – The object id of the trajectory written to trajectory_filename. If a value of None is given, then an object id will be chosen automatically.
Default: None.
optimize_cell (bool) – Whether the lattice vectors of a bulk configuration are allowed to change during the optimization. This enables the stress calculation for bulk configurations.
Default: True for BulkConfigurations otherwise False.
disable_stress (bool) –
Deprecated: from v2022.03, use optimize_cell parameter instead.
optimizer_method (FIRE | LBFGS) – The optimization algorithm to use.
Default: LBFGS.
target_stress (PhysicalQuantity of type pressure) – The target internal stress (tensor) of the system. Can be given as a single value in case of isotropic pressure, or as an internal stress vector in Voigt notation or as a 3x3-matrix.
Default: 0*GPa.
constrain_bravais_lattice (bool) – Enable preserving the Bravais lattice symmetry of the configuration.
Default: True if the target_stress is commensurate with the lattice symmetries.
trajectory_interval (int | PhysicalQuantity of type time) – The resolution used in saving steps to a trajectory file. This can either be given as an integer (a value of 1 results in all steps being saved; a value of 2 results in every second step being saved; etc.) or as a time interval.
Default: 1.
remove_drift (bool) – In ab-initio calculations, the sum of the forces along each Cartesian direction does not necessarily sum to zero due to numerical inaccuracies. This option controls if the “drift” in the forces should be removed by subtracting the average force along each Cartesian direction from all atoms.
Default: True.
enable_optimization_stop_file (bool) – Determines whether to enable a file for stopping the geometry optimization. If True, creation of the stop file will stop the optimization at the next step. The name of the stop file will be shown in the log output; it will be stop-geometry-optimization-uniqueID, where uniqueID is a randomly generated identifier for this optimization. The file must be created in the current working directory.
Default: True.
restart_strategy (NoRestart) – The restart mechanism has to be set to NoRestart for ActiveLearningSimulation.
Default: NoRestart.
number_of_independent_runners (int) – The number of independent geometry optimizations that should be run. If greater than 1, each runner will run in an independent process group and at the end of each iteration, the candidates of all runners will be gathered, selected, and added to the training data. The simulation completes if all runners complete without exceeding the retraining threshold.
Default: Length of configuration.
log_filename_prefix (str | None) – The prefix used in log files containing details of the geometry optimization.
continue_on_non_convergence (bool) – Continue with the converged configurations when using multiple runners and only some optimizations converge in the given number of active learning cycles. If False is given the calculation will raise an exception if any optimization does not converge.
Default: False
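
The drift removal described for remove_drift amounts to subtracting the mean force along each Cartesian direction from every atom. A minimal sketch with plain lists of floats instead of PhysicalQuantity arrays (the function below is a hypothetical stand-in, not the library routine):

```python
def remove_drift(forces):
    """Subtract the average force along each Cartesian direction from all atoms."""
    n_atoms = len(forces)
    mean = [sum(force[axis] for force in forces) / n_atoms for axis in range(3)]
    return [[force[axis] - mean[axis] for axis in range(3)] for force in forces]


# Three atoms whose forces do not sum to zero along any direction.
forces = [[0.3, 0.0, -0.1], [-0.1, 0.2, 0.1], [0.1, 0.1, 0.3]]
corrected = remove_drift(forces)
residual = [sum(force[axis] for force in corrected) for axis in range(3)]
```

After the correction the forces sum to zero (up to rounding) along each direction, so the optimizer does not translate the structure as a whole.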

runOptimizeNudgedElasticBand(neb, max_forces=None, max_stress=None, max_steps=None, max_step_length=None, constraints=None, trajectory_filename=None, spring_constant=None, climbing_image=None, preoptimization=None, optimizer_method=None, log_filename_prefix=Automatic, optimize_cell=None, target_stress=None, trajectory_interval=None, restart_strategy=NoRestart)¶

Run an active learning nudged elastic band optimization.

Parameters:

neb (NudgedElasticBand) – The nudged elastic band configuration to optimize.
max_forces (PhysicalQuantity of type force) – The convergence criterion for the atomic forces.
Default: 0.05*eV/Angstrom.
max_stress (PhysicalQuantity of type pressure) – The convergence criterion for the maximum difference between the internal stress and the target stress.
Default: 0.1*GPa.
max_steps (int) – The maximum number of optimization steps.
Default: 200.
max_step_length (PhysicalQuantity of type length) – The maximum step length the optimizer may take.
Default: 0.2*Ang.
constraints (list of ints) – List of atom indices that are kept fixed during optimization.
Default: [].
trajectory_filename (str | sequence of str | None) – The filename used to store the trajectory. If the value is None then no trajectory file will be written.
Default: None.
spring_constant (PhysicalQuantity of type eV/Ang**2) – The spring constant used for the NEB relaxation.
Default: 5.0*(eV/Ang**2).
climbing_image (bool) – Flag indicating if the climbing image algorithm should be used to find a transition state.
Default: False.
preoptimization (bool) – Flag indicating if the endpoints should be optimized before the NEB optimization.
Default: False.
optimizer_method (FIRE | LBFGS) – The optimizer to use for optimizing the structure.
Default: LBFGS.
log_filename_prefix (Automatic | str | None) – The logging output from each image will be written to filenames starting with this value. If it is set to Automatic then the prefix will be the name of the calling python script. If it is set to None, then all output will be written to stdout.
Default: Automatic.
optimize_cell (bool) – If it is None, it will be set automatically according to the NEB configuration. If the NEB configuration is composed of configurations of the type BulkConfiguration and the lattice vectors of the configurations are different, it is set to True; otherwise it is False.
Default: None.
target_stress (PhysicalQuantity of type pressure) – The target internal stress (tensor) of the system. Can be given as a single value in case of isotropic pressure, or as an internal stress vector in Voigt notation or as a 3x3-matrix.
Default: 0*GPa.
trajectory_interval (int | PhysicalQuantity of type time) – The resolution used in saving steps to a trajectory file. This can either be given as an integer (a value of 1 results in all steps being saved; a value of 2 results in every second step being saved; etc.) or as a time interval.
Default: 1.
restart_strategy (NoRestart) – The restart mechanism has to be set to NoRestart for ActiveLearningSimulation.
Default: NoRestart.

Returns:

The optimized NEB. In case the active learning simulation does not complete within the max. number of iterations, None is returned.

Return type:

NudgedElasticBand | None

runTimeStampedForceBiasMonteCarlo(configuration, constraints=None, trajectory_filename=None, steps=None, log_interval=None, method=None, hook_functions=None, pre_step_hook=None, post_step_hook=None, write_velocities=None, write_forces=None, write_stresses=None, trajectory_interval=None, trajectory_object_id=None, number_of_independent_runners=None, log_filename_prefix=None)¶

Run an active learning TFMC simulation.

Parameters:

configuration (BulkConfiguration | sequence of type BulkConfiguration) – The initial configuration. When using multiple independent runners this can be given as a list with a different item for each runner.
constraints (list of ints | list of BaseConstraint) – The list of atomic indices, denoting fixed atoms, or constraint objects, such as RigidBody.
Default: [].
trajectory_filename (str | sequence of str | None) – The filename of the file to be used for storing the trajectory, or None if no trajectory should be written. A trajectory filename should not be given if configuration is a trajectory. When using multiple independent runners this can be given as a list with a different item for each runner.
Default: None.
steps (int) – The number of time-steps to take in the simulation.
Default: 50.
log_interval (int) – The interval at which information, such as time, energy, temperature, etc. is written to the log output.
Default: 1.
method (ForceBiasMonteCarlo | ForceBiasMonteCarloNPTBerendsen) – The Monte Carlo method used for the simulation. When using multiple independent runners this can be given as a list with a different item for each runner.
Default: ForceBiasMonteCarlo
hook_functions (None) – Currently not supported in active learning.
pre_step_hook (function | list of functions | None) – An optional user-defined function or a list of functions which will be called just before the forces evaluation. The signature of the function requires the arguments (step, time, configuration). The return status is ignored. Unhandled exceptions will terminate the evaluation. If a list is given the functions will be called in the given order.
Default: None.
post_step_hook (function | list of functions | None) – An optional user-defined function or a list of functions which will be called just after the forces evaluation. The signature of the function requires the arguments (step, time, configuration, forces, stress). The return status is ignored. Unhandled exceptions will terminate the simulation. If a list is given the functions will be called in the given order.
Default: None
write_velocities (bool) – Write the velocities to the trajectory file every log_interval steps. Since the time-stamped force-bias Monte Carlo algorithm does not use velocities explicitly, zero velocities will be written. If configuration is a trajectory (i.e. this is a restart calculation) this parameter will be the same value as it was in the previous trajectory.
Default: False
write_forces (bool) – Write the forces to the trajectory file every log_interval steps.
Default: True
write_stresses (bool) – Write the stress to the trajectory file every log_interval steps.
Default: False
trajectory_interval (int) – The resolution used in saving steps to a trajectory file. A value of 1 results in all steps being saved; a value of 2 results in every second step being saved; etc. If configuration is a trajectory (i.e. this is a restart simulation) this parameter will use the same value as was used in the previous trajectory.
Default: The same value as log_interval.
trajectory_object_id (str | None) – The object id of the trajectory written to trajectory_filename. If a value of None is given, then an object id will be chosen automatically.
Default: None.
number_of_independent_runners (int) – The number of independent simulations that should be run. If greater than 1, each runner will run in an independent process group and at the end of each iteration, the candidates of all runners will be gathered, selected, and added to the training data. The simulation completes if all runners complete without exceeding the retraining threshold.
Default: Length of configuration.
log_filename_prefix (str | None) – The prefix used in log files containing details of the molecular dynamics simulation.

Returns:

The final trajectory or, in case of multiple runners, a list containing the trajectories of all runners. In case the active learning simulation does not complete within the max. number of iterations, None is returned.

Return type:

MDTrajectory | list | None
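
Because the run methods may return a single trajectory, a list of trajectories, or None, calling code typically has to branch on the result. One possible way to normalize it is sketched below; the helper as_trajectory_list is hypothetical, and raising an exception on None is a design choice for this example rather than library behaviour.

```python
def as_trajectory_list(result):
    """Normalize the return value of a run* call into a list of trajectories.

    Hypothetical helper: None means the active learning simulation did not
    complete within the maximum number of iterations.
    """
    if result is None:
        raise RuntimeError("Active learning did not complete within max_iterations.")
    if isinstance(result, list):
        return result
    return [result]


# Strings stand in for trajectory objects in this stand-alone sketch.
single = as_trajectory_list("trajectory_0")
several = as_trajectory_list(["trajectory_0", "trajectory_1"])
```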

static supportedConfigurationTypes()¶

Returns:: Supported configuration types for initial training data.
Return type:: tuple

trainingSetTable()¶

Returns:: A table containing the initial and additional training data for this active learning simulation.
Return type:: Table

uniqueString()¶: Return a unique string representing the state of the object.

useLinearizedCoefficientMatrix()¶

Returns:: Whether the matrix used to calculate the extrapolation grade should include only the linear coefficients or additionally the linearized version of the non-linear coefficients.
Return type:: bool

static validatorType()¶

Returns:: The validator type.
Return type:: Validator

writeAtomicErrorEstimates()¶

Returns:: True if the per-atom error estimates should be written to the trajectory.
Return type:: bool

Usage Examples¶

Run a molecular dynamics ActiveLearningSimulation for amorphous HfO2 using an initial training data set that is read from file. The method runMolecularDynamics takes the same arguments as the normal MolecularDynamics function. The extrapolation grade is calculated with the query-by-committee method.

# Load the training data. This can be one or several TrainingSet, MomentTensorPotentialTraining or
# Trajectory objects, which contain energy, forces, stress calculated with the reference
# calculator.
initial_training_data = [nlread('HfO2_crystal_training.hdf5', TrainingSet)[0]]

# Use the predefined small MTP basis.
mtp_basis = PredefinedBasisSmall

# Optimize the non-linear coefficients on the energy only.
nl_parameters = NonLinearCoefficientsParameters(
    perform_optimization=True,
    energy_only=True,
)

fitting_parameters = MomentTensorPotentialFittingParameters(
    basis_size=mtp_basis,
    outer_cutoff_radii=4.5*Ang,
    mtp_filename='HfO2_active_learning.mtp',
    non_linear_coefficients_parameters=nl_parameters,
    use_element_specific_coefficients=True,
)

active_learning = ActiveLearningSimulation(
    fitting_parameters=fitting_parameters,
    initial_training_data=initial_training_data,
    mtp_study_filename='HfO2_mtp_study',
    mtp_study_object_id='HfO2',
    reference_calculator=reference_calculator,
    candidate_threshold=1.0,
    retrain_threshold=3.0,
    check_interval=20,
    max_forces_check=10.0*eV/Ang,
    use_stress=True,
    candidate_trajectory_filename='HfO2_am_active_learning_candidates.hdf5',
    restart_simulation=True,
    extrapolation_selection_parameters=ExtrapolationSelectionParameters(
        extrapolation_grade_algorithm=QueryByCommitteeForces,
        descriptor_cutoff=0.1,
    ),
)

# Set up a high-temperature MD at 3000 K.
initial_velocity = MaxwellBoltzmannDistribution(
    temperature=3000.0*Kelvin,
    remove_center_of_mass_momentum=True,
    random_seed=None,
    enforce_temperature=True,
)

method = Langevin(
    time_step=1*femtoSecond,
    reservoir_temperature=3000*Kelvin,
    friction=0.01*femtoSecond**-1,
    initial_velocity=initial_velocity,
)

constraints = [FixCenterOfMass()]

# Run the MD simulation through the active learning object.
md_trajectory = active_learning.runMolecularDynamics(
    bulk_configuration,
    constraints=constraints,
    trajectory_filename='HfO2_am_active_learning_3000K.hdf5',
    steps=100000,
    log_interval=100,
    method=method,
    domain_decomposition_pattern=[1, 1, 1],
)

# Extract the additional training data that has been added during active learning as a TrainingSet
# object, and save it to a file. additionalTrainingSet() returns None if no data has been added.
additional_training_data = active_learning.additionalTrainingSet()
if additional_training_data is not None:
    nlsave('HfO2_active_learning_additional_training_data.hdf5', additional_training_data)

active_learning_md_query_by_committee.py

Run a molecular dynamics ActiveLearningSimulation for amorphous HfO2 using an initial training data set that is read from file. The method runMolecularDynamics takes the same arguments as the normal MolecularDynamics function. The extrapolation grade is calculated with the maxvol method.

# Load the training data. This can be one or several TrainingSet, MomentTensorPotentialTraining or
# Trajectory objects, which contain energy, forces, stress calculated with the reference
# calculator.
initial_training_data = [nlread('HfO2_crystal_training.hdf5', TrainingSet)[0]]

# Use 300 MTP basis functions.
n_basis = 300

# Optimize the non-linear coefficients on the energy only.
nl_parameters = NonLinearCoefficientsParameters(
    perform_optimization=True,
    energy_only=True,
)

fitting_parameters = MomentTensorPotentialFittingParameters(
    basis_size=n_basis,
    outer_cutoff_radii=4.5*Ang,
    mtp_filename='HfO2_active_learning.mtp',
    non_linear_coefficients_parameters=nl_parameters
)

active_learning = ActiveLearningSimulation(
    fitting_parameters=fitting_parameters,
    initial_training_data=initial_training_data,
    mtp_study_filename='HfO2_mtp_study',
    mtp_study_object_id='HfO2',
    reference_calculator=reference_calculator,
    candidate_threshold=1.0,
    retrain_threshold=3.0,
    check_interval=20,
    max_forces_check=10.0*eV/Ang,
    use_stress=True,
    candidate_trajectory_filename='HfO2_am_active_learning_candidates.hdf5',
    restart_simulation=True,
)

# Set up a high-temperature MD at 3000 K.
initial_velocity = MaxwellBoltzmannDistribution(
    temperature=3000.0*Kelvin,
    remove_center_of_mass_momentum=True,
    random_seed=None,
    enforce_temperature=True,
)

method = Langevin(
    time_step=1*femtoSecond,
    reservoir_temperature=3000*Kelvin,
    friction=0.01*femtoSecond**-1,
    initial_velocity=initial_velocity,
)

constraints = [FixCenterOfMass()]

# Run the MD simulation through the active learning object.
md_trajectory = active_learning.runMolecularDynamics(
    bulk_configuration,
    constraints=constraints,
    trajectory_filename='HfO2_am_active_learning_3000K.hdf5',
    steps=100000,
    log_interval=100,
    method=method,
    domain_decomposition_pattern=[1, 1, 1],
)

active_learning_md.py

The following example shows how to set up active learning simulations with multiple runners. In this type of simulation, the MD is run simultaneously from several different initial configurations, and the MTP is trained on the candidates collected from all of them.

active_learning = ActiveLearningSimulation(
    fitting_parameters=fitting_parameters,
    initial_training_data=initial_training_data,
    mtp_study_filename='HfO2_mtp_study',
    mtp_study_object_id='HfO2',
    reference_calculator=reference_calculator,
    candidate_threshold=1.0,
    retrain_threshold=3.0,
    check_interval=20,
    max_forces_check=10.0*eV/Ang,
    use_stress=True,
    candidate_trajectory_filename='HfO2_am_active_learning_candidates.hdf5',
    restart_simulation=True,
    extrapolation_selection_parameters=ExtrapolationSelectionParameters(
        extrapolation_grade_algorithm=QueryByCommitteeForces,
        descriptor_cutoff=0.1,
    ),
)

# Set up a high-temperature MD at 3000 K.
initial_velocity = MaxwellBoltzmannDistribution(
    temperature=3000.0*Kelvin,
)

method = Langevin(
    time_step=1*femtoSecond,
    reservoir_temperature=3000*Kelvin,
    friction=0.01*femtoSecond**-1,
    initial_velocity=initial_velocity,
)

constraints = [FixCenterOfMass()]

# Use 8 different initial configurations which are loaded from file and collected in a list. The
# trajectory filenames are also set up here, so that each MD trajectory is written to a different
# file.
number_of_initial_configurations = 8
initial_configuration_list = []
trajectory_filename_list = []
for i in range(number_of_initial_configurations):
    configuration = nlread(f'initial_configuration_{i}.hdf5', BulkConfiguration)[0]
    initial_configuration_list.append(configuration)
    trajectory_filename_list.append(f'HfO2_am_active_learning_3000K_{i}.hdf5')

# Run the MD simulation through the active learning object.
md_trajectory = active_learning.runMolecularDynamics(
    initial_configuration_list,
    constraints=constraints,
    trajectory_filename=trajectory_filename_list,
    steps=100000,
    log_interval=1000,
    method=method,
    domain_decomposition_pattern=[1, 1, 1],
)

# Extract the additional training data that has been added during active learning as a TrainingSet
# object.
additional_training_data = active_learning.additionalTrainingSet()

active_learning_md_multiple_runners.py

When starting an active learning simulation from scratch it is recommended to scan over randomly generated initial non-linear coefficients and select the best fit for further training.

# This function replaces the setup of NonLinearCoefficientsParameters and
# MomentTensorPotentialFittingParameters without the need of a loop.
fitting_parameters_list = scanOverNonLinearCoefficients(
    number_of_initial_guesses=30,
    basis_size=n_basis,
    outer_cutoff_radii=4.5*Ang,
    mtp_filename_suffix='MTP_fit.mtp',
    use_element_specific_coefficients=True,
    random_seed=42,
)

# Perform an initial MTP training using a list of fitting parameters.
mtp_training = MomentTensorPotentialTraining(
    filename='pre_training.hdf5',
    object_id='mtp_training',
    training_sets=initial_training_data,
    calculator=reference_calculator,
    calculate_stress=True,
    fitting_parameters_list=fitting_parameters_list,
    train_test_split=0.9,
    random_seed=13345,
    log_filename_prefix='pre_training',
)
mtp_training.update()

# Determine the best fit.
best_fit_index = mtp_training.rankFits(
    data_tags=None,
    weights=[[1, 1, 1], [1, 1, 1]],
    statistical_measure=R2Score
)[0][0]

# Get the parameters of the best fit which can be passed to the fitting_parameters keyword of
# ActiveLearningSimulation.
best_fitting_parameters = mtp_training.fittingParametersList()[best_fit_index]

# Set up the ActiveLearningSimulation as usual.
active_learning = ActiveLearningSimulation(
    fitting_parameters=best_fitting_parameters,
    initial_training_data=initial_training_data,
    mtp_study_filename='HfO2_mtp_study',
    mtp_study_object_id='HfO2',
    reference_calculator=reference_calculator,
    candidate_threshold=1.0,
    retrain_threshold=3.0,
    check_interval=20,
    max_forces_check=10.0*eV/Ang,
    use_stress=True,
    candidate_trajectory_filename='HfO2_am_active_learning_candidates.hdf5',
    restart_simulation=True,
)

active_learning_md_with_scan.py

Run an ActiveLearningSimulation geometry optimization.

# Set up non-linear coefficients.
non_linear_coefficients_parameters = NonLinearCoefficientsParameters(
    perform_optimization=True,
    energy_only=True,
)

# Set up parameters to use in the MTP fitting.
fitting_parameters = MomentTensorPotentialFittingParameters(
    basis_size=300,
    outer_cutoff_radii=4.5*Angstrom,
    mtp_filename='active_learning_mtp.mtp',
    non_linear_coefficients_parameters=non_linear_coefficients_parameters,
)

active_learning = ActiveLearningSimulation(
    fitting_parameters=fitting_parameters,
    initial_training_data=initial_training_data,
    mtp_study_filename='mtp_study',
    mtp_study_object_id='mtp',
    reference_calculator=reference_calculator,
    candidate_threshold=0.1,
    retrain_threshold=1.0,
    check_interval=1,
    use_stress=True,
    candidate_trajectory_filename='active_learning_candidates.hdf5',
    restart_simulation=True,
    extrapolation_selection_parameters=ExtrapolationSelectionParameters(
        extrapolation_grade_algorithm=QueryByCommitteeForces,
        descriptor_cutoff=0.1,
    ),
)

constraints = [FixStrain(x=True, y=True, z=True)]

# Run the geometry optimization through the active learning object.
optimization_configuration = active_learning.runOptimizeGeometry(
    bulk_configuration,
    max_steps=1000,
    constraints=constraints,
    trajectory_filename='active_learning_optimization_trajectory.hdf5',
    optimize_cell=False,
    trajectory_interval=1,
)

# Extract the additional training data that has been added during active learning as a TrainingSet
# object.
additional_training_data = active_learning.additionalTrainingSet()

active_learning_optimization.py

Run an ActiveLearningSimulation nudged elastic band optimization.

# Set up non-linear coefficients.
non_linear_coefficients_parameters = NonLinearCoefficientsParameters(
    perform_optimization=True,
    energy_only=True,
)

# Set up parameters to use in the MTP fitting.
fitting_parameters = MomentTensorPotentialFittingParameters(
    basis_size=300,
    outer_cutoff_radii=4.5*Angstrom,
    mtp_filename='active_learning_mtp.mtp',
    non_linear_coefficients_parameters=non_linear_coefficients_parameters,
    use_element_specific_coefficients=True,
)

active_learning = ActiveLearningSimulation(
    fitting_parameters=fitting_parameters,
    initial_training_data=initial_training_data,
    mtp_study_filename='mtp_study',
    mtp_study_object_id='mtp',
    reference_calculator=reference_calculator,
    candidate_threshold=0.1,
    retrain_threshold=1.0,
    check_interval=1,
    use_stress=True,
    candidate_trajectory_filename='active_learning_candidates.hdf5',
    restart_simulation=True,
)

# Run the NEB optimization through the active learning object.
optimization_configuration = active_learning.runOptimizeNudgedElasticBand(
    neb,
    max_steps=1000,
    trajectory_filename='active_learning_optimization_trajectory.hdf5',
    optimize_cell=False,
    trajectory_interval=1,
)

active_learning_optimization_neb.py

Run an active learning MD and extract the training set table for a final fit to test different MTP basis sizes.

# Run the MD simulation through the active learning object.
md_trajectory = active_learning.runMolecularDynamics(
    bulk_configuration,
    constraints=constraints,
    trajectory_filename='HfO2_am_active_learning_3000K.hdf5',
    steps=100000,
    log_interval=100,
    method=method,
    domain_decomposition_pattern=[1, 1, 1],
)

# Extract the additional training data that has been added during active learning as a TrainingSet
# object, and save it to a file.
additional_training_data = active_learning.additionalTrainingSet()
nlsave('HfO2_active_learning_additional_training_data.hdf5', additional_training_data)

# Extract a table with the initial and additional training data that has been added
# during active learning. This table can be used as input to a final MTP fit.
training_set_table = active_learning.trainingSetTable()

# Test different basis sizes.
fitting_parameters_list = []
for mtp_basis in [PredefinedBasisSmall, 400, 800]:
    # Optimize the non-linear coefficients on the energy only.
    nl_parameters = NonLinearCoefficientsParameters(
        perform_optimization=True,
        energy_only=True,
    )

    fitting_parameters = MomentTensorPotentialFittingParameters(
        basis_size=mtp_basis,
        outer_cutoff_radii=4.5*Ang,
        mtp_filename=f'HfO2_active_learning_{mtp_basis}.mtp',
        non_linear_coefficients_parameters=nl_parameters,
        use_element_specific_coefficients=True,
    )

    fitting_parameters_list.append(fitting_parameters)

# Perform an MTP training using the list of fitting parameters.
mtp_training = MomentTensorPotentialTraining(
    filename='Final_MTP_training_basis_size_scan.hdf5',
    object_id='mtp_training',
    training_sets=training_set_table,
    calculator=reference_calculator,
    calculate_stress=True,
    fitting_parameters_list=fitting_parameters_list,
    train_test_split=0.9,
    random_seed=13345,
    log_filename_prefix='mtp_basis_size_scan',
)
mtp_training.update()
nlprint(mtp_training)

active_learning_md_query_by_committee.py

Notes¶

MTP Active Learning¶

The ActiveLearningSimulation class can be used to continuously extend the training data for a machine-learned Moment Tensor Potential (MTP) during a molecular dynamics simulation, as described in [1] and [2].

A typical active learning simulation is started from an initial training data set, which can be generated using the MomentTensorPotentialTraining framework. By calling a method such as runMolecularDynamics or runOptimizeGeometry, an active learning simulation is initiated. From the initial training data set a starting MTP is trained which is used to run the MD simulation. During the simulation the extrapolation grade is calculated at regular intervals for the current configuration, as described in [1] and [2]. A value above zero means that the current configuration extrapolates the potential, i.e. is outside the space of configurations covered by the training data. If the extrapolation grade exceeds the first threshold candidate_threshold, a copy of the configuration is stored as a candidate to add to the extended training data set. If the extrapolation grade exceeds the retrain_threshold, the simulation is stopped. The most relevant configurations are selected from the collected candidates using the maxvol criterion [1]. Energy, forces, and stress are calculated for these configurations using the reference calculator, and added to the training data set. A new MTP is trained on this extended data set and the simulation is started from the beginning with the new MTP.

This cycle is repeated until the simulation completes without extrapolating significantly, i.e. without exceeding the retrain threshold, or the maximum number of iterations as specified by max_iterations is reached.
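The two-threshold logic described above can be illustrated with a small, self-contained sketch. It is not the ActiveLearningSimulation implementation: the function names run_segment and active_learning_loop are hypothetical, the extrapolation grades are supplied as plain lists of numbers instead of being computed from an MTP, and the assumption that the configuration that triggers the retrain is also kept as a candidate is made only for illustration.

```python
def run_segment(grades, candidate_threshold, retrain_threshold, check_interval):
    """Apply the candidate/retrain thresholds to one simulation run.

    grades is a hypothetical list of per-step extrapolation grades that
    stands in for the values a real simulation would compute.
    Returns (candidate_steps, stopped_at); stopped_at is None if the run
    finished without exceeding retrain_threshold.
    """
    candidate_steps = []
    for step, grade in enumerate(grades):
        if step % check_interval != 0:
            continue  # the grade is only evaluated every check_interval steps
        if grade > candidate_threshold:
            candidate_steps.append(step)  # store a copy as a candidate
        if grade > retrain_threshold:
            return candidate_steps, step  # stop the run and retrain
    return candidate_steps, None


def active_learning_loop(grade_runs, candidate_threshold, retrain_threshold,
                         check_interval, max_iterations):
    """Repeat runs until one completes or max_iterations is reached.

    grade_runs[i] mimics the grades observed in iteration i, i.e. after the
    potential has been retrained i times.
    Returns (completed_iteration, added), where completed_iteration is None
    if no run completed within max_iterations.
    """
    added = []
    for iteration, grades in enumerate(grade_runs[:max_iterations]):
        candidates, stopped_at = run_segment(
            grades, candidate_threshold, retrain_threshold, check_interval)
        if stopped_at is None:
            return iteration, added
        # In a real run: select among the candidates, compute reference
        # energy, forces, and stress, and retrain the MTP.
        added.append(candidates)
    return None, added


completed, added = active_learning_loop(
    grade_runs=[[0.0, 0.5, 1.2, 0.8, 3.5, 0.1],   # first run extrapolates
                [0.0, 0.2, 0.4, 0.9, 0.5, 0.3]],  # second run completes
    candidate_threshold=1.0,
    retrain_threshold=3.0,
    check_interval=2,
    max_iterations=5,
)
```

In this toy input the first run is stopped at step 4 after collecting two candidates, and the second run completes, which ends the loop.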

All additional configurations that are added to the original training data set during an active learning simulation are stored, together with their calculated reference energy, forces, and stress, in a ConfigurationDataContainer object which is saved to candidate_trajectory_filename. The additional training data can also be accessed via the method additionalTrainingData(). The method additionalTrainingSet() returns the additional training data as a TrainingSet object. Furthermore, ActiveLearningSimulation provides a method trainingSetTable(), which returns a Table that contains both the initial training set and the additional training set. This table can be used directly as input to a MomentTensorPotentialTraining object to perform a final MTP fitting run and optimize the MTP hyperparameters.

A D3 dispersion calculator can be added in order to account for dispersion during the MD simulations of the active learning cycle. This is achieved by creating the ActiveLearningSimulation object with a D3 dispersion calculator attached via the d3_dispersion_calculator keyword. This calculator is required to be a TremoloXCalculator wrapped around a D3 dispersion potential. To create the calculator, a D3 dispersion potential, e.g. DispersionD3Z(xc='PBE'), has to be passed to a TremoloXCalculator object, which yields the overall input TremoloXCalculator(DispersionD3Z(xc='PBE')). The type of D3 dispersion potential and the xc functional included in the TremoloXCalculator object should be specified explicitly according to the specific modeling needs.

Query-by-committee¶

Query-by-committee is an alternative to the maxvol algorithm for calculating the extrapolation grade for multi-element systems. In this case, an ensemble of different MTP models is trained on the same training data (the default is 6 different models), each starting from a different initial guess. The ensemble standard deviation of the force predictions is used as a measure of extrapolation, which can be interpreted as a prediction of the force error in the configuration [3]. The query-by-committee algorithm typically provides a numerically more stable way to calculate the extrapolation grade and does not require reduced MTP accuracy settings. In particular, it can be used with any MTP basis size, and it is therefore the recommended algorithm.
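As a rough illustration of an ensemble standard deviation of force predictions, the sketch below computes a per-atom spread from a committee of force predictions using only the Python standard library. The function name committee_force_deviation, the use of the population standard deviation, and the reduction to the largest per-atom value are assumptions made for this example; the exact definition used by QueryByCommitteeForces is not specified here.

```python
import math
from statistics import pstdev


def committee_force_deviation(committee_forces):
    """Return the largest per-atom force spread across a committee.

    committee_forces[m][a] is the (fx, fy, fz) force on atom a predicted
    by committee member m, given as plain floats (e.g. in eV/Ang).
    """
    n_atoms = len(committee_forces[0])
    per_atom = []
    for atom in range(n_atoms):
        # Standard deviation over the models, separately for x, y, and z.
        component_std = [
            pstdev([model[atom][axis] for model in committee_forces])
            for axis in range(3)
        ]
        # Combine the three components into one number per atom.
        per_atom.append(math.sqrt(sum(s * s for s in component_std)))
    return max(per_atom)


# Two hypothetical committee members that disagree on atom 0 only.
forces = [
    [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0)],
    [(3.0, 0.0, 0.0), (0.5, 0.5, 0.0)],
]
deviation = committee_force_deviation(forces)
```

A committee whose members agree on every atom gives a deviation of zero, while disagreement on any single atom raises the value for the whole configuration.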

The query-by-committee method can be activated by setting extrapolation_grade_algorithm=QueryByCommitteeForces in ExtrapolationSelectionParameters. Unlike maxvol, query-by-committee does not provide a native algorithm for selecting the most different candidate configurations out of all extrapolating candidates. Instead, a dedicated, separate selection algorithm is used (MTP-structural-descriptor). The parameters for this method can be set through the extrapolation_selection_parameters argument, which takes an ExtrapolationSelectionParameters object.
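The internals of the MTP-structural-descriptor selection are not described here. Purely to illustrate the general idea of picking mutually different candidates with a distance cutoff, the following sketch applies a greedy filter to hypothetical descriptor vectors. The function select_diverse, the Euclidean distance, and the way the cutoff is applied are all assumptions and should not be read as the algorithm behind descriptor_cutoff.

```python
import math


def select_diverse(descriptors, cutoff):
    """Greedily keep candidates that differ from all kept ones.

    descriptors is a list of equal-length numeric tuples, one per candidate.
    A candidate is kept only if its Euclidean distance to every previously
    kept candidate is larger than cutoff. Returns the kept indices.
    """
    selected = []
    for index, descriptor in enumerate(descriptors):
        if all(math.dist(descriptor, descriptors[kept]) > cutoff
               for kept in selected):
            selected.append(index)
    return selected


# Four hypothetical candidates forming two tight pairs.
candidates = [(0.0, 0.0), (0.05, 0.0), (1.0, 0.0), (1.02, 0.0)]
kept = select_diverse(candidates, cutoff=0.1)
```

With this input only one candidate from each pair survives, so near-duplicate configurations do not all end up in the training data.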

Multiple runners¶

Using multiple runners in active learning MD or optimization is an extension of the standard way of running active learning that increases the exploration of new configurations. Here, the simulation is split up and run on multiple different initial configurations simultaneously, collecting candidates from all simulations. When all simulations have stopped because the retrain threshold was exceeded, all the candidates are gathered, selected, and collectively included in the training before the next iteration is started. This makes it possible to include, e.g., various stoichiometric compositions or interface representations in the same active learning simulation, instead of running multiple simulations sequentially.

Multiple runners can be set up by passing a list of initial configurations as the configuration parameter in runMolecularDynamics() or runOptimizeGeometry(). To run this type of simulation efficiently, one should ideally allocate at least as many MPI processes as there are initial configurations.
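The pooling of candidates across runners can be pictured with the following toy sketch. It runs the hypothetical runners one after another on precomputed grade lists, whereas real runners execute simultaneously on separate MPI processes; the function names and the (runner, step) bookkeeping are illustrative assumptions only.

```python
def run_runner(grades, candidate_threshold, retrain_threshold):
    """Collect candidate steps for one runner until it has to stop."""
    candidates = []
    for step, grade in enumerate(grades):
        if grade > candidate_threshold:
            candidates.append(step)
        if grade > retrain_threshold:
            break  # this runner stops and waits for the retraining
    return candidates


def pooled_candidates(runner_grades, candidate_threshold, retrain_threshold):
    """Gather the candidates of all runners into one shared pool.

    runner_grades[r] is a hypothetical list of extrapolation grades for
    runner r. Returns a list of (runner_index, step) pairs.
    """
    pool = []
    for runner_index, grades in enumerate(runner_grades):
        for step in run_runner(grades, candidate_threshold, retrain_threshold):
            pool.append((runner_index, step))
    return pool


pool = pooled_candidates(
    runner_grades=[[0.2, 1.5, 3.2, 0.1],   # runner 0 stops at step 2
                   [0.1, 0.3, 1.1, 0.4]],  # runner 1 finishes its run
    candidate_threshold=1.0,
    retrain_threshold=3.0,
)
```

The selection and the single retraining step then operate on the combined pool rather than on each runner's candidates separately.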

Practical guide¶

Although it can also be used in production simulations, the ActiveLearningSimulation class is primarily designed to efficiently extend existing training data via MD simulations without having to run computationally expensive ab-initio MD simulations for amorphous or high-temperature systems. In addition to using the automatically added training data in the candidate trajectory file, it can also be useful to take snapshots from the final MD trajectory and recalculate energy, forces, and stress with the reference calculator (e.g. by passing the snapshots as a TrainingSet to MomentTensorPotentialTraining) to obtain additional training configurations that can be added to a larger training data set.

For consistency with the input of MomentTensorPotentialTraining it is recommended to use a list or Table of TrainingSet objects to pass the initial training data.

When using the default maxvol algorithm, it is recommended to reduce the MTP accuracy settings (e.g. MTP basis size and outer cutoff radius) and to include only the most relevant training data in the initial training data set, so that the simulation runs robustly and efficiently. One can occasionally encounter the error message “No new candidates found in active learning”, often accompanied by very large extrapolation grade values. This is typically caused by an MTP basis set that is too large for the given training data, which leads to numerical inaccuracies when calculating the extrapolation grade and selecting the candidates to add to the next training iteration. In this case it often helps to reduce the MTP basis size in the MomentTensorPotentialFittingParameters until the problem disappears. Alternatively, one can try to add more diverse configurations to the initial training data, for example by running DFT-MD.

Choosing the query-by-committee algorithm instead can often avoid this problem from the start and provide a more stable simulation.

When starting an active learning simulation from scratch, it is recommended to scan over randomly generated initial non-linear coefficients and select the best fit for further training, as shown in the example section. If an active learning simulation is continued, it is best to use the non-linear coefficients from the previous run to keep the training consistent. This can be done by passing the MTP filename from the previous run as the initial_coefficients parameter in the NonLinearCoefficientsParameters.

To run an active learning MD simulation more efficiently, one can increase the check_interval parameter to reduce the frequency at which the extrapolation grade is calculated. As a fallback, the extrapolation grade is always checked whenever the largest force on an atom exceeds the value max_forces_check, which is typically a sign of extrapolation. This is only supported for MD, whereas in optimization simulations the extrapolation grade is checked at every step.
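The interplay between the regular check interval and the force-based fallback can be summarized in a small decision function. This is a sketch of the rule as described above, not the library's code: the name should_check_extrapolation is hypothetical, forces are plain floats in eV/Ang, and the assumption that regular checks fall on steps divisible by check_interval is made for illustration.

```python
def should_check_extrapolation(step, check_interval, max_force,
                               max_forces_check=None):
    """Decide whether the extrapolation grade is evaluated at this MD step.

    step: current MD step index.
    check_interval: regular spacing between checks, in steps.
    max_force: largest force on any atom at this step (eV/Ang).
    max_forces_check: force value above which a check is forced, or None
        to disable the fallback in this sketch.
    """
    if step % check_interval == 0:
        return True  # regular check
    if max_forces_check is not None and max_force > max_forces_check:
        return True  # unusually large force, check immediately
    return False


# With check_interval=20, step 7 is normally skipped unless a force is large.
regular = should_check_extrapolation(20, 20, max_force=1.0, max_forces_check=10.0)
skipped = should_check_extrapolation(7, 20, max_force=1.0, max_forces_check=10.0)
forced = should_check_extrapolation(7, 20, max_force=12.5, max_forces_check=10.0)
```

A larger check_interval therefore saves grade evaluations on quiet steps, while the force criterion still catches configurations that go wrong between regular checks.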

Note that the initial training data must have pre-calculated reference data (e.g. DFT energy, forces, stress). This means that, for example, TrainingSet objects must be set up with recalculate_training_data=False. ActiveLearningSimulation does not support calculating the reference data for the initial training data that is given.

The training data generated during an active learning run can be accessed via the methods additionalTrainingData() and additionalTrainingSet(); trainingSetTable() returns it together with the initial training data.

Active Learning MD¶

A typical workflow for training via active learning MD could be to use displaced bulk crystal or interface configurations, generated via RandomDisplacementsParameters or CrystalInterfaceTrainingParameters respectively, as initial training data. Simulating the crystal at high temperature with active learning MD in order to melt it can then be used to include liquid and amorphous configurations in the training data. Typical values for the candidate and retrain thresholds are 1.0 and 3.0, respectively. The larger these values are, the more extrapolation is allowed.

MD with active learning can be combined with hook functions. This can be used to include non-equilibrium simulations in the training.

Active Learning geometry optimization¶

This type of optimization can supplement molecular dynamics simulations by bridging e.g. amorphous and crystalline phases. Since optimization trajectories tend to be shorter than MD trajectories, and configurations can change more significantly between optimization steps, it is recommended to set the check_interval parameter to 1 and to lower the candidate and retrain thresholds so that sufficiently many new structures are added to the training set. Note that the restart_strategy argument and hook functions available in OptimizeGeometry are not supported in ActiveLearningSimulation. This method is used when crystal structure prediction is run with an ActiveLearningSimulation object (see also the CrystalStructurePrediction reference manual).

Active Learning nudged elastic band optimization¶

This type of active learning optimization can be used to train potentials for reaction paths and transition states. As with OptimizeGeometry, hook functions and the restart_strategy argument are not available when running OptimizeNudgedElasticBand through active learning. Every image in the NudgedElasticBand is checked individually for its extrapolation grade, so multiple candidate structures can be added in each optimization iteration. It is nevertheless recommended to set the check_interval parameter to 1. A preoptimization of the endpoints can be performed. In that case, the initial MTP should already be able to optimize the endpoints without unphysical results, because the endpoint preoptimization is not part of the learning process. Note that multiple independent runners are not supported.