MolecularDynamicsSnapshotsParameters

class MolecularDynamicsSnapshotsParameters(reference_configurations, supercell_repetitions_list=None, strain_tensors=None, calculator=None, sample_size=None, steps=None, method=None, log_filename_prefix=None, data_tag=None)

Class for storing parameters for generating a set of training configurations using snapshots from a molecular dynamics simulation.

Parameters:
  • reference_configurations (MoleculeConfiguration | BulkConfiguration | sequence of [MoleculeConfiguration | BulkConfiguration]) – One or more reference configurations to be used to generate the training set. For bulk configurations, this is the unit cell from which a supercell can be defined and additional strain can be applied.

  • supercell_repetitions_list (sequence (size 3) of int | sequence of sequence (size 3) of int) – The list of supercells to construct for each configuration, given as the number of repetitions of the bulk unit cell along the (a, b, c) directions. Cannot be specified if there are molecules in the list of reference configurations.
    Default: No repetitions.

  • strain_tensors (sequence (size 3) of sequence (size 3) of float | sequence of sequence (size 3) of sequence (size 3) of float) – The list of strains to apply to each configuration. Cannot be specified if there are molecules in the list of reference configurations.
    Default: No strain.

  • calculator (Calculator) – The calculator to use for the molecular dynamics simulations.
    Default: The same calculator used to fit the moment tensor potential.

  • sample_size (int) – The number of training configurations to generate for each combination of reference configuration, supercell, and strain.
    Default: 20

  • steps (int) – The total number of molecular dynamics steps to take for each combination of reference configuration, supercell, and strain.
    Default: 10 * sample_size

  • method (BaseMDmethod) – The molecular dynamics method to use.
    Default: Langevin(initial_velocity=MaxwellBoltzmannDistribution())

  • log_filename_prefix (str) – Filename prefix for the logging output of the tasks associated with this set.
    Default: Defined by the MomentTensorPotentialTraining object.

  • data_tag (str) – Label for this training set to enable selection of different data in MTP fitting.

calculator()
Returns:

The calculator to use for the molecular dynamics simulation, or None for the same calculator used to fit the moment tensor potential.

Return type:

Calculator | None

dataTag()
Returns:

The selection tag added to the data in the training set.

Return type:

str

logFilenameIdentifier()
Returns:

Filename identifier for the logging output of the tasks associated with this set, or None if it hasn’t been set yet.

Return type:

str | None

logFilenamePrefix()
Returns:

Filename prefix for the logging output of the tasks associated with this set, or None if it is to be defined by the MomentTensorPotentialTraining object.

Return type:

str | LogToStdOut | None

method()
Returns:

The molecular dynamics method to use.

Return type:

BaseMDmethod

referenceConfigurations()
Returns:

The list of reference configurations to be used to generate the training set.

Return type:

list of [MoleculeConfiguration | BulkConfiguration]

sampleSize()
Returns:

The number of training configurations for each combination of list parameters.

Return type:

int

steps()
Returns:

The total number of molecular dynamics steps to take.

Return type:

int

strainTensors()
Returns:

The list of strains to apply to each configuration (None for no strain).

Return type:

list of [numpy.ndarray | None]

supercellRepetitionsList()
Returns:

The list of supercells to construct for each configuration, given as the number of repetitions of the bulk unit cell along the (a, b, c) directions.

Return type:

list of tuple (size 3) of int

systemSizes()
Parameters:

system_sizes (int | sequence of int) – Minimum number of atoms in the system. If None then supercell_repetitions_list will be used to do the repetition of the cell. If set, supercell_repetitions_list will be ignored.
Default: None

uniqueString()

Return a unique string representing the state of the object.

Usage Examples

Run a MD simulation of a quartz crystal via MolecularDynamicsSnapshotsParameters using a fast Tersoff force field for 50000 steps, and recalculate energy, forces, and stress for 50 equally spaced snapshots with the reference DFT calculator.

# Set up lattice
lattice = Hexagonal(4.966693338812276*Angstrom, 5.4647751809821905*Angstrom)

# Define elements
elements = [Silicon, Silicon, Silicon, Oxygen, Oxygen, Oxygen, Oxygen, Oxygen,
            Oxygen]

# Define coordinates
fractional_coordinates = [[ 0.469217958328,  0.            ,  0.            ],
                          [ 0.            ,  0.469217958328,  0.666666666667],
                          [ 0.530782041672,  0.530782041672,  0.333333333333],
                          [ 0.409091887977,  0.269718793416,  0.115058361235],
                          [ 0.269718793416,  0.409091887977,  0.551608638765],
                          [ 0.730281206584,  0.139373094561,  0.781725361235],
                          [ 0.590908112023,  0.860626905439,  0.218274638765],
                          [ 0.860626905439,  0.590908112023,  0.448391361235],
                          [ 0.139373094561,  0.730281206584,  0.884941638765]]

# Set up reference configuration of quartz
reference_configuration = BulkConfiguration(
    bravais_lattice=lattice,
    elements=elements,
    fractional_coordinates=fractional_coordinates,
)

# Define the molecular dynamics method
initial_velocity = MaxwellBoltzmannDistribution(
    temperature=1500.0*Kelvin,
    remove_center_of_mass_momentum=True,
    random_seed=None,
    enforce_temperature=True,
)

method = Langevin(
    time_step=1*femtoSecond,
    reservoir_temperature=1500*Kelvin,
    friction=0.01*femtoSecond**-1,
    initial_velocity=initial_velocity,
    heating_rate=0*Kelvin/picoSecond,
)

# Define the molecular dynamics calculator
potentialSet = Tersoff_SiO_2007()
calculator = TremoloXCalculator(parameters=potentialSet)

# Define the MD snapshots parameters:
# Run 50000 MD steps and take 50 equally spaced samples for the final training set.
training_sets = MolecularDynamicsSnapshotsParameters(
    reference_configuration,
    calculator=calculator,
    sample_size=50,
    steps=50000,
    method=method,
    log_filename_prefix=None,
)

# Set up MTP training and run the training data generation and calculation.
moment_tensor_potential_training = MomentTensorPotentialTraining(
    filename='mtp_study.hdf5',
    object_id='training',
    training_sets=training_sets,
    calculator=LCAOCalculator(),
    calculate_stress=True,
)
moment_tensor_potential_training.update()

MolecularDynamicsSnapshotsParameters_example.py

Notes

This class provides a convenient and automatic way to generate and use snapshots from MD simulations as training data. There are two ways this class can be used:

  • Use the final reference calculator to run the MD. In this case the configurations are just taken from the MD and no re-calculation of energy, forces, and stress is needed.

  • Use a fast and more approximate calculator (e.g. force field, semi-empirical, or fast DFT) for the MD simulation, and recalculate energy, forces, and stress only for the selected snapshots with the more accurate reference DFT calculator.

Since the snapshots from MD simulations are typically very correlated and similar to the previous images, using all available images from an MD simulation usually does not provide much additional information to the training of a ML force field. Therefore, one often has to run long MD to obtain a range of uncorrelated and different configurations.

Fortunately highly accurate DFT method are not normally required to obtain physically meaningful configurations for use as training data. A empirical potential with more approximate descriptions of the interactions is often sufficient for generating training configurations.

This procedure of using a more approximate FF or DFT method to quickly generate different relevant training configurations, and then using the final reference calculator for the training data of selected uncorrelated snapshots can be very efficient. It can especially work well when fitting MTPs for amorphous materials or interfaces, where sampling the range of possible configurations is a little more complicated.

If needed, additional relevant training data can be generated in a second step using the ActiveLearningSimulation method to refine the total training data set.

It is also possible to use different supercell sizes or apply different strain tensors to the reference configuration. In this case a separate MD simulation will be run for each combination the given values, each producing sample_size final training configurations.

This class always generates an MD trajectory. If one wants to use the trajectory from an existing MD simulation as training data, one should use the TrainingSet class instead.