MolecularConfigurationsParameters

class MolecularConfigurationsParameters(reference_configurations, sample_size=None, monte_carlo_calculator=None, monte_carlo_temperature=None, max_rotated_bonds=None, overlap_ratio=None, max_displacement=None, sample_cis_trans=None, generate_bulk_configurations=None, random_seed=None, log_filename_prefix=None, data_tag=None)

Class for storing parameters for generating a set of training configurations using a combination of sampling different torsion angles and atomic displacements.

Parameters:
  • reference_configurations (MoleculeConfiguration | BulkConfiguration | DeviceConfiguration | SurfaceConfiguration | sequence of configurations) – One or more reference configurations to be used to generate the training set.

  • sample_size (int) – The total number of configurations to return.
    Default: 20

  • monte_carlo_calculator (Calculator | None) – The calculator to use for the molecular dynamics simulations. Giving None disables Monte Carlo structure selection.
    Default: None.

  • monte_carlo_temperature (PhysicalQuantity of type temperature) – Temperature used in the Monte Carlo selection.
    Default: 300K.

  • max_rotated_bonds (int) – The maximum number of bonds rotated in each new configuration.
    Default: Rotate all available bonds in each configuration.

  • overlap_ratio (float) – The acceptable VDW radius overlap between two atoms.
    Default: 0.7.

  • max_displacement (PhysicalQuantity of type length) – The maximum displacement in the atomic random displacement.
    Default: 0.1 Angstrom.

  • sample_cis_trans (bool) – Whether or not cis and trans orientations of SP2 bonds are sampled.
    Default: True.

  • generate_bulk_configurations (bool) – Whether or not the returned dataset contains bulk configurations.
    Default: False.

  • random_seed (int) – The random seed used for generating the displacements.
    Default: Generated automatically.

  • log_filename_prefix (str) – Filename prefix for the logging output of the tasks associated with this set.
    Default: Defined by the MomentTensorPotentialTraining object.

  • data_tag (str) – Label for this training set to enable selection of different data in MTP fitting.

dataTag()
Returns:

The selection tag added to the data in the training set.

Return type:

str

generateBulkConfigurations()
Returns:

Whether or not the returned configurations are Bulk or Molecules.

Return type:

bool

logFilenameIdentifier()
Returns:

Filename identifier for the logging output of the tasks associated with this set, or None if it hasn’t been set yet.

Return type:

str | None

logFilenamePrefix()
Returns:

Filename prefix for the logging output of the tasks associated with this set, or None if it is to be defined by the MomentTensorPotentialTraining object.

Return type:

str | LogToStdOut | None

maxDisplacement()
Returns:

The maximum random displacement for each atom.

Return type:

PhysicalQuantity

maxRotatedBonds()
Returns:

The maximum number of bonds rotated in each trial configuration.

Return type:

int

monteCarloCalculator()
Returns:

The calculator used to calculate Monte Carlo energies.

Return type:

Calculator | None

monteCarloTemperature()
Returns:

The temperature used in the Monte Carlo selection

Return type:

PhysicalQuantity of type temperature

overlapRatio()
Returns:

The minimum acceptable VDW radii overlap ratio.

Return type:

float

randomSeed()
Returns:

The random seed used in the random number generator.

Return type:

int | None

referenceConfigurations()
Returns:

The list of reference configurations to be used to generate the training set.

Return type:

list of [MoleculeConfiguration | BulkConfiguration]

sampleCisTrans()
Returns:

Whether or not cis and trans orientations of SP2 bonds are sampled.

Return type:

bool

sampleSize()
Returns:

The number of training configurations for each combination of list parameters.

Return type:

int

strainTensors()
Returns:

The list of strains to apply to each configuration (None for no strain).

Return type:

list of [numpy.ndarray | None]

supercellRepetitionsList()
Returns:

The list of supercells to construct for each configuration, given as the number of repetitions of the bulk unit cell along the (a, b, c) directions.

Return type:

list of tuple (size 3) of int

systemSizes()
Parameters:

system_sizes (int | sequence of int) – Minimum number of atoms in the system. If None then supercell_repetitions_list will be used to do the repetition of the cell. If set, supercell_repetitions_list will be ignored.
Default: None

uniqueString()

Return a unique string representing the state of the object.

Usage Examples

Setup of a training set of C2F 4 and C2F 6 using MolecularConfigurationsParameters.

setVerbosity(MinimalLog)

# -------------------------------------------------------------
# Fluoroethane configuration
# -------------------------------------------------------------
# Define elements
elements = [Carbon, Carbon, Fluorine, Fluorine, Fluorine, Fluorine, Fluorine, Fluorine]

# Define coordinates
cartesian_coordinates = [[-0.003263937084, -0.            ,  0.            ],
                         [ 1.516263937084, -0.            , -0.            ],
                         [-0.469715381813,  0.            , -1.301600235398],
                         [-0.469715381813, -1.127218869426,  0.650800117699],
                         [-0.469715381813,  1.127218869426,  0.650800117699],
                         [ 1.982715381813,  1.127218869426, -0.650800117699],
                         [ 1.982715381813,  0.            ,  1.301600235398],
                         [ 1.982715381813, -1.127218869426, -0.650800117699]]*Angstrom

# Set up configuration
C2F6_configuration = MoleculeConfiguration(
    elements=elements,
    cartesian_coordinates=cartesian_coordinates
)
C2F6_configuration.findBonds()

# -------------------------------------------------------------
# Fluoroethene configuration
# -------------------------------------------------------------
# Define elements
elements = [Carbon, Carbon, Fluorine, Fluorine, Fluorine, Fluorine]

# Define coordinates
cartesian_coordinates = [[ 0.089971642147,  0.            , -0.            ],
                         [ 1.423028357853,  0.            ,  0.            ],
                         [-0.594324695679,  0.            , -1.173915650421],
                         [-0.594324695679,  0.            ,  1.173915650421],
                         [ 2.107324695679,  0.            ,  1.173915650421],
                         [ 2.107324695679,  0.            , -1.173915650421]]*Angstrom

# Set up configuration
C2F4_configuration = MoleculeConfiguration(
    elements=elements,
    cartesian_coordinates=cartesian_coordinates
)
C2F4_configuration.findBonds()

# -------------------------------------------------------------
# Specify molecular training set
# -------------------------------------------------------------
training_set = MolecularConfigurationsParameters(
    [C2F6_configuration, C2F4_configuration],
    sample_size=100,
    generate_bulk_configurations=True
)

# -------------------------------------------------------------
# Specify fitting parameters
# -------------------------------------------------------------
nl_parameters = NonLinearCoefficientsParameters(
    perform_optimization=True,
    energy_only=False,
)

fitting_parameters = MomentTensorPotentialFittingParameters(
    inner_cutoff_radii=1.0*Angstrom,
    tapering_cutoff_radii=1.1*Angstrom,
    outer_cutoff_radii=5*Angstrom,
    mtp_filename='Carbon_Fluorine_MTP.mtp',
    non_linear_coefficients_parameters=nl_parameters,
)

# -------------------------------------------------------------
# Generate the MTP
# -------------------------------------------------------------
mtp = MomentTensorPotentialTraining(
    'Carbon_Fluorine_MTP.hdf5',
    'Carbon_Fluorine_MTP',
    [training_set],
    calculator=LCAOCalculator(),
    fitting_parameters_list=[fitting_parameters],
    train_test_split=0.9,
)
mtp.update()

Molecule_MTP.py

This script generates 100 configurations each of C2F 4 and C2F 6. The energy, force and stress of each configuration is then calculated using density functional theory. This data is then used to fit an MTP potential with settings specified with MomentTensorPotentialFittingParameters. The MomentTensorPotentialTraining study object containing the data and fitting statistics is in the file Carbon_Fluorine_MTP.hdf5. The fitted MTP potential is in the file Carbon_Fluorine_MTP.mtp. A calculator using this MTP potential can be returned from the MomentTensorPotentialTraining object using the method momentTensorPotentialCalculators.

Notes

The MolecularConfigurationsParameters generates a trajectory of configurations that can be used to train a moment tensor potential. Starting with one or more configurations as input, new configurations are generated by rotating available bond torsions and adding random displacements to the atoms. Bonds that are part of cyclic structures, such as the carbon-carbon bond in cyclohexane are not freely rotated so as to not break the ring structure. Double and triple bonds are also not freely rotated because of their typically high rotational barriers. Double bonds are rotated over a smaller range centered on the planar geometry.

To avoid larger molecules being twisted onto itself while sampling the torsions, distances between non-bonded atoms are checked for each configuration. Configurations where inter-atomic distances are shorter than the factor of the van der Waals radius given by the parameter overlap_ratio are rejected. An additional Monte Carlo selection can also be performed on configurations with rotated torsions. A calculator for evaluating the configuration energy and a Monte Carlo selection temperature must be specified. This is done using the arguments monte_carlo_calculator and monte_carlo_temperature respectively. If a calculator is not given then then Monte Carlo selection is not performed.

The number of bonds rotated in each trial structure can be set using the argument max_rotated_bonds. Limiting the number of bonds rotated at each step can make it easier to find non-overlapping structures when fitting an MTP for a polymer or other large molecule. For configurations with double bonds, both cis and trans orientations of the bond are sampled by default. Setting the argument sample_cis_trans disables this sampling and turns off bond rotation for double bonds.

In some cases, such as fitting an MTP for a surface process simulations molecular and bulk training sets may be combined together. In these cases it is more convenient for the molecular training sets to also contain bulk configurations. Setting the argument generate_bulk_configurations to True generates a training set of molecules that are placed in periodic boundary conditions with a large buffer.

After defining the MolecularConfigurationsParameters it can be used with MomentTensorPotentialTraining for fitting an MTP potential.