CrystalStructurePrediction

class CrystalStructurePrediction(initial_population, number_of_generations, selection_pressure=None, number_of_elites=None, number_to_promote=None, heredity_probability=None, permutation_probability=None, mutation_probability=None, sigma_lattice=None, max_forces=None, max_stress=None, max_steps=None, max_step_length=None, external_pressure=None, rng=None, write_population=None, population_format=None, log_filename_prefix=None, permutation_operator=None, heredity_operator=None, mutation_operator=None, kpoint_density=None, fitness_function=None, fitness_tolerance=None, active_learning=None)

Function to perform a crystal structure search by applying a genetic algorithm on a given initial population.

Parameters:
  • initial_population (list of BulkConfiguration objects | Population) – A list of initial BulkConfigurations or a Population object.

  • number_of_generations (int) – The number of times the population should be evolved.

  • selection_pressure (float) – The pressure applied during the selection of the parents from the population. The higher the pressure the less likely it is to pick an individual with lower fitness.
    Default: 2.0.

  • number_of_elites (int) – Only consider the number_of_elites fittest individuals when making the selection. If None, consider all individuals during selection.
    Default: None

  • number_to_promote (int) – The number of fittest individuals to promote to the next generation.
    Default: 1.

  • heredity_probability (float) – The unnormalized probability of using the heredity operator.
    Default: 0.6.

  • permutation_probability (float) – The unnormalized probability of using the permutation operator. The permutation operator will only be used when there is more than one type of an element in the system.
    Default: 0.0.

  • mutation_probability (float) – The probability of using the mutation operator.
    Default: 0.4.

  • sigma_lattice (float) – The standard deviation of the random strain applied by the mutation operator.
    Default: 0.7.

  • max_forces (PhysicalQuantity in units of force) – The maximum forces determining when the optimization should stop.
    Default: 0.01*eV/Angstrom

  • max_stress (PhysicalQuantity in units of pressure) – The maximum stress determining when the optimization should stop.
    Default: 0.1*GPa

  • max_steps (uint) – The maximum number of steps for local optimization.
    Default: 1000

  • max_step_length (PhysicalQuantity in units of length) – The maximum step length the local optimizer may take.
    Default: 0.2*Angstrom

  • external_pressure (PhysicalQuantity in units of pressure) – The target external pressure of the system.
    Default: 0*GPa

  • rng (numpy.random.RandomState) – The random number generated used to generate the random number stream. If None, rng will be new random number generator, seeded from OS entropy pool
    Default: None.

  • write_population (bool) – Enables saving the configuration of each individual configuration during the crystal structure prediction.
    Default: True.

  • population_format (str) – Sets the storage format for populations. If None, the format is hdf5.
    Default: None.

  • log_filename_prefix (str) – The filename prefix to use for logging the individual optimizations.
    Default: generation_.

  • permutation_operator (PermutationOperator.) – The permutation operator.

  • heredity_operator (HeredityOperator.) – The heredity operator.

  • mutation_operator (MutationOperator.) – The mutation operator.

  • kpoint_density (PhysicalQuantity of type length | None) – The target k-point sampling density. If None, a fixed k-point grid is used.
    Default: None.

  • fitness_function (callable) – By default, the fitness is defined as the negative enthalpy of the system. This function allows for the fitness to be customized. The fitness_function should accept two arguments, “configuration” and “enthalpy” , and return the fitness (larger values are more fit). The fitness should be given as a floating point number (without units). The configuration argument is the geometry optimized individual to evaluate the fitness for.
    Default: None.

  • fitness_tolerance (float) – Configurations whose fitness values differ by more than this amount are considered difference structures. This is used as part of the niching step, which removes duplicate configurations from the population.
    Default: 0.02.

  • active_learning (ActiveLearningSimulation) – Can be used to specify an ActiveLearningSimulaiton object, which is then used to run the geometry optimization while training an MTP. The calculator on the configuration is ignored in this case. Note, this does not support restarting from file.
    Default: None.

activeLearning()
Returns:

The active learning object, if specified, otherwise None.

Return type:

ActiveLearningSimulation | None

evolvePopulation(number_of_generations)

Evolves the current population for number_of_generations generations.

Parameters:

number_of_generations (int) – The number of times the genetic algorithm is applied to the population.

fitnesses()
Returns:

Return the fitnesses of the current population sorted by descending fitness.

Return type:

list of type float

kpointDensity()
Returns:

The k-point density.

Return type:

PhysicalQuantity of type length | None

population()
Returns:

Return all configurations in the population sorted by descending fitness.

Return type:

Population

symmetries()

Return the international symbol for each individual of the current population sorted by descending fitness.

Returns:

a list of international symbols

Return type:

list of type str

uniqueString()

Return a unique string representing the state of the object.

Usage Examples

Determine the global minimum energy crystal structure for TiO2.

# -------------------------------------------------------------
# Initial Population
# -------------------------------------------------------------
population_size=10

initial_population = generateInitialPopulation(
    elements=elements,
    population_size=population_size,
    calculator=calculator,
    volume=206.29*Angstrom**3,
    max_forces=0.01*eV/Angstrom,
    max_stress=0.1*GPa,
    max_steps=1000,
    max_step_length=0.2*Angstrom,
    external_pressure=0.0*GPa,
    log_filename_prefix="initial_population_",
)

# -------------------------------------------------------------
# Crystal Structure Prediction
# -------------------------------------------------------------

global_optimization = CrystalStructurePrediction(
    initial_population,
    number_of_generations=16,
    heredity_probability=0.85,
    mutation_probability=0.1,
    permutation_probability=0.05,
    number_to_promote=2,
    number_of_elites=int(0.6*population_size),
    selection_pressure=3.0,
    max_forces=0.005 * eV/Angstrom,
)

csp_TiO2.py

Find the crystal structure that most closely matches a target band gap, that is also stable (low enthalpy).

# Define custom fitness function.
def fitnessFunction(configuration, enthalpy):
    # Calculate the band gap.
    band_gap = Bandstructure(
        configuration=configuration,
        route=['G','G'],
        points_per_segment=3,
    ).indirectBandGap()

    # Target band gap.
    target_band_gap = 1.34 * eV

    # High fitness is low enthalpy and a band gap near the target. A parameter
    # alpha (ranged from zero to one) controls the relative importance of
    # enthalpy vs band gap error.
    alpha = 0.5
    fitness = -alpha*enthalpy - (1 - alpha) * abs(band_gap - target_band_gap)

    # The fitness should not have units.
    return fitness.inUnitsOf(eV)


global_optimization = CrystalStructurePrediction(
    initial_population,
    number_of_generations=16,
    heredity_probability=0.85,
    mutation_probability=0.1,
    permutation_probability=0.05,
    number_to_promote=2,
    number_of_elites=int(0.6*len(initial_population)),
    selection_pressure=3.0,
    max_forces=0.01 * eV/Angstrom,
    fitness_function=fitnessFunction,
)

Run a crystal structure prediction in active learning.

# Read the initial training dataset for the active learning simulation.
dataset = nlread('Initial_training_data.hdf5', MomentTensorPotentialTraining)[-1]
initial_training_data = [TrainingSet(dataset, recalculate_training_data=False)]

# Define the MTP fitting parameters.
nl_parameters = NonLinearCoefficientsParameters(
    perform_optimization=False,
    initial_coefficients=Random,
)

fitting_parameters = MomentTensorPotentialFittingParameters(
    basis_size=800,
    outer_cutoff_radii=4.5*Ang,
    mtp_filename='MTP_CSP_AL.mtp',
    non_linear_coefficients_parameters=nl_parameters,
    forces_cap=500.0*eV/Ang,
    use_element_specific_coefficients=True,
)

# Set up the active learning simulation.
al = ActiveLearningSimulation(
    fitting_parameters=fitting_parameters,
    initial_training_data=initial_training_data,
    mtp_study_filename='MTP_training_CSP_AL.hdf5',
    mtp_study_object_id='training',
    reference_calculator=calculator,
    candidate_threshold=1.0,
    retrain_threshold=3.0,
    check_interval=1,
    max_forces_check=10.0*eV/Ang,
    minimum_bond_length_percent=0.2,
    candidate_trajectory_filename='MTP_training_CSP_AL_candidates.hdf5',
    restart_simulation=False,
    extrapolation_selection_parameters=ExtrapolationSelectionParameters(
        extrapolation_grade_algorithm=QueryByCommittee,
        committee_size=6,
        descriptor_cutoff=0.2,
        descriptor_basis_size=100,
    ),
)

# Read the initial population from file.
initial_population = nlread('Initial_population.hdf5', Population)[0]

# Run the crystal structure predicition with active learning.
global_optimization = CrystalStructurePrediction(
    initial_population,
    number_of_generations=10,
    selection_pressure=2.0,
    number_of_elites=10,
    number_to_promote=4,
    heredity_probability=50.0,
    permutation_probability=15.0,
    mutation_probability=35.0,
    sigma_lattice=0.7,
    max_forces=0.01*eV/Angstrom,
    max_stress=0.1*GPa,
    max_steps=1000,
    max_step_length=0.2*Angstrom,
    external_pressure=0*GPa,
    active_learning=al,
    log_filename_prefix='csp_al',
)

csp_active_learning_example.py

Notes

The goal of crystal structure prediction is to explore the state space of a compound and locate the structure with the maximum fitness for a given chemical composition. In QuantumATK, the default fitness function is set to the negative enthalpy, i.e. the algorithm converges to the structure with the minimal enthalpy (or energy if no external pressure is applied). The primary input to the calculation is the chemical formula, which determines the number of atoms in the unit cell. It is important that enough atoms are present in the unit cell to represent the crystal structure. Just because the correct stoichiometric ratio is used, does not ensure that it is possible to represent the minimum energy structure.

Custom Fitness Function

The default fitness function is the negative enthalpy. A custom fitness function may be provided using the fitness_function argument. The function takes two arguments: the minimized configuration and the enthalpy. It should return the fitness of the configuration (larger fitness means more favorable structure).

Custom fitness functions can be used to find structures that yield desired material properties or match known experimental values. Care must be taken to ensure that the resulting structures are physically plausible. This means that the enthalpy should be included in the fitness. Practically, this means that the fitness should be a weighted sum of the desired physical properties and the enthalpy. Otherwise, the evolutionary algorithm will find very high energy structures that are not stable or possible to synthesize.

Algorithm Details

The crystal structure prediction algorithm in QuantumATK uses a genetic algorithm. The genetic algorithm begins with an initial population of structures provided by the user. The initial population can be generated using the generateInitialPopulation function. Each step of the algorithm involves generating the next population of structures from the previous one. This evolution process is done in the following manner:

  1. Promote the number_to_promote fittest individuals to the next generation.

  2. Loop until there are population_size individuals in the new generation:

  1. Pick which genetic operation should be performed. Each operation is given a user defined probability of being selected.

  2. Select one (mutation and permutation operator) or two parent structures (heredity operator). The selection probability is weighted by the fitness of the parental structures.

  3. Execute the genetic operation and create a new structure.

  1. Perform a local optimization of each individual of the population.

There are four types of genetic operators available:

  • Promotion: this operation is always performed at the beginning of the evolution process to ensure that the best number_to_promote individuals are kept.

  • Mutation: perturb the lattice vectors and atomic positions of the chosen configuration randomly.

  • Permutation: swap the position of atoms of differing elements.

  • Heredity: combine two parental structures to a new one. A random cut plane is used to divide the atoms of the two parent structures; all atoms below the plane in one structure are combined with the atoms above the plane in the other structure to create a new “child” structure. Care is taken to ensure that the total number of each element is not changed.

The algorithm stops after evolving the population number_of_generations times. Given the stochastic nature of the algorithm, this stopping criterion does not ensure that one has found the global solution. The stability of the search should be checked at least with respect to the chosen seed of the pseudo random number generator (rng), the initial_population and number_of_generations.

After generating a new population, each individual of the population is away from a local minimum in enthalpy, hence a geometry optimization must be performed to converge to a close local minimum. Since the global minimum is also a local minimum, one is guaranteed to find it for sufficiently large numbers of number_of_generations. These geometry (local) optimizations comprise nearly all of the computational work during crystal structure prediction.

Note

One key difference between a genetic algorithm and a random search is that the selection of the parent structure(s), on which the genetic operations are performed, is biased by the fitness of that structure. The higher the fitness (i.e. the lower the calculated enthalpy in this implementation), the more likely it is to pick this structure. The skewness of the probability distribution is altered by selection_pressure. A large value of selection_pressure makes it less likely to select a configuration with a lower fitness.

Restart support

A crystal structure prediction run can be restarted / resumed from a previous point in the following manner:

CSP = CrystalStructurePrediction(
    ...
    )

# Save the current state of the CSP object.
nlsave('my_file.hdf5', CSP, object_id='my_run')

...

# Read back a stored CSP object. The last population is restored
# and all given properties / attributes are set up.
CSP = nlread('my_file.hdf5', object_id='my_run')

# Run 10 more iterations of the genetic algorithm.
CSP.evolvePopulation(10)

# Run 5 more iterations of the genetic algorithm.
CSP.evolvePopulation(5)

Note

At this time it is not possible to resume runs that use a custom fitness function.

Analyzing Results

If write_population is True then after each step of the genetic algorithm the current population will be written to disk. The filename will start with log_filename_prefix and include the current generation number. For example, the initial population will be saved as generation_0.hdf5 and after the first step a new file generation_1.hdf5 will be saved. The crystal structures contained in these files can be viewed in QuantumATK using the Viewer.

Crystal Structure Prediction with Active Learning

Active learning simulation can be activated in crystal structure prediction by passing an ActiveLearningSimulation object via the parameter active_learning. This will use runOptimizeGeometry in the active learning simulation to optimize the individuals. This type of optimization can smooth the MTP based potential energy surface and can easily drag the system towards the local minimum.

Note that the calculator on the initial population is ignored in this case, and the trained MTP is used instead. Since the MTP is evolving during the active learning simulation, it is recommended to re-run the crystal structure prediction simulation multiple times, starting from the previous active learning state (using restart_simulation=True in ActiveLearningSimulation), until no more training configurations are added. Using multiple process groups for crystal structure prediction is currently not supported when run with active learning.

Since optimization trajectories tend to be shorter and configurations can change significantly between optimization steps, it is recommended to set the check_interval parameter to 1 and lower the candidate- and retrain thresholds to add sufficiently many new structures to the training set.