# CrystalStructurePrediction¶

class CrystalStructurePrediction(initial_population, number_of_generations, selection_pressure=2.0, number_of_elites=None, number_to_promote=1, heredity_probability=0.6, permutation_probability=0.0, mutation_probability=0.4, sigma_lattice=0.7, max_forces=PhysicalQuantity(0.01, eV/Ang), max_stress=PhysicalQuantity(0.1, GPa), max_steps=1000, max_step_length=PhysicalQuantity(0.2, Ang), external_pressure=None, rng=None, write_population=True, population_format=None, log_filename_prefix='generation_', permutation_operator=<class 'NL.Dynamics.CrystalStructurePrediction.GeneticOperators.PermutationOperator'>, heredity_operator=<class 'NL.Dynamics.CrystalStructurePrediction.GeneticOperators.HeredityOperator'>, mutation_operator=<class 'NL.Dynamics.CrystalStructurePrediction.GeneticOperators.MutationOperator'>, kpoint_density=None, fitness_function=None, fitness_tolerance=None)

Function to perform a crystal sturcture search by applying a genetic algorithm on a given initial population.

Parameters: initial_population (list of individuals) – A list of initial BulkConfigurations number_of_generations (int) – The number of times the population should be evolved. selection_pressure (float) – The pressure applied during the selection of the parents from the population. The higher the pressure the less likely it is to pick an individual with lower fitness. Default: 2.0 number_of_elites (int) – Only consider the number_of_elites fittest individuals when making the selection. Default: Consider all individuals during selection. number_to_promote (int) – The number of most fittest individuals to promote to the next generation. Default: 1 heredity_probability (float) – The unnormalized probability of using the heredity operator. Default: 0.6 permutation_probability (float) – The unnormalized probability of using the permutation operator. The permutation operator will only be used when there is more than one type of an element in the system. Default: 0.0 mutation_probability (float) – The probability of using the mutation operator. Default: 0.4 sigma_lattice (float) – The standard deviation of the random strain applied by the mutation operator. Default: 0.7 max_forces (PhysicalQuantity in units of force) – The maximum forces determining when the optimization should stop. Default: 0.01*eV/Ang max_stress (PhysicalQuantity in units of pressure) – The maximum stress determining when the optimization should stop. Default: 0.1*GPa max_steps (uint) – The maximum number of steps for local optimization. Default: 1000 max_step_length (PhysicalQuantity in units of length) – The maximum step length the local optimizer may take. Default: 0.2*Ang external_pressure (PhysicalQuantity in units of pressure) – The target external pressure of the system. Default: 0*GPa rng (numpy.random.RandomState) – The random number generated used to generate the random number stream. Default: A new random number generator (seeded from OS entropy pool) write_population (bool) – Enables saving the configuration of each individual configuration during the crystal structure prediction. Default: True population_format (str) – Sets the storage format for populations. Default: hdf5 log_filename_prefix (str) – The filename prefix to use for logging the individual optimizations. Default: generation_ kpoint_density (PhysicalQuantity of type length | None) – The target k-point sampling density. If None, a fixed k-point grid is used. Default: Fixed k-point grid fitness_function (callable) – By default, the fitness is defined as the negative enthalpy of the system. This function allows for the fitness to be customized. The fitness_function should accept two arguments, “configuration” and “enthalpy” , and return the fitness (larger values are more fit). The fitness should be given as a floating point number (without units). The configuration argument is the geometry optimized individual to evaluate the fitness for. fitness_tolerance (float) – Configurations whose fitness values differ by more than this amount are considered difference structures. This is used as part of the niching step, which removes duplicate configurations from the population. Default: 0.02
evolvePopulation(number_of_generations)

Evolves the current population for number_of_generations generations.

Parameters: number_of_generations (int) – The number of times the genetic algorithm is applied to the population.
fitnesses()
Returns: Return the fitnesses of the current population sorted by descending fitness. list of type float
kpointDensity()
Returns: The k-point density. PhysicalQuantity of type length | None
population()
Returns: Return all configurations in the population sorted by descending fitness. list of BulkConfigurations
symmetries()

Return the international symbol for each individual of the current population sorted by descending fitness.

Returns: a list of international symbols list of type str

## Usage Examples¶

Determine the global minimum energy crystal structure for TiO2.

# -------------------------------------------------------------
# Initial Population
# -------------------------------------------------------------

initial_population = generateInitialPopulation(
elements=elements,
population_size=10,
calculator=calculator,
volume=206.29*Angstrom**3,
max_forces=0.01*eV/Angstrom,
max_stress=0.1*GPa,
max_steps=1000,
max_step_length=0.2*Angstrom,
external_pressure=0.0*GPa,
log_filename_prefix="initial_population_",
)

# -------------------------------------------------------------
# Crystal Structure Prediction
# -------------------------------------------------------------

global_optimization = CrystalStructurePrediction(
initial_population,
number_of_generations=16,
heredity_probability=0.85,
mutation_probability=0.1,
permutation_probability=0.05,
number_to_promote=2,
number_of_elites=int(0.6*population_size),
selection_pressure=3.0,
max_forces=0.005 * eV/Angstrom,
)


csp_TiO2.py

Find the crystal structure that most closely matches a target band gap, that is also stable (low enthalpy).

# Define custom fitness function.
def fitnessFunction(configuration, enthalpy):
# Calculate the band gap.
band_gap = Bandstructure(
configuration=configuration,
route=['G','G'],
points_per_segment=3,
).indirectBandGap()

# Target band gap.
target_band_gap = 1.34 * eV

# High fitness is low enthalpy and a band gap near the target. A parameter
# alpha (ranged from zero to one) controls the relative importance of
# enthalpy vs band gap error.
alpha = 0.5
fitness = -alpha*enthalpy - (1 - alpha) * abs(band_gap - target_band_gap)

# The fitness should not have units.
return fitness.inUnitsOf(eV)

global_optimization = CrystalStructurePrediction(
initial_population,
number_of_generations=16,
heredity_probability=0.85,
mutation_probability=0.1,
permutation_probability=0.05,
number_to_promote=2,
number_of_elites=int(0.6*len(initial_population)),
selection_pressure=3.0,
max_forces=0.01 * eV/Angstrom,
fitness_function=fitnessFunction,
)


## Notes¶

The goal of crystal structure prediction is to explore the state space of a compound and locate the structure with the maximum fitness for a given chemical composition. In QuantumATK, the default fitness function is set to the negative enthalpy, i.e. the algorithm converges to the structure with the minimal enthalpy (or energy if no external pressure is applied). The primary input to the calculation is the chemical formula, which determines the number of atoms in the unit cell. It is important that enough atoms are present in the unit cell to represent the crystal structure. Just because the correct stoichiometric ratio is used, does not ensure that it is possible to represent the minimum energy structure.

## Custom Fitness Function¶

The default fitness function is the negative enthalpy. A custom fitness function may be provided using the fitness_function argument. The function takes two arguments: the minimized configuration and the enthalpy. It should return the fitness of the configuration (larger fitness means more favorable structure).

Custom fitness functions can be used to find structures that yield desired material properties or match known experimental values. Care must be taken to ensure that the resulting structures are physically plausible. This means that the enthalpy should be included in the fitness. Practically, this means that the fitness should be a weighted sum of the desired physical properties and the enthalpy. Otherwise, the evolutionary algorithm will find very high energy structures that are not stable or possible to synthesize.

## Algorithm Details¶

The crystal structure prediction algorithm in QuantumATK uses a genetic algorithm. The genetic algorithm begins with an initial population of structures provided by the user. The initial population can be generated using the generateInitialPopulation function. Each step of the algorithm involves generating the next population of structures from the previous one. This evolution process is done in the following manner:

1. Promote the number_to_promote fittest individuals to the next generation.
2. Loop until there are population_size individuals in the new generation:
1. Pick which genetic operation should be performed. Each operation is given a user defined probability of being selected.
2. Select one (mutation and permutation operator) or two parent structures (heredity operator). The selection probability is weighted by the fitness of the parental structures.
3. Execute the genetic operation and create a new structure.
1. Perform a local optimization of each individual of the population.

There are four types of genetic operators available:

• Promotion: this operation is always performed at the beginning of the evolution process to ensure that the best number_to_promote individuals are kept.
• Mutation: perturb the lattice vectors and atomic positions of the chosen configuration randomly.
• Permutation: swap the position of atoms of differing elements.
• Heredity: combine two parental structures to a new one. A random cut plane is used to divide the atoms of the two parent structures; all atoms below the plane in one structure are combined with the atoms above the plane in the other structure to create a new “child” structure. Care is taken to ensure that the total number of each element is not changed.

The algorithm stops after evolving the population number_of_generations times. Given the stochastic nature of the algorithm, this stopping criterion does not ensure that one has found the global solution. The stability of the search should be checked at least with respect to the chosen seed of the pseudo random number generator (rng), the initial_population and number_of_generations.

After generating a new population, each individual of the population is away from a local minimum in enthalpy, hence a geometry optimization must be performed to converge to a close local minimum. Since the global minimum is also a local minimum, one is guaranteed to find it for sufficiently large numbers of number_of_generations. These geometry (local) optimizations comprise nearly all of the computational work during crystal structure prediction.

Note

One key difference between a genetic algorithm and a random search is that the selection of the parent structure(s), on which the genetic operations are performed, is biased by the fitness of that structure. The higher the fitness (i.e. the lower the calculated enthalpy in this implementation), the more likely it is to pick this structure. The skewness of the probability distribution is altered by selection_pressure. A large value of selection_pressure makes it less likely to select a configuration with a lower fitness.

## Restart support¶

A crystal structure prediction run can be restarted / resumed from a previous point in the following manner:

CSP = CrystalStructurePrediction(
...
)

# Save the current state of the CSP object.
nlsave('my_file.hdf5', CSP, object_id='my_run')

...

# Read back a stored CSP object. The last population is restored
# and all given properties / attributes are set up.

# Run 10 more iterations of the genetic algorithm.
CSP.evolvePopulation(10)

# Run 5 more iterations of the genetic algorithm.
CSP.evolvePopulation(5)


Note

At this time it is not possible to resume runs that use a custom fitness function.

## Analyzing Results¶

If write_population is True then after each step of the genetic algorithm the current population will be written to disk. The filename will start with log_filename_prefix and include the current generation number. For example, the initial population will be saved as generation_0.hdf5 and after the first step a new file generation_1.hdf5 will be saved. The crystal structures contained in these files can be viewed in QuantumATK using the Viewer.