FragmentCalculator

class FragmentCalculator(full_system_calculator, fragment_generator, number_of_processes_per_fragment=None, use_global_fermi_level=None, workload_partitioning_algorithm=None)

Class for representing calculations using the divide-and-conquer linear scaling fragment method with a given atomic scale calculator.

Parameters:
  • full_system_calculator (LCAOCalculator) – The calculator that each of the fragment configurations will base their calculator on.

  • fragment_generator (FragmentGenerator) – The object that describes how the fragments should be created and is responsible for creating them.

  • number_of_processes_per_fragment (positive int) – The number of processes that will be used in the calculation of each fragment. If the total number of processes does not divide evenly into the number of fragments, some fragment process groups will have less than this number of processes.
    Default: 1

  • use_global_fermi_level (bool) – Boolean controlling if a global Fermi level should be used in the calculation. This should be True for calculations involving metals or interface with an expected charge transfer. For pure semi-conductor systems, setting it to False can improve convergence and performance.

  • workload_partitioning_algorithm (KarmarkarKarpAlgorithm | GreedyAlgorithm | CompleteGreedyAlgorithm | Interleaved | tuple) – The algorithm to partition the fragments by their estimated workload into the process groups such that the sum of workloads in each process group is as equal as possible. One may alternatively provide a tuple with the partitioning algorithm in the first entry and an orbital power (positive integer) in the second entry. The orbital power is the power that the number of orbitals of a fragment is raised to when calculating the fragment weight.
    Default: (Interleaved, 3)

fragmentGenerator()
Returns:

The object that describes how the fragments are created and is responsible for creating them.

Return type:

FragmentGenerator

fullSystemCalculator()
Returns:

The calculator for the full system.

Return type:

LCAOCalculator

isConverged()
Returns:

True when the call to “update()” resulted in a converged SCF loop.

Return type:

bool

iterationControlParameters()

Query method for the IterationControlParameters.

Returns:

The iteration control parameters set on the calculator.

Return type:

IterationControlParameters

metatext()
Returns:

The metatext of the object or None if no metatext is present.

Return type:

str | None

numberOfProcessesPerFragment()
Returns:

The number of processes that is used in the calculation of each fragment. If the total number of processes does not divide evenly into the number of fragments, some fragment process groups will have less than this number of processes. The number of processes per fragment is capped at the number of processes available.

Return type:

int

requestedNumberOfProcessesPerFragment()
Returns:

The number of processes asked for (on construction of the calculator) to be used in the calculation of each fragment. This number can be higher than the number of processes available and will be tried to be honored when a calculator is read in again from file and the number of processes per fragment should be re-evaluated.

Return type:

int

setMetatext(metatext)

Set a given metatext string on the object.

Parameters:

metatext (str | None) – The metatext string that should be set. A value of “None” can be given to remove the current metatext.

uniqueString()

Return a unique string representing the state of the object.

upgrade(full_system_configuration)

Upgrade the full system calculator with defaults matching the configuration and create the fragment configurations for the given configuration.

Parameters:

full_system_configuration (BulkConfiguration) – The configuration to upgrade the calculator defaults for and the configuration to create fragment configurations for.

useGlobalFermiLevel()
Returns:

Boolean controlling if a global Fermi level should be used in the calculation.

Return type:

bool

versionUsed()
Returns:

The version of ATK used to update the calculator.

Return type:

str

workloadPartitioningAlgorithm()
Parameters:

workload_partitioning_algorithm (KarmarkarKarpAlgorithm | GreedyAlgorithm | CompleteGreedyAlgorithm | Interleaved) – The algorithm to partition the tasks by their estimated workload into the process groups such that the sum of workloads in each process group is as equal as possible.
Default: KarmarkarKarpAlgorithm

Notes

The FragmentCalculator is a linear scaling method based on the LCAOCalculator. The FragmentCalculator can be used to simulate large systems with more than 10,000 atoms using DFT. The method shows good parallel scaling with respect to both computation time and memory reduction.

The underlying method implemented in the FragmentCalculator is based on the work of Lin-Wang Wang et. al, [1], [2]. The QuantumATK implementation largely follows the concepts and ideas presented in these articles. The main difference is that while the work of Wang. et al is based on a plane-wave method, the QuantumATK implementation is currently implemented for LCAOCalculations.

Below we will present the main concepts behind the FragmentCalculator and highlight differences with respect to the original works.

Scaling with system size

The computation time for an SCF step of any traditional DFT method (LCAO and plane-wave) scales as \(t_{scf}^{DFT} \propto O(N^3)\), where \(N\) is the number of atoms in the system. This is due to the expensive calculation of eigenstates when solving the Kohn-Sham eigenvalue problem.

The main idea behind the FragmentCalculator method is to divide a large system into a number smaller systems (fragments). The number of fragments, \(K_f = k \cdot N\) will be proportional to the total number of atoms, with a proportionality factor that depends on the specific type of fragmentation strategy. Denoting the size of the small, fragment, systems as \(N_f\), the time for solving the eigenvalue problem for the \(K_f\) fragments scales as \(t_{scf}^{Fragment} \propto k \cdot N \cdot O(N_f^3)\). Since the fragment size \(N_f\) is constant we see that the calculation time scales linearly with system size \(N\) through the number of fragments.

Equally import to the scaling of computation time is the scaling of the required memory with system size. For LCAO, the memory required for storing the eigenvectors scales as \(O(N^2)\), since each eigenvector grows linearly with system size, and a linearly increasing number of eigenstates are required to construct the density. For the FragmentCalculator the required memory scales linearly with system size, since the memory required for the eigenstates is constant for an individual fragment, while the number of fragments scale linearly with system size. The memory required for the real-space density and effective potential scales linear with system size for both normal LCAO and the fragment method. It should be noted, that the fragment method has a significant memory overhead for small systems.

The linear scaling of both the calculation time and required memory implies that very large systems can be studied with the FragmentCalculator, provided that the number of processors is increased proportionally.

Division into fragments

Following Refs. [1], [2] we divide the original, big system into a number of fragments. The actual fragmentation is defined using a FragmentGenerator. The FragmentGenerator must be defined in connection with the FragmentCalculator. and both classes are dependent on each other. The FragmentGenerator provides three mutually exclusive options for fragmentation:

  1. fractional_cuts

  2. number_of_minimal_fragments

  3. fragment_lengths

More details about fragmentation is provided here: FragmentGenerator.

The fragmentation specified by one of the three options defines how to cut out the minimal fragments. In addition to the interior of a fragment, one can add a buffer region to fragment. The buffer region is included in the fragment Kohn-Sham equation, but is not included when calculating the electron density. Including 1-3 Å of buffer region usually improves the quality of the fragment calculation of both total energy and forces.

When cutting out a fragment, potentially including buffer atoms, a vacuum region is added in the directions of fragmentation. The boundary atoms facing vacuum are passivated with hydrogen atoms positioned at the center of the cut bonds.

Following Refs. [1], [2] the fragments have different signs, either +1 or -1, depending on their type. The total electron density in a given point is calculated as

\[\rho(\mathbf{r})=\sum_F\alpha_F\rho_F(\mathbf{r})\]

where the sum runs over fragments covering the point \(\mathbf{r}\) and where the fragment sign \(\alpha_F\) is either +1 or -1. The fragment density \(\rho_F(\mathbf{r})\) is obtained from solving a (small) fragment Kohn-Sham equation:

\[\left[-\frac{1}{2}\nabla^2 + V_{eff}(\mathbf{r}) + V_{F,p}(\mathbf{r})\right]\psi_{F,i}(\mathbf{r}) = \epsilon_{F,i} \psi_{F,i}(\mathbf{r})\]

where \(V_{eff}(\mathbf{r})\) is the usual effective potential and \(V_{F,p}(\mathbf{r})\) is a passivation potential taken as the difference between the neutral atom potentials of the fragment configuration and the full configuration.

1D, 2D, and 3D fragmentation and scaling considerations

The FragmentGenerator and FragmentCalculator can be used to divide the initial configuration into 1D, 2D or 3D fragments depending on the input to FragmentGenerator as described in the FragmentGenerator manual page.

The performance improvement of the FragmentCalculator compared to normal LCAO depends on the fragmentation and weather it is 1D, 2D, or 3D fragments.

For 1D fragmentation, the fragments generated will either be single fragments, denoted (1,1,1) or double fragments, denoted (1,1,2) for fragmentation along the C-direction. The total number of fragments is thus \(N_f = 2\cdot N_{single}\), where \(N_{single}\) is the number of single fragments.

Consider now a system consisting of a primitive unit cell repeated \(N_C\) times along the C-direction. The number of orbitals in the primitive cell is denoted \(P\). For the fragment calculation, we divide this system into \(N_f=N_C\) number of minimal fragments. If we only take into account the \(O(N^3)\) cost of solving the Kohn-Sham eigenvalue problem, the idealized speedup of the fragment method compared to normal LCAO is

\[S_{1D} = \frac{T_{fragment}}{T_{LCAO}} = \frac{N_C \left(P^3 + (2P)^3\right)}{(N_C P)^3} = \frac{9}{N_C^2}\]

It is thus evident that for less than 3 minimal fragments, the fragment calculation will always be slower than a normal LCAO calculation. In the ideal case, the fragment calculation will be faster than normal LCAO for \(N_C>3\). In practice, the cross-over point will often be observed at somewhat larger \(N_C\sim 5-6\) due to other computational costs related to real-space grid operations and other overhead.

For 2D fragmentation along B and C directions, the fragments are (1,1,1), (1,2,1), (1,1,2), and (1,2,2). For each single fragment there will be 4 fragments in total. The idealized speedup for a primitive configuration repeated \(N_B\) and \(N_C\) times along the B and C directions is in this case

\[S_{2D} = \frac{T_{fragment}}{T_{LCAO}} = \frac{N_B N_C \left(\cdot P^3 + 2 \cdot (2P)^3 + (4P)^3\right)}{(N_B N_C P)^3} = \frac{81}{N_B^2 N_C^2}\]

Again, we see an idealized break-even is seen for three repetitions along each of the direction, \(N_B=N_C=3\).

For 3D fragmentation the fragments are (1,1,1), (2,1,1), (1,2,1), (1,1,2), (2,2,1), (2,1,2), (1,2,2), (2,2,2) i.e. 8 fragments in total for each single fragment. A 5x5x5 fragmentation thus leads to 1000 fragments in total. The idealized speedup for a primitive configuration repeated \(N_A\), \(N_B\) and \(N_C\) times along the A, B and C directions is in this case

\[S_{3D} = \frac{T_{fragment}}{T_{LCAO}} = \frac{N_A N_B N_C \left(\cdot P^3 + 3 \cdot (2P)^3 + 3(4P)^3 + (8P)^3\right)}{(N_A N_B N_C P)^3} = \frac{729}{N_A^2N_B^2 N_C^2}\]

Again, we see an idealized break-even is seen for three repetitions along each of the direction, \(N_A=N_B=N_C=3\), but in practice the fragment method will be faster at a higher number of repetitions.

Load imbalance of the process groups

When going to higher dimension fragmentation the size difference between the smallest fragment and the largest fragment becomes quite significant. Most prominently for 3D fragmentation the size difference between the (1,1,1) fragment and its (2,2,2) fragment is a factor of 8. If one at the same time have many process groups as a result of having many processes, the number of fragments per process group is small and it will be increasingly important to partition the fragments optimally between the process group. A bad parallel scaling with increasing number of processes might be a sign that the used partition of fragments for the process groups leads to a bad load balance. To try to improve the workload partition of the process groups one can change the workload_partitioning_algorithm from Interleaved to KarmarkarKarpAlgorithm. The number partitioning algorithms Karmarkar-Karp, Greedy and complete Greedy are described in [3]. The complete Greedy algorithm will find the most optimal partition of the fragments, but it is not recommended to use since it is very slow and scales bad.

Other numerical settings

Fragment method calculations generally use more SCF step to converge than a similar LCAO calculation. In order to achieve stable convergence we recommend:

  • Decrease the damping_factor in IterationControlParameters to 0.05 or below.

  • Decrease the broadening used in the occupation_method in :ref:NumericalAccuracyParameters_c` from default 1000 Kelvin to 300 Kelvin or below.

Supported analysis

Not all analysis objects are supported by the FragmentCalculator. At present, the following objects are available:

  • TotalEnergy

  • Forces

  • Stress

  • OptimizeGeometry

  • MolecularDynamics

  • ElectronDensity

  • ElectronDifferenceDensity

  • HartreeDifferencePotential

  • DynamicalMatrix

    • PhononBandstructure

    • PhononDensityOfStates

    • VibrationalMode

References