GridValuesModelTrainer

Included in QATK.MLDFT

class GridValuesModelTrainer(grid_values_model, grid_values_dataset, grid_values_model_evaluation=None, target_mae=None, max_steps=None, batch_size=None, learning_rate_base=None, scheduler_class=None, scheduler_params=None, criterion=None, validation_interval=None, gpu_acceleration=None)

Class for training a model for predicting GridValues analysis objects, such as ElectronDifferenceDensity or EffectivePotential. The training process optimizes the model’s parameters (weights, biases) and when finished, the output model files are saved in the model directory specified in the grid_values_model object.

Parameters:
  • grid_values_model – The grid values model object. For continuing training of an existing model, it is sufficient to specify the path to the pre-trained model directory.

  • grid_values_dataset – The dataset object used for training. The dataset object must have the following methods:

    • setModelParameters – sets the model parameters (cutoff, num_components) on the dataset.

    • trainingDataLoader – returns a pytorch DataLoader object for the training data.

    • validationDataLoader – returns a pytorch DataLoader object for the validation data.

  • grid_values_model_evaluation – The evaluation object used to monitor properties of the grid values obtained from full inference (on all grid points) on a small subset of the validation samples during training. This complements the regular validation, which is performed on the full validation set but only on a subset of grid points.
    Default: No evaluation is performed.

  • target_mae – The target mean absolute error (MAE) for the training. The training will stop when the validation MAE is below this value.
    Default: 1e-4.

  • max_steps – The maximum number of training steps after which the training is terminated.
    Default: 100000000.

  • batch_size – The number of data entries treated as one batch for training.
    Default: 2.

  • learning_rate_base – The base learning rate (step size) used for the optimizer.
    Default: 0.0001.

  • scheduler_class – The learning rate scheduler class used for the optimizer.
    Default: torch.optim.lr_scheduler.LambdaLR.

  • scheduler_params – The dict of parameters used for the learning rate scheduler. The default uses the LambdaLR scheduler with the learning rate scheduler function \(\lambda\) used for the optimizer in \(\text{LR}_{\text{new}} = \text{LR}_{\text{base}} \times \lambda(\text{step})\).
    Default: {'lr_lambda': lambda step: 0.96 ** (step / 100000)}.

  • criterion – The loss function used for the training.
    Default: torch.nn.L1Loss().

  • validation_interval – The interval at which the validation and logging are performed.
    Default: 1000.

  • gpu_acceleration – Whether to use GPU acceleration. If set to Automatic, the trainer will use the GPU if available.
    Default: Automatic.
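The default scheduler_params above imply a smooth exponential decay of the learning rate. Evaluated in plain Python (no torch required), the default decay factor behaves as follows:

```python
# Default learning-rate schedule from scheduler_params:
# LR_new = LR_base * lambda(step), with lambda(step) = 0.96 ** (step / 100000).
learning_rate_base = 0.0001

def lr_lambda(step):
    return 0.96 ** (step / 100000)

# At step 0 the factor is 1, so the learning rate equals the base rate.
assert abs(learning_rate_base * lr_lambda(0) - 1e-4) < 1e-18

# Every 100000 steps, the learning rate decays by another factor of 0.96.
assert abs(lr_lambda(100_000) - 0.96) < 1e-12
assert abs(lr_lambda(200_000) - 0.96 ** 2) < 1e-12
```

With the default settings, the learning rate therefore decreases by about 4% per 100000 training steps, which is a very gentle schedule matched to the large default max_steps.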

batchSize()

Return the number of data entries treated as one batch for training.

criterion()

Return the loss function used for the training.

device()

Return the torch device (cuda/cpu) used for training.

gpuAcceleration()

Return whether to use CUDA-supported GPU acceleration.

Returns:

The GPU acceleration flag.

Return type:

Enabled | Disabled | Automatic

gridValuesDataset()

Return the grid values dataset object.

gridValuesModel()

Return the grid values model object.

gridValuesModelEvaluation()

Return the grid values model evaluation object.

learningRateBase()

Return the base learning rate used for the optimizer.

maxSteps()

Return the maximum number of training steps.

schedulerClass()

Return the learning rate scheduler class used for the optimizer.

schedulerParams()

Return the dict of parameters used for the learning rate scheduler.

static setDeterministicMode(seed=42)

Set all random seeds and configure deterministic behavior for reproducible training.

Parameters:

seed – The random seed to use for all random number generators.

Note

Deterministic mode may impact training performance. Use only for debugging or benchmarking purposes.
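The exact implementation of deterministic mode is internal to QATK.MLDFT; a stand-alone sketch of the usual ingredients of such a helper (the function name here is only illustrative, and the numpy/torch parts are guarded so the sketch also runs without them) might look like:

```python
import os
import random

def set_deterministic_mode(seed=42):
    """Illustrative sketch of a deterministic-mode helper; the real
    GridValuesModelTrainer.setDeterministicMode is part of QATK.MLDFT."""
    random.seed(seed)                              # Python RNG
    os.environ["PYTHONHASHSEED"] = str(seed)       # hash-based ordering
    try:
        import numpy as np
        np.random.seed(seed)                       # NumPy RNG, if available
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)                    # CPU and CUDA seeds
        torch.use_deterministic_algorithms(True)   # may slow training down
    except ImportError:
        pass

# Seeding twice must reproduce the same random draws.
set_deterministic_mode(123)
first = [random.random() for _ in range(3)]
set_deterministic_mode(123)
second = [random.random() for _ in range(3)]
assert first == second
```

As the note above says, forcing deterministic algorithms typically costs performance, which is why this mode is intended for debugging and benchmarking rather than production training.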

targetMae()

Return the target mean absolute error.

train(load_model=None, continue_training=None)

Start or resume training of the grid values model.

Parameters:
  • load_model – The previously trained model to continue training from. It is sufficient to specify the path to the old model directory; the model hyperparameters will be loaded and copied to the actual model.

  • continue_training – If True, the training will continue from the last checkpoint of the model and use the best val_mae as the starting point. If False, training will start from the loaded model, but the best val_mae will be reset. If load_model is None, this parameter is ignored.

validationInterval()

Return the interval at which the validation and logging are performed.

Notes

Using a GridValuesPredictor with a well-trained model can significantly reduce the number of SCF steps required for convergence in DFT calculations. In some cases it is even possible to completely avoid SCF iterations, leading to substantial computational savings.

QuantumATK provides pre-trained GridValuesModels for both DensityPredictor and EffectivePotentialPredictor.

It is also possible to create custom GridValuesModels using the GridValuesModelTrainer class. When training a model, it is important to ensure that the training dataset is representative of the systems for which the model will be applied. This includes considering variations in atomic configurations, chemical compositions, and potentially external conditions such as pressure and temperature.

The general workflow for training and using a custom GridValuesModel is as follows:

  • Problem Definition: Define the specific problem you want to address with the GridValuesModel.

    • This could, for example, be accelerating SCF convergence for a particular class of materials or configurations where SCF calculations are computationally expensive, such as random alloys, disordered systems, or large supercells.

  • Model Selection: Choose between a Density model or an EffectivePotential model based on the characteristics of your target systems.

    • If the target systems include only metals, an EffectivePotential model may be more suitable than a Density model, since any long-range dipole effects are less relevant in metals and a local model can be expected to perform better.

    • On the other hand, if the target systems are semiconductors or insulators with significant charge transfer or dipole effects, a Density model may be more appropriate since the long-range electrostatic effects can be better captured.

  • Data Collection: Gather a diverse and representative dataset of DFT calculations relevant to your problem.

    • This dataset should include a variety of atomic configurations, chemical compositions, and potentially different external conditions (e.g., pressure, temperature) that are relevant to your target systems.

    • Generate the training data by performing standard DFT calculations on these systems to obtain accurate electron difference densities or effective potentials.

    • Collect the ElectronDifferenceDensity or EffectivePotential objects in one or more HDF5 files, placed in a single directory or multiple directories.

    • Optionally, the ElectronDifferenceDensity or EffectivePotential analysis objects can be compressed to save disk space using the CompressedGridValues class.

  • Model Training: Use the GridValuesModelTrainer to train your model on the collected dataset.

    • Configure the training parameters and run the training process.

    • Monitor the training process to ensure convergence and avoid overfitting.
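Before running the trainer, it can help to check that a custom dataset exposes the interface described under the grid_values_dataset parameter above. The following is a purely structural sketch, with hypothetical in-memory lists standing in for the real pytorch DataLoader objects over HDF5 data:

```python
class MinimalGridValuesDataset:
    """Structural sketch of the dataset interface expected by
    GridValuesModelTrainer. The real GridValuesDataset additionally
    handles HDF5 loading, probe-point sampling, and the
    train/validation split."""

    def __init__(self, samples, validation_ratio=0.1):
        n_val = max(1, int(len(samples) * validation_ratio))
        self._val = samples[:n_val]
        self._train = samples[n_val:]
        self._cutoff = None
        self._num_components = None

    def setModelParameters(self, cutoff, num_components):
        # Called by the trainer so the dataset matches the model geometry.
        self._cutoff = cutoff
        self._num_components = num_components

    def trainingDataLoader(self, batch_size=2):
        # The real class returns a torch.utils.data.DataLoader here.
        return [self._train[i:i + batch_size]
                for i in range(0, len(self._train), batch_size)]

    def validationDataLoader(self, batch_size=2):
        return [self._val[i:i + batch_size]
                for i in range(0, len(self._val), batch_size)]

dataset = MinimalGridValuesDataset(samples=list(range(10)))
dataset.setModelParameters(cutoff=4.0, num_components=1)
assert len(dataset.trainingDataLoader()) > 0
assert len(dataset.validationDataLoader()) > 0
```

Any object implementing these three methods can, in principle, be passed as grid_values_dataset; in practice the provided GridValuesDataset class should cover most use cases.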

Usage Examples

Train a GridValuesModel using default training parameters:

from QATK.MLDFT import *

model_dir = '/densitymodel/default_params/'

model = DeepDFTModel(
    model_dir=model_dir
)

# Define dataset directories containing HDF5 files with training data
dataset_dirs=[
    "/directory/with/hdf5/files/for/training1/",
    "/directory/with/hdf5/files/for/training2/",
]

# Create the dataset
dataset = GridValuesDataset(
    dataset_dirs=dataset_dirs,
)

# Setup the trainer
trainer = GridValuesModelTrainer(
    grid_values_model=model,
    grid_values_dataset=dataset,
)

# Start the training process
trainer.train()

training_example_default.py

Train a GridValuesModel using custom training parameters:

import torch

from QATK.MLDFT import *

model_dir = '/densitymodel/cutoff_4_model_3_128/'

# Model parameters
model = DeepDFTModel(
    model_dir=model_dir,
    model_type=DeepDFTModelType.PaiNN,
    cutoff=4.0,
    num_interactions=3,
    node_size=128,
    num_components=1
)

# Define dataset directories containing HDF5 files with training data
dataset_dirs=[
    "/directory/with/hdf5/files/for/training1/",
    "/directory/with/hdf5/files/for/training2/",
]

# Create the dataset
dataset = GridValuesDataset(
    dataset_dirs=dataset_dirs,
    probe_count_train=1000,
    probe_count_val=5000,
    validation_ratio=0.05,
    seed=123456,
)

criterion = torch.nn.L1Loss()

# Setup the trainer
trainer = GridValuesModelTrainer(
    target_mae=1e-5,
    grid_values_model=model,
    grid_values_dataset=dataset,
    max_steps=int(1e6),
    batch_size=2,
    learning_rate_base=None,
    criterion=criterion,
    validation_interval=5000,
    gpu_acceleration=Automatic,
)

# Start the training process
trainer.train()

training_example.py

In the above examples, replace the dataset_dirs variable with the paths to your dataset directories containing HDF5 files with training data.

Using the trained GridValuesModel in DFT calculations

After training the GridValuesModel, you can use it in DFT calculations to accelerate SCF convergence. Here is an example of how to use the trained model in a DFT calculation:

model = GridValuesModel(
    model_dir='/densitymodel/cutoff_4_model_3_128/'
)

grid_values_predictor = DensityPredictor(
    model=model,
)

algorithm_parameters = AlgorithmParameters(grid_values_predictor=grid_values_predictor)

calculator = LCAOCalculator(
    algorithm_parameters=algorithm_parameters,
)

use_trained_model.py

In the above example, replace the model_dir variable with the path to your trained model directory.

Minimal workflow example

Here we provide a minimal example of the complete workflow from data collection to model training and usage: example_training_flow.hdf5. The workflow is as follows:

  1. Data Collection: Perform DFT calculations on a set of representative systems and calculate ElectronDifferenceDensity objects. Save these objects in an HDF5 file in a directory named data/ as specified in the Set data directory custom block. For tutorial purposes, we only include a very limited number of training samples and the calculations are performed with a small SingleZetaPolarized basis set. For a real application, a larger and more diverse dataset is recommended, together with a Medium or High basis set.

  2. Model Training: Use the GridValuesModelTrainer to train a GridValuesModel using the collected data. The training parameters are specified in the GridValuesModelTrainer block. For tutorial purposes, we use a limited number of training steps. For a real application, a larger number of training steps and careful monitoring of the training process is recommended.

  3. Model Usage: Use the trained GridValuesModel in a DFT calculation to accelerate SCF convergence, and calculate the Bandstructure of the system using both self-consistent DFT and the ML-accelerated non-self-consistent DFT. The results can be compared to assess the performance of the trained model.