CompressedGridValues
Included in QATK.Analysis
- class CompressedGridValues(analysis, compression_tolerance=None, calculate_errors=None, seed=None)
Initialize the CompressedGridValues object from an analysis object.
- Parameters:
analysis (AnalysisSpin or subclass thereof) – The analysis object containing GridValues data to be compressed.
compression_tolerance (float) – The relative, unitless tolerance for the compression. It determines how much of the Frobenius norm of the original tensor is retained, i.e. the expected (ideal) relative compression errors should be ~ tolerance. It must be a positive float. Default: 1e-2
calculate_errors (bool) – If True, calculate the errors of the compression. Default: False
seed (int) – The random seed for the random state of the Tucker decomposition. Default: 12345
- analysisClass()
Return the type of the analysis object that was used for creating this compressed grid values object.
- static compressionErrors(original_tensor, tucker_tensor)
Calculate the compression errors by comparing the reconstructed tensor with the original full tensor.
- Parameters:
original_tensor – The original tensor before compression.
tucker_tensor – Core and 3 factors of the Tucker decomposition.
- Returns:
Dict of reconstruction errors for the given Tucker tensor.
- compressionRatio()
Calculate the compression ratio of the Tucker decomposition.
- Returns:
The compression ratio, defined as the number of elements in the original tensor divided by the number of elements in the Tucker tensors.
- Return type:
float
- compressionTolerance()
Return the relative tolerance for the compression, which determines how much of the Frobenius norm of the original tensor is retained.
- Returns:
The relative compression tolerance.
- Return type:
float
- decompressToAnalysis()
Decompress the compressed grid values back to an AnalysisSpin object.
- Returns:
The decompressed AnalysisSpin object.
- Return type:
AnalysisSpin
- static estimateRankFromSpectrum(spectrum, tolerance)
Return the minimal rank r such that the sum of the squared spectrum is retained up to the given tolerance.
- Parameters:
spectrum – The singular value spectrum of the unfolded tensor.
tolerance – The relative (dimensionless) tolerance for the spectral sum retention.
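The rank selection described above can be sketched in plain Python. This is an illustration only, not the QuantumATK implementation: the function name `estimate_rank_from_spectrum` is hypothetical, the spectrum is assumed to be sorted in descending order, and the retention criterion (discarded squared singular values at most `tolerance**2` times the total) is an assumption chosen so that the relative Frobenius error stays at or below the tolerance.

```python
def estimate_rank_from_spectrum(spectrum, tolerance):
    """Smallest rank r with sum(s[r:]**2) <= tolerance**2 * sum(s**2).

    Assumptions (not taken from the QuantumATK source): `spectrum` is
    sorted in descending order and the criterion above is the one used.
    """
    energies = [float(s) ** 2 for s in spectrum]
    total = sum(energies)
    if total == 0.0:
        return 1  # degenerate all-zero spectrum: keep a single component

    budget = (tolerance ** 2) * total
    discarded = 0.0
    rank = len(energies)
    # Walk from the smallest singular value upward and drop components
    # while the accumulated discarded energy stays within the budget.
    for energy in reversed(energies):
        if discarded + energy > budget:
            break
        discarded += energy
        rank -= 1
    return max(rank, 1)


print(estimate_rank_from_spectrum([10.0, 1.0, 0.1, 0.01], 0.05))
```

A tighter tolerance keeps more singular values, so the returned rank never decreases as the tolerance is lowered.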
- static estimateTuckerRanks(tensor, tolerance=0.01)
Estimate the ranks for a Tucker decomposition using its SVD spectrum. This is done so that the Frobenius norm of the tensor is conserved up to the given relative tolerance.
- Parameters:
tensor – The tensor to be decomposed.
tolerance – The relative (dimensionless) tolerance for the spectrum retention.
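As a rough illustration of estimating Tucker ranks from SVD spectra, the sketch below unfolds a tensor along each mode, computes the singular values of each unfolding with NumPy, and picks the smallest rank that keeps the discarded energy within budget. The helper names are hypothetical, and giving every mode the full `tolerance**2` budget is an assumption; the actual method may distribute the tolerance differently.

```python
import numpy as np


def mode_spectrum(tensor, mode):
    """Singular values of the mode-`mode` unfolding of `tensor`."""
    unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
    return np.linalg.svd(unfolded, compute_uv=False)


def estimate_tucker_ranks(tensor, tolerance=0.01):
    """Per-mode ranks from the spectra of the unfoldings (illustrative)."""
    ranks = []
    for mode in range(tensor.ndim):
        energy = mode_spectrum(tensor, mode) ** 2
        # tail[r] is the energy discarded when keeping the first r values.
        tail = np.concatenate([np.cumsum(energy[::-1])[::-1], [0.0]])
        budget = (tolerance ** 2) * energy.sum()  # assumed per-mode budget
        rank = int(np.argmax(tail <= budget))     # first r within budget
        ranks.append(max(rank, 1))
    return tuple(ranks)


# A separable (outer-product) tensor has Tucker ranks (1, 1, 1).
rng = np.random.default_rng(0)
a, b, c = rng.random(8), rng.random(5), rng.random(4)
separable = np.einsum("i,j,k->ijk", a, b, c)
print(estimate_tucker_ranks(separable))
```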
- metatext()
- Returns:
The metatext of the object or None if no metatext is present.
- Return type:
str | None
- nlinfo()
- Returns:
Structured information about the CompressedGridValues.
- Return type:
dict
- nlprint(stream=None)
Print a string containing an ASCII table useful for plotting the AnalysisSpin object.
- Parameters:
stream (python stream) – The stream the table should be written to. Default: NLPrintLogger()
- primitiveVectors()
Return the primitive vectors of the configuration associated with this compressed grid values object.
- setMetatext(metatext)
Set a given metatext string on the object.
- Parameters:
metatext (str | None) – The metatext string to set. Use “None” to remove current metatext.
- shape()
Return the shape of the full grid data before compression - same for each spin component.
- Returns:
The shape of the full grid data.
- Return type:
tuple
- tuckerErrors()
Return the list of errors for Tucker tensors for each spin, or None if not calculated.
- tuckerRanks()
For each spin component, return the ranks of the Tucker decomposition, i.e. shape of the Tucker core tensor.
- Returns:
The ranks of the Tucker decomposition.
- Return type:
list of tuples
- tuckerTensors()
Return the list of Tucker core and factor tensors of the compressed grid values for each spin component.
- uniqueString()
Return a unique string representing the state of the object.
- unit()
Return the unit associated with this compressed grid values object.
Usage Examples
A GridValues object such as ElectronDifferenceDensity or EffectivePotential can be compressed using the CompressedGridValues class to save storage space.
from QATK.Analysis import *
from QATK.Calculators.DFT import *
from QATK.Core import *
vector_a = [0.0, 13.5765, 13.5765]*Angstrom
vector_b = [2.7153, 0.0, 2.7153]*Angstrom
vector_c = [2.7153, 2.7153, 0.0]*Angstrom
lattice = UnitCell(vector_a, vector_b, vector_c)
elements = [Silicon, Silicon, Silicon, Silicon, Silicon, Silicon, Silicon,
            Silicon, Silicon, Silicon]
fractional_coordinates = [[ 0.8 , 0. , -0. ],
                          [ 0.85, 0.25, 0.25],
                          [ 0. , -0. , 0. ],
                          [ 0.05, 0.25, 0.25],
                          [ 0.2 , 0. , -0. ],
                          [ 0.25, 0.25, 0.25],
                          [ 0.4 , 0. , -0. ],
                          [ 0.45, 0.25, 0.25],
                          [ 0.6 , 0. , -0. ],
                          [ 0.65, 0.25, 0.25]]
silicon_alpha = BulkConfiguration(
    bravais_lattice=lattice,
    elements=elements,
    fractional_coordinates=fractional_coordinates
)
k_point_sampling = KpointDensity(
    density_a=4.0 * Angstrom, density_b=4.0 * Angstrom, density_c=4.0 * Angstrom
)
numerical_accuracy_parameters = NumericalAccuracyParameters(
    k_point_sampling=k_point_sampling
)
calculator = LCAOCalculator(
    numerical_accuracy_parameters=numerical_accuracy_parameters,
    checkpoint_handler=NoCheckpointHandler,
)
silicon_alpha.setCalculator(calculator)
silicon_alpha.update()
electron_difference_density = ElectronDifferenceDensity(configuration=silicon_alpha)
electron_difference_density_compressed = CompressedGridValues(electron_difference_density)
compress_electron_difference_density.py
The resulting CompressedGridValues object can be saved to a file in the compressed format using nlsave:
nlsave("compressed_density.hdf5", electron_difference_density_compressed)
It is possible to inspect various compression statistics from the CompressedGridValues object:
print("Original shape:", electron_difference_density_compressed.shape())
print("Tucker ranks:", electron_difference_density_compressed.tuckerRanks())
print("Compression ratio: %.2f" % electron_difference_density_compressed.compressionRatio())
This produces the following output:
Original shape: (90, 18, 18)
Tucker ranks: [(11, 11, 11)]
Compression ratio: 10.73
From this we can see that the original tensor of shape (90, 18, 18) has been compressed to a Tucker core of shape (11, 11, 11), resulting in a compression ratio of 10.73. This is the factor by which the storage space is reduced when the object is saved to disk.
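The reported ratio can be checked by hand from the definition given for compressionRatio(): the number of elements in the original tensor divided by the number of elements in the Tucker tensors. The sketch below assumes that the Tucker tensors consist of one core of shape (11, 11, 11) plus three factor matrices, one per mode, for a single spin component; the function name is hypothetical.

```python
from math import prod


def tucker_compression_ratio(shape, ranks):
    """Original element count divided by core plus factor element counts."""
    original = prod(shape)
    core = prod(ranks)
    # One factor matrix per mode, each holding n_r * d_r elements.
    factors = sum(n * d for n, d in zip(shape, ranks))
    return original / (core + factors)


ratio = tucker_compression_ratio((90, 18, 18), (11, 11, 11))
print("Compression ratio: %.2f" % ratio)  # Compression ratio: 10.73
```

Here the original tensor has 90 × 18 × 18 = 29160 elements, while the core (1331) and the three factor matrices (990 + 198 + 198) together hold 2717, which reproduces the value printed above.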
The original ElectronDifferenceDensity object can be restored from the compressed representation
using the decompressToAnalysis() method. If we execute the compress_electron_difference_density.py
script in the atkpython interpreter, we can then decompress the object as follows:
In [1]: electron_difference_density_compressed.decompressToAnalysis()
Out[1]: ElectronDifferenceDensity {Spin: All, Grid shape: (90, 18, 18)}
We can also use the optional arguments of the CompressedGridValues constructor to customize the compression, for instance, to control the compression tolerance and whether to calculate error metrics:
In [2]: electron_difference_density_compressed = CompressedGridValues(electron_difference_density, compression_tolerance=1e-1, calculate_errors=True)
In [3]: electron_difference_density_compressed.tuckerErrors()
Out[3]:
[{'max_abs_error': 0.0033359592645264536,
'mean_abs_error': 0.0004662198051305935,
'rms_abs_error': 0.0006252342209779552,
'median_abs_error': 0.0003357332129775054,
'relative_L1_norm': 0.1372401664681808,
'relative_L2_norm': 0.12027586165472676,
'max_relative_error': 193.5515133586285,
'mean_relative_error': 0.9688946195930522}]
Calling tuckerErrors() returns a list with one dictionary per spin component, each containing various error metrics that compare the compressed and original data. The compression_tolerance parameter is correlated with the relative_L2_norm error metric, which represents the fraction of the original data that is lost by the compression process (about 0.12, or 12%, in the example above for a tolerance of 1e-1).
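To make the metrics more concrete, the following NumPy sketch computes quantities with the same key names from an original and a reconstructed array. The definitions are assumptions based on the names only and may differ in detail from what compressionErrors() computes; the function name `reconstruction_errors` is hypothetical.

```python
import numpy as np


def reconstruction_errors(original, reconstructed):
    """Error metrics under assumed definitions (see lead-in)."""
    diff = np.abs(reconstructed - original)
    # Pointwise relative errors are only defined where the original is nonzero.
    nonzero = np.abs(original) > 0.0
    pointwise = diff[nonzero] / np.abs(original[nonzero])
    return {
        "max_abs_error": float(diff.max()),
        "mean_abs_error": float(diff.mean()),
        "rms_abs_error": float(np.sqrt(np.mean(diff ** 2))),
        "median_abs_error": float(np.median(diff)),
        "relative_L1_norm": float(diff.sum() / np.abs(original).sum()),
        "relative_L2_norm": float(np.linalg.norm(diff) / np.linalg.norm(original)),
        "max_relative_error": float(pointwise.max()),
        "mean_relative_error": float(pointwise.mean()),
    }


original = np.array([[[1.0, 2.0], [0.001, 4.0]]])
errors = reconstruction_errors(original, original + 0.1)
for name, value in errors.items():
    print(name, value)
```

Under these definitions, a pointwise metric such as max_relative_error can be very large wherever the original values are close to zero, even when the norm-based metrics are small. This would be consistent with the large max_relative_error in the output above.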
Theoretical Notes
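The compression of 3D grid data is performed using Tucker tensor decomposition. Given a rank 3 tensor \(T\) representing the grid values, the Tucker decomposition approximates it as:
\[T_{ijk} \approx \sum_{a=1}^{d_1} \sum_{b=1}^{d_2} \sum_{c=1}^{d_3} \mathcal{T}_{abc} \, U^{(1)}_{ai} \, U^{(2)}_{bj} \, U^{(3)}_{ck}\]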
where \(d_r\) \((r=1,2,3)\) are the desired dimensions of the target compressed tensor along each mode, \(\mathcal{T}\) is the core tensor of shape \((d_1, d_2, d_3)\), and \(U^{(r)}\) are the factor matrices of shape \((d_r, n_r)\) with \(n_r\) being the original dimensions of the tensor along each mode.
The dimensions \(d_r < n_r\) are determined automatically from the given compression tolerance parameter.
The compression tolerance parameter controls the trade-off between compression ratio and accuracy. A smaller tolerance preserves more accuracy but results in less compression. The default tolerance of 1e-2 typically provides a good balance between storage savings and reconstruction accuracy.
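The trade-off between rank and accuracy can be illustrated with a truncated higher-order SVD (HOSVD), one simple way of obtaining a Tucker core and factor matrices. This is a stand-in for illustration and not necessarily the algorithm used by CompressedGridValues, whose decomposition takes a random seed. The function names are hypothetical, and the factor matrices follow the shape convention \((d_r, n_r)\) used above.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize `tensor` with the given mode as rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def truncated_hosvd(tensor, ranks):
    """Return (core, factors); factors[r] has shape (d_r, n_r)."""
    factors = []
    for mode, rank in enumerate(ranks):
        left, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(left[:, :rank].T)  # leading left singular vectors
    # Project the tensor onto the retained subspaces to obtain the core.
    core = np.einsum("ijk,ai,bj,ck->abc", tensor, *factors)
    return core, factors


def reconstruct(core, factors):
    """Contract the core with the factor matrices along each mode."""
    return np.einsum("abc,ai,bj,ck->ijk", core, *factors)


rng = np.random.default_rng(0)
tensor = rng.standard_normal((12, 6, 6))
for ranks in [(2, 2, 2), (4, 4, 4), (12, 6, 6)]:
    core, factors = truncated_hosvd(tensor, ranks)
    error = np.linalg.norm(tensor - reconstruct(core, factors)) / np.linalg.norm(tensor)
    print(ranks, "relative error: %.3f" % error)
```

Larger ranks correspond to a smaller tolerance: the relative error shrinks while the core and factor matrices grow, so the compression ratio drops. A random tensor such as the one used here compresses poorly; smooth grid data typically has rapidly decaying spectra and compresses much better.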
CompressedGridValues can directly be used in the QuantumATK NanoLab for visualizations, just like the original GridValues objects.
Compression can significantly reduce the storage space required for large datasets, which is particularly useful when collecting training data for machine learning models using the GridValuesModelTrainer class.
Currently, the CompressedGridValues class supports only ElectronDifferenceDensity and EffectivePotential objects.