Running QuantumATK Under Cray MPI (aprun)

QuantumATK is compiled against MPICH3 on Linux, but can be run with any MPI library that is ABI-compatible with MPICH3, such as Intel MPI and MVAPICH. It is also possible to run QuantumATK under Cray MPI, but this requires some special configuration. Some of it is very system-specific, so the comments below may or may not be relevant in all cases, but they should provide enough hints for figuring out how to run QuantumATK properly under “aprun”.

The following points have proved relevant on other Cray systems (details such as version numbers are of course system-specific).

  1. It is crucial to use an ABI-compatible MPI module. This can be achieved with something like the following lines:

    module unload PrgEnv-pgi PrgEnv-gnu PrgEnv-cray PrgEnv-intel
    module load PrgEnv-gnu
    module swap cray-mpich cray-mpich-abi/7.3.0
    
    On another system the following lines were used:
    
    module unload PrgEnv-cray
    module load PrgEnv-gnu
    module unload cray-mpich
    module load cray-mpich-abi
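
    After loading the modules, one can verify that the ABI-compatible MPI module is indeed active, for example with the following illustrative check (module naming differs between systems):
    
    module list 2>&1 | grep -i mpich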
    
  2. aprun should be launched with the -b option, which bypasses the transfer of the application executable to the compute nodes, for example:
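
    An illustrative launch line (the script name and process counts are placeholders; -n sets the total number of MPI processes and -N the number of processes per node):
    
    aprun -b -n 64 -N 32 atkpython my_script.py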

  3. It is necessary to ensure that the Cray MPI libraries are loaded instead of the ones shipped with QuantumATK. One way to do this is:

    export LD_LIBRARY_PATH="${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH}"
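
    The Cray library directories should then appear before the QuantumATK ones in the search path, which can be verified with an illustrative check such as:
    
    echo "${LD_LIBRARY_PATH}" | tr ':' '\n' | head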
    
  4. Test the parallel setup by running a script on multiple cores across at least 2 hardware nodes (we have seen cases where everything works fine in parallel on a single node, but not across nodes). The script examples/atkpython/test_mpi.py can be used for this check: it should print the line “Master node” exactly once, followed by several lines of “Slave node”.
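
    If the bundled example is not at hand, a generic script along the following lines performs an equivalent check (a minimal sketch assuming mpi4py is available in the Python environment; the bundled test_mpi.py may use QuantumATK’s own parallel API instead):
    
    from mpi4py import MPI
    
    # Rank 0 reports as the master; every other MPI rank reports as a slave.
    if MPI.COMM_WORLD.Get_rank() == 0:
        print("Master node")
    else:
        print("Slave node")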

  5. You should also test that a “real” calculation can be executed, including the writing of binary result files. The script examples/atkpython/nh3.py can, for instance, be used for this; see the summary fragment below for an illustrative way to launch it.
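
To summarize, an illustrative job-script fragment combining the points above could look as follows (module names, process counts, and the output file name are placeholders and must be adapted to the actual system):

    module unload PrgEnv-cray
    module load PrgEnv-gnu
    module unload cray-mpich
    module load cray-mpich-abi
    
    export LD_LIBRARY_PATH="${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH}"
    
    aprun -b -n 64 -N 32 atkpython nh3.py > nh3.log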