Parallelisation

VortexPasta.jl can take advantage of thread-based (shared-memory) CPU parallelisation, equivalent to OpenMP. In particular, it can use the CPUs of a single node of a computing cluster.

Starting Julia with multiple threads

There are a few ways of starting Julia with multiple threads:

  1. either via the JULIA_NUM_THREADS environment variable,

  2. or via the -t / --threads command-line option.

If both are used, the command-line option takes precedence.

The command-line flag can be used in two ways:

$ julia -t 8     # start Julia with 8 threads
$ julia -t auto  # start Julia with the number of CPUs available to this Julia process

The second form (-t auto) is particularly useful when running SLURM jobs, as Julia will then use the number of CPUs allocated to the SLURM job.
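To confirm that Julia really started with the expected number of threads, one can print the thread count at the top of a script. The following is a minimal sketch using only Julia's standard library. Note that Sys.CPU_THREADS reports the logical CPUs of the whole machine, which may be larger than the number of CPUs allocated to a SLURM job.

```julia
# Report how many threads this Julia session was started with.
nthreads = Threads.nthreads()
ncpus = Sys.CPU_THREADS  # logical CPUs detected on the machine (not the SLURM allocation)

println("Julia threads: ", nthreads)
println("Logical CPUs on this machine: ", ncpus)

if nthreads == 1
    @warn "Julia is running with a single thread; use `-t N` or set JULIA_NUM_THREADS."
end
```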

Pinning threads

When Julia is started with multiple threads, it can (and often does) assign more than one thread to the same CPU, even when the requested number of threads is smaller than or equal to the total number of CPUs. This is clearly suboptimal.

The ThreadPinning.jl package solves this issue. The easiest way to use it is to put the following lines at the top of your Julia script:

using ThreadPinning
pinthreads(:cores)

Note that a different pinning criterion should be used when running under SLURM (see "Pinning SLURM threads" below).

One can then check that threads are correctly pinned to separate CPUs using ThreadPinning's threadinfo function.
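As a sketch of such a check, the snippet below prints the thread layout and then verifies programmatically that no two Julia threads share a CPU. It assumes that ThreadPinning.jl is installed and that the script runs on a platform where pinning is supported (typically Linux). It relies on getcpuids, ThreadPinning's function returning the CPU on which each Julia thread is currently running.

```julia
using ThreadPinning

pinthreads(:cores)      # pin Julia threads to separate cores, as above

threadinfo()            # print an overview of where each Julia thread is running

# IDs of the CPUs on which the Julia threads are currently running
cpuids = getcpuids()

if allunique(cpuids)
    println("OK: each Julia thread runs on its own CPU")
else
    @warn "Some Julia threads share a CPU" cpuids
end
```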

Running SLURM jobs

Submitting jobs

Here is a sample SLURM script for running a simulation, which can be submitted using sbatch:

#!/bin/bash

#SBATCH --job-name="JOB_NAME"
#SBATCH --partition=PARTITION_NAME_ON_CLUSTER
#SBATCH --time=1:00:00
#SBATCH --distribution=block:block
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --hint=nomultithread
#SBATCH --cpus-per-task=32
#SBATCH --exclusive
#SBATCH --mem=120G
#SBATCH --threads-per-core=1

echo " - SLURM_JOB_ID = $SLURM_JOB_ID"
echo " - SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"
echo " - SLURM_TASKS_PER_NODE = $SLURM_TASKS_PER_NODE"
echo " - SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"

srun --cpu-bind=verbose,cores julia -t auto --heap-size-hint=100G --project=. script.jl

Here the number of Julia threads is determined by the --cpus-per-task option, since julia -t auto uses all the CPUs made available to the task by SLURM.
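It may be worth checking from within Julia that the thread count really matches the allocation, for instance in case the -t option is later changed to a fixed number. Below is a small sketch. Here check_slurm_threads is a hypothetical helper (not part of VortexPasta.jl) which relies on the SLURM_CPUS_PER_TASK variable, set by SLURM when --cpus-per-task is given.

```julia
# Hypothetical helper: compare the number of Julia threads with the number of
# CPUs requested via --cpus-per-task. Returns `nothing` outside of SLURM.
function check_slurm_threads(env = ENV; nthreads = Threads.nthreads())
    haskey(env, "SLURM_CPUS_PER_TASK") || return nothing
    ncpus = parse(Int, env["SLURM_CPUS_PER_TASK"])
    if nthreads != ncpus
        @warn "Number of Julia threads differs from the SLURM allocation" nthreads ncpus
    end
    return nthreads == ncpus
end

check_slurm_threads()
```

The environment is passed as an argument so that the helper can also be tried outside of a SLURM job, e.g. with check_slurm_threads(Dict("SLURM_CPUS_PER_TASK" => "32")).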

Some other notes:

  • the --exclusive flag is optional; it is recommended if one wants to use a full node (i.e. if the number of requested CPUs corresponds to the total number of CPUs on a node);

  • above, we passed the --heap-size-hint=100G option to the julia command. This may help avoid out-of-memory errors, by telling Julia's garbage collector that the total memory used should stay below the specified value. To be safe, we also explicitly requested a slightly larger amount of memory from SLURM using the --mem option;

  • the --hint=nomultithread option tells SLURM to avoid using hyperthreading, which is generally not good for performance.

Pinning SLURM threads

As mentioned above, it is a good idea to pin Julia threads to the CPUs available to the Julia process.

When using SLURM, one can achieve this by using the :affinitymask criterion in ThreadPinning's pinthreads:

using ThreadPinning
pinthreads(:affinitymask)

It can be convenient to have a Julia script which does the Right Thing (TM) depending on whether it runs within a SLURM job or not. To achieve this, one can do:

using ThreadPinning

if haskey(ENV, "SLURM_JOB_ID")
    pinthreads(:affinitymask)
else
    pinthreads(:cores)
end

Using MKL FFT routines

The default NonuniformFFTsBackend in VortexPasta.jl computes threaded FFTs using the FFTW.jl package, which by default wraps the FFTW libraries written in C. However, FFTW.jl has an unresolved issue which can be encountered (somewhat randomly) when computing FFTs using a large number of threads.

Switching to MKL in FFTW.jl

One workaround is to switch to the FFT implementation in Intel's MKL libraries, which doesn't seem to display this issue. The FFTW.jl package makes it easy to switch to the MKL implementation, which it accesses via MKL's FFTW-compatible interface.

One simply needs to do:

using FFTW
FFTW.set_provider!("mkl")

and restart Julia. This will create (or update) a LocalPreferences.toml file next to the Project.toml file of the active Julia project.
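After restarting Julia, one can verify that the change was taken into account. The sketch below assumes a version of FFTW.jl recent enough to provide FFTW.get_provider (the counterpart of FFTW.set_provider!).

```julia
using FFTW

provider = FFTW.get_provider()   # "fftw" (default) or "mkl"
println("FFTW.jl provider: ", provider)

if provider != "mkl"
    @warn "FFTW.jl is not using MKL; call FFTW.set_provider!(\"mkl\") and restart Julia"
end
```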

Correctly using threads with MKL

The above change is not enough if one wants MKL's FFTs to be efficient when using threads. One also needs to set the following environment variables (for example in a SLURM script):

export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK      # on SLURM
export MKL_NUM_THREADS=$NUMBER_OF_JULIA_THREADS  # in general
export MKL_DYNAMIC=false

The MKL_DYNAMIC=false option tells MKL not to adjust the number of threads dynamically, so that it sticks to the requested number of threads.
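Since these variables are easy to forget or mistype, a script may check them at start-up. The following sketch uses a hypothetical helper, mkl_env_ok (not part of any package), which only inspects the environment and does not query MKL itself.

```julia
# Hypothetical helper: check that the MKL-related environment variables are
# set as recommended above.
function mkl_env_ok(env = ENV; nthreads = Threads.nthreads())
    threads_ok = get(env, "MKL_NUM_THREADS", "") == string(nthreads)
    dynamic_ok = lowercase(get(env, "MKL_DYNAMIC", "")) == "false"
    threads_ok || @warn "MKL_NUM_THREADS should be set to $nthreads"
    dynamic_ok || @warn "MKL_DYNAMIC should be set to false"
    return threads_ok && dynamic_ok
end

mkl_env_ok()
```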

Secondly, one also needs to add:

using MKL

at the start of the Julia script to be run (one may first need to install the MKL.jl package using ]add MKL). Failing to do this can seriously degrade performance.

Note that one can also set the environment variables directly in the Julia script. Including thread pinning, the beginning of the Julia script could look like:

using MKL
using ThreadPinning

ENV["MKL_NUM_THREADS"] = Threads.nthreads()  # same as number of Julia threads
ENV["MKL_DYNAMIC"] = false

if haskey(ENV, "SLURM_JOB_ID")
    pinthreads(:affinitymask)
else
    pinthreads(:cores)
end

# Tell threadinfo to give us information about BLAS (and MKL) and optionally about the SLURM set-up.
threadinfo(blas = true, hints = true, slurm = haskey(ENV, "SLURM_JOB_ID"))

Note that, with the hints = true option, ThreadPinning will complain about our choice of setting MKL_NUM_THREADS to the number of Julia threads. This warning can be ignored: FFTs are executed from a single Julia thread, so letting MKL use all the available threads is what we want.