Pencil configurations

A pencil configuration refers to a given distribution of multidimensional data among MPI processes. This information is encoded in the Pencil type.

A pencil configuration includes:

MPI topology information,
global and local dimensions of the numerical grid,
subset of decomposed dimensions,
definition of optional permutation of dimensions.

Construction

High-level interface

The simplest way of constructing a new Pencil is by passing the global dataset dimensions and an MPI communicator to the Pencil constructor:

julia> dims_global = (16, 32, 64);

julia> comm = MPI.COMM_WORLD;

julia> pen = Pencil(dims_global, comm)
Decomposition of 3D data
    Data dimensions: (16, 32, 64)
    Decomposed dimensions: (2, 3)
    Data permutation: NoPermutation()

This will decompose the data along the two last dimensions of the dataset:^[1]

julia> decomposition(pen)
(2, 3)

For instance, if the communicator comm has 4 MPI processes, then each process will hold a subset of data of size (16, 16, 32):

julia> topology(pen)
MPI topology: 2D decomposition (2×2 processes)  # assuming MPI.Comm_size(comm) == 4

julia> size_local(pen)
(16, 16, 32)

Instead of the default, one may want to choose the subset of dimensions that should be decomposed. For instance, to decompose along the first dimension only:

julia> decomp_dims = (1,);

julia> pen = Pencil(dims_global, decomp_dims, comm)
Decomposition of 3D data
    Data dimensions: (16, 32, 64)
    Decomposed dimensions: (1,)
    Data permutation: NoPermutation()

julia> decomposition(pen)
(1,)

julia> topology(pen)
MPI topology: 1D decomposition (4 processes)  # again, assuming 4 MPI processes

Note that the above high-level constructors don't require the definition of a MPITopology, which is constructed implicitly. For more control, one may want to manually construct a MPITopology, and then construct a Pencil from that. As above, one may also specify the list of decomposed dimensions.

For instance, we may want to decompose 32 MPI processes into a 8×4 Cartesian topology. This is done as follows:

julia> topo = MPITopology(comm, (8, 4))  # NOTE: fails if MPI.Comm_size(comm) ≠ 32
MPI topology: 2D decomposition (8×4 processes)

julia> dims_global = (16, 32, 64);

julia> pen = Pencil(topo, dims_global)
Decomposition of 3D data
    Data dimensions: (16, 32, 64)
    Decomposed dimensions: (2, 3)
    Data permutation: NoPermutation()

As before, the decomposed dimensions are the rightmost ones by default (in this case, dimensions 2 and 3). A different set of dimensions may be selected via an optional positional argument. For instance, to decompose along dimensions 1 and 3 instead:

julia> decomp_dims = (1, 3);

julia> pen = Pencil(topo, dims_global, decomp_dims)
Decomposition of 3D data
    Data dimensions: (16, 32, 64)
    Decomposed dimensions: (1, 3)
    Data permutation: NoPermutation()

Defining multiple pencils

One may also want to work with multiple pencil configurations that differ, for instance, on the selection of decomposed dimensions. For this case, a constructor is available that takes an already existing Pencil instance. Calling this constructor should be preferred when possible since it allows sharing memory buffers (used for instance for global transpositions) and thus reducing memory usage. The following creates a Pencil equivalent to the one above, but with different decomposed dimensions:

julia> pen_y = Pencil(pen; decomp_dims = (1, 3))
Decomposition of 3D data
    Data dimensions: (16, 32, 64)
    Decomposed dimensions: (1, 3)
    Data permutation: NoPermutation()

See the Pencil documentation for more details.

Dimension permutations

A Pencil may optionally be given information on dimension permutations. In this case, the layout of the data arrays in memory is different from the logical order of dimensions. For performance reasons, permutations are compile-time objects defined in the StaticPermutations package.

To make permutations clearer, consider the example above where the global data dimensions are $N_x × N_y × N_z = 16 × 32 × 64$. In this case, the logical order is $(x, y, z)$. Now let's say that we want the memory order of the data to be $(y, z, x)$,^[2] which corresponds to the permutation (2, 3, 1).

Permutations are passed to the Pencil constructor via the permute keyword argument. Dimension permutations should be specified using a Permutation object. For instance,

julia> perm = Permutation(2, 3, 1);

julia> pen = Pencil(dims_global, comm; permute = perm)
Decomposition of 3D data
    Data dimensions: (16, 32, 64)
    Decomposed dimensions: (2, 3)
    Data permutation: Permutation(2, 3, 1)

One can also pass a NoPermutation to disable permutations (this is the default).

Types

PencilArrays.Pencils.Pencil — Type

Pencil{N,M}

Describes the decomposition of an N-dimensional array among MPI processes along M directions (with M < N).

Pencil(
    [A = Array],
    topology::MPITopology{M}, size_global::Dims{N},
    decomp_dims::Dims{M} = default_decomposition(N, Val(M));
    permute::AbstractPermutation = NoPermutation(),
    timer = TimerOutput(),
)

Define the decomposition of an N-dimensional geometry along M dimensions.

The dimensions of the geometry are given by size_global = (N1, N2, ...). The Pencil describes the decomposition of an array of dimensions size_global across a group of MPI processes.

Data is distributed over the given M-dimensional MPI topology (with M ≤ N).

The decomposed dimensions may optionally be provided via the decomp_dims argument. By default, the M rightmost dimensions are decomposed. For instance, for a 2D decomposition of 5D data (M = 2 and N = 5), the dimensions (4, 5) are decomposed by default.

It is also possible to distribute over all dimensions (M = N). Note that, in this specific case, transpositions are currently not possible.

The optional argument A allows to work with arrays other than the base Array type. In particular, this should be useful for working with GPU array types such as CuArray.

The optional permute parameter may be used to indicate a permutation of the data indices from logical order (the order in which the arrays are accessed in code) to memory order (the actual order of indices in memory). Permutations must be specified using the exported Permutation type, as in permute = Permutation(3, 1, 2).

It is also possible to pass a TimerOutput to the constructor. See Measuring performance for details.

Examples

Decompose a 3D geometry of global dimensions $N_x × N_y × N_z = 4×8×12$ along the second ($y$) and third ($z$) dimensions:

julia> topo = MPITopology(MPI.COMM_WORLD, Val(2));

julia> Pencil(topo, (4, 8, 12), (2, 3))
Decomposition of 3D data
    Data dimensions: (4, 8, 12)
    Decomposed dimensions: (2, 3)
    Data permutation: NoPermutation()
    Array type: Array

julia> Pencil(topo, (4, 8, 12), (2, 3); permute = Permutation(3, 2, 1))
Decomposition of 3D data
    Data dimensions: (4, 8, 12)
    Decomposed dimensions: (2, 3)
    Data permutation: Permutation(3, 2, 1)
    Array type: Array

In the second case, the actual data is stored in (z, y, x) order within each MPI process.

Pencil([A = Array], size_global::Dims{N}, [decomp_dims = (2, …, N)], comm::MPI.Comm; kws...)

Convenience constructor that implicitly creates a MPITopology.

The number of decomposed dimensions specified by decomp_dims must be M < N. If decomp_dims is not passed, dimensions 2:N are decomposed.

Keyword arguments are passed to alternative constructor taking an MPITopology. That constructor should be used if more control is desired.

Examples

julia> Pencil((4, 8, 12), MPI.COMM_WORLD)
Decomposition of 3D data
    Data dimensions: (4, 8, 12)
    Decomposed dimensions: (2, 3)
    Data permutation: NoPermutation()
    Array type: Array

julia> Pencil((4, 8, 12), (1, ), MPI.COMM_WORLD)
Decomposition of 3D data
    Data dimensions: (4, 8, 12)
    Decomposed dimensions: (1,)
    Data permutation: NoPermutation()
    Array type: Array

Pencil(
    [A = Array],
    p::Pencil{N,M};
    decomp_dims::Dims{M} = decomposition(p),
    size_global::Dims{N} = size_global(p),
    permute::P = permutation(p),
    timer::TimerOutput = timer(p),
)

Create new pencil configuration from an existent one.

This constructor enables sharing temporary data buffers between the two pencil configurations, leading to reduced global memory usage.