Pencil configurations
A pencil configuration refers to a given distribution of multidimensional data among MPI processes. This information is encoded in the Pencil
type.
A pencil configuration includes:
- MPI topology information,
- global and local dimensions of the numerical grid,
- subset of decomposed dimensions,
- definition of optional permutation of dimensions.
Construction
High-level interface
The simplest way of constructing a new Pencil
is by passing the global dataset dimensions and an MPI communicator to the Pencil
constructor:
julia> dims_global = (16, 32, 64);
julia> comm = MPI.COMM_WORLD;
julia> pen = Pencil(dims_global, comm)
Decomposition of 3D data
Data dimensions: (16, 32, 64)
Decomposed dimensions: (2, 3)
Data permutation: NoPermutation()
This will decompose the data along the two last dimensions of the dataset:[1]
julia> decomposition(pen)
(2, 3)
For instance, if the communicator comm
has 4 MPI processes, then each process will hold a subset of data of size (16, 16, 32)
:
julia> topology(pen)
MPI topology: 2D decomposition (2×2 processes) # assuming MPI.Comm_size(comm) == 4
julia> size_local(pen)
(16, 16, 32)
Instead of the default, one may want to choose the subset of dimensions that should be decomposed. For instance, to decompose along the first dimension only:
julia> decomp_dims = (1,);
julia> pen = Pencil(dims_global, decomp_dims, comm)
Decomposition of 3D data
Data dimensions: (16, 32, 64)
Decomposed dimensions: (1,)
Data permutation: NoPermutation()
julia> decomposition(pen)
(1,)
julia> topology(pen)
MPI topology: 1D decomposition (4 processes) # again, assuming 4 MPI processes
Low-level interface
Note that the above high-level constructors don't require the definition of a MPITopology
, which is constructed implicitly. For more control, one may want to manually construct a MPITopology
, and then construct a Pencil
from that. As above, one may also specify the list of decomposed dimensions.
For instance, we may want to decompose 32 MPI processes into a 8×4 Cartesian topology. This is done as follows:
julia> topo = MPITopology(comm, (8, 4)) # NOTE: fails if MPI.Comm_size(comm) ≠ 32
MPI topology: 2D decomposition (8×4 processes)
julia> dims_global = (16, 32, 64);
julia> pen = Pencil(topo, dims_global)
Decomposition of 3D data
Data dimensions: (16, 32, 64)
Decomposed dimensions: (2, 3)
Data permutation: NoPermutation()
As before, the decomposed dimensions are the rightmost ones by default (in this case, dimensions 2
and 3
). A different set of dimensions may be selected via an optional positional argument. For instance, to decompose along dimensions 1
and 3
instead:
julia> decomp_dims = (1, 3);
julia> pen = Pencil(topo, dims_global, decomp_dims)
Decomposition of 3D data
Data dimensions: (16, 32, 64)
Decomposed dimensions: (1, 3)
Data permutation: NoPermutation()
Defining multiple pencils
One may also want to work with multiple pencil configurations that differ, for instance, on the selection of decomposed dimensions. For this case, a constructor is available that takes an already existing Pencil
instance. Calling this constructor should be preferred when possible since it allows sharing memory buffers (used for instance for global transpositions) and thus reducing memory usage. The following creates a Pencil
equivalent to the one above, but with different decomposed dimensions:
julia> pen_y = Pencil(pen; decomp_dims = (1, 3))
Decomposition of 3D data
Data dimensions: (16, 32, 64)
Decomposed dimensions: (1, 3)
Data permutation: NoPermutation()
See the Pencil
documentation for more details.
Dimension permutations
A Pencil
may optionally be given information on dimension permutations. In this case, the layout of the data arrays in memory is different from the logical order of dimensions. For performance reasons, permutations are compile-time objects defined in the StaticPermutations package.
To make permutations clearer, consider the example above where the global data dimensions are $N_x × N_y × N_z = 16 × 32 × 64$. In this case, the logical order is $(x, y, z)$. Now let's say that we want the memory order of the data to be $(y, z, x)$,[2] which corresponds to the permutation (2, 3, 1)
.
Permutations are passed to the Pencil
constructor via the permute
keyword argument. Dimension permutations should be specified using a Permutation
object. For instance,
julia> perm = Permutation(2, 3, 1);
julia> pen = Pencil(dims_global, comm; permute = perm)
Decomposition of 3D data
Data dimensions: (16, 32, 64)
Decomposed dimensions: (2, 3)
Data permutation: Permutation(2, 3, 1)
One can also pass a NoPermutation
to disable permutations (this is the default).
Types
PencilArrays.Pencils.Pencil
— TypePencil{N,M}
Describes the decomposition of an N
-dimensional array among MPI processes along M
directions (with M < N
).
Pencil(
[A = Array],
topology::MPITopology{M}, size_global::Dims{N},
decomp_dims::Dims{M} = default_decomposition(N, Val(M));
permute::AbstractPermutation = NoPermutation(),
timer = TimerOutput(),
)
Define the decomposition of an N
-dimensional geometry along M
dimensions.
The dimensions of the geometry are given by size_global = (N1, N2, ...)
. The Pencil
describes the decomposition of an array of dimensions size_global
across a group of MPI processes.
Data is distributed over the given M
-dimensional MPI topology (with M ≤ N
).
The decomposed dimensions may optionally be provided via the decomp_dims
argument. By default, the M
rightmost dimensions are decomposed. For instance, for a 2D decomposition of 5D data (M = 2
and N = 5
), the dimensions (4, 5)
are decomposed by default.
It is also possible to distribute over all dimensions (M = N
). Note that, in this specific case, transpositions are currently not possible.
The optional argument A
allows to work with arrays other than the base Array
type. In particular, this should be useful for working with GPU array types such as CuArray
.
The optional permute
parameter may be used to indicate a permutation of the data indices from logical order (the order in which the arrays are accessed in code) to memory order (the actual order of indices in memory). Permutations must be specified using the exported Permutation
type, as in permute = Permutation(3, 1, 2)
.
It is also possible to pass a TimerOutput
to the constructor. See Measuring performance for details.
Examples
Decompose a 3D geometry of global dimensions $N_x × N_y × N_z = 4×8×12$ along the second ($y$) and third ($z$) dimensions:
julia> topo = MPITopology(MPI.COMM_WORLD, Val(2));
julia> Pencil(topo, (4, 8, 12), (2, 3))
Decomposition of 3D data
Data dimensions: (4, 8, 12)
Decomposed dimensions: (2, 3)
Data permutation: NoPermutation()
Array type: Array
julia> Pencil(topo, (4, 8, 12), (2, 3); permute = Permutation(3, 2, 1))
Decomposition of 3D data
Data dimensions: (4, 8, 12)
Decomposed dimensions: (2, 3)
Data permutation: Permutation(3, 2, 1)
Array type: Array
In the second case, the actual data is stored in (z, y, x)
order within each MPI process.
Pencil([A = Array], size_global::Dims{N}, [decomp_dims = (2, …, N)], comm::MPI.Comm; kws...)
Convenience constructor that implicitly creates a MPITopology
.
The number of decomposed dimensions specified by decomp_dims
must be M < N
. If decomp_dims
is not passed, dimensions 2:N
are decomposed.
Keyword arguments are passed to alternative constructor taking an MPITopology
. That constructor should be used if more control is desired.
Examples
julia> Pencil((4, 8, 12), MPI.COMM_WORLD)
Decomposition of 3D data
Data dimensions: (4, 8, 12)
Decomposed dimensions: (2, 3)
Data permutation: NoPermutation()
Array type: Array
julia> Pencil((4, 8, 12), (1, ), MPI.COMM_WORLD)
Decomposition of 3D data
Data dimensions: (4, 8, 12)
Decomposed dimensions: (1,)
Data permutation: NoPermutation()
Array type: Array
Pencil(
[A = Array],
p::Pencil{N,M};
decomp_dims::Dims{M} = decomposition(p),
size_global::Dims{N} = size_global(p),
permute::P = permutation(p),
timer::TimerOutput = timer(p),
)
Create new pencil configuration from an existent one.
This constructor enables sharing temporary data buffers between the two pencil configurations, leading to reduced global memory usage.
PencilArrays.Pencils.AbstractIndexOrder
— TypeAbstractIndexOrder
Abstract type determining the ordering of dimensions of an array with possibly permuted indices.
Subtypes are MemoryOrder
and LogicalOrder
.
PencilArrays.Pencils.MemoryOrder
— TypeMemoryOrder <: AbstractIndexOrder
Singleton type specifying that array dimensions should be given in memory (or permuted) order.
PencilArrays.Pencils.LogicalOrder
— TypeLogicalOrder <: AbstractIndexOrder
Singleton type specifying that array dimensions should be given in logical (or non-permuted) order.
Methods
PencilArrays.Pencils.topology
— Methodtopology(p::Pencil)
Get MPITopology
attached to Pencil
.
PencilArrays.Pencils.MPITopologies.get_comm
— Methodget_comm(p::Pencil)
Get MPI communicator associated to an MPI decomposition scheme.
PencilArrays.Pencils.decomposition
— Methoddecomposition(p::Pencil)
Get tuple with decomposed dimensions of the given pencil configuration.
PencilArrays.Permutations.permutation
— Methodpermutation(::Type{<:Pencil}) -> AbstractPermutation
permutation(p::Pencil) -> AbstractPermutation
Get index permutation associated to the given pencil configuration.
Returns NoPermutation()
if there is no associated permutation.
PencilArrays.Pencils.timer
— MethodBase.length
— Methodlength(p::Pencil)
Get linear length of the local data associated to the decomposition.
Equivalent to length_local(p)
.
Base.size
— Methodsize(p::Pencil)
Returns the local data dimensions associated to the decomposition, in logical order.
This is defined as size_local(p, LogicalOrder())
.
Base.ndims
— Methodndims(t::MPITopology)
Get dimensionality of Cartesian topology.
ndims(p::Pencil)
Number of spatial dimensions associated to pencil data.
This corresponds to the total number of dimensions of the space, which includes the decomposed and non-decomposed dimensions.
PencilArrays.Pencils.range_remote
— Methodrange_remote(p::Pencil, coords, [order = LogicalOrder()])
range_remote(p::Pencil, n::Integer, [order = LogicalOrder()])
Get data range held by a given MPI process.
In the first variant, coords
are the coordinates of the MPI process in the Cartesian topology. They can be specified as a tuple (i, j, ...)
or as a CartesianIndex
.
In the second variant, n
is the linear index of a given process in the topology.
PencilArrays.Pencils.range_local
— Methodrange_local(p::Pencil, [order = LogicalOrder()])
Local data range held by the pencil.
By default the dimensions are not permuted, i.e. they follow the logical order of dimensions.
PencilArrays.Pencils.size_global
— Methodsize_global(p::Pencil, [order = LogicalOrder()])
Global dimensions of the Cartesian grid associated to the given domain decomposition.
Like size_local
, by default the returned dimensions are in logical order.
PencilArrays.Pencils.size_local
— Methodsize_local(p::Pencil, [order = LogicalOrder()])
Local dimensions of the data held by the pencil.
By default the dimensions are not permuted, i.e. they follow the logical order of dimensions.
PencilArrays.Pencils.length_global
— Methodlength_global(p::Pencil)
Get linear length of the global data associated to the decomposition.
PencilArrays.Pencils.length_local
— Methodlength_local(p::Pencil)
Get linear length of the local data associated to the decomposition.
PencilArrays.Pencils.to_local
— Methodto_local(p::Pencil, global_inds, [order = LogicalOrder()])
Convert non-permuted (logical) global indices to local indices.
If order = MemoryOrder()
, returned indices will be permuted using the permutation associated to the pencil configuration p
.
Base.similar
— Methodsimilar(x::PencilArray, [element_type=eltype(x)], [dims]) -> PencilArray
Returns a PencilArray
similar to x
.
In particular, the new array shares the same parallel decomposition (the same Pencil
) than x
. This means that the dimensions of the new array must be the same as those of x
. Note that the optional dims
argument is allowed for the sole reason of making things work nicely with other packages (such as StructArrays.jl), but things will fail if dims ≠ size(x)
.
Examples
julia> pen = Pencil((20, 10, 12), MPI.COMM_WORLD);
julia> u = PencilArray{Float64}(undef, pen);
julia> similar(u) |> summary
"20×10×12 PencilArray{Float64, 3}(::Pencil{3, 2, NoPermutation, Array})"
julia> similar(u, size(u)) |> summary
"20×10×12 PencilArray{Float64, 3}(::Pencil{3, 2, NoPermutation, Array})"
julia> similar(u, ComplexF32) |> summary
"20×10×12 PencilArray{ComplexF32, 3}(::Pencil{3, 2, NoPermutation, Array})"
julia> similar(u, (4, 3, 8))
ERROR: DimensionMismatch: cannot construct a similar PencilArray with different size
julia> similar(u, (4, 3)) |> summary
ERROR: DimensionMismatch: cannot construct a similar PencilArray with different size
julia> similar(u, ComplexF32) |> summary
"20×10×12 PencilArray{ComplexF32, 3}(::Pencil{3, 2, NoPermutation, Array})"
julia> similar(u, ComplexF32, (4, 3))
ERROR: DimensionMismatch: cannot construct a similar PencilArray with different size
similar(x::PencilArray, [element_type = eltype(x)], p::Pencil)
Create a PencilArray
with the decomposition described by the given Pencil
.
This variant may be used to create a PencilArray
that has a different decomposition than the input PencilArray
.
Examples
julia> pen_u = Pencil((20, 10, 12), (2, 3), MPI.COMM_WORLD);
julia> u = PencilArray{Float64}(undef, pen_u);
julia> pen_v = Pencil(pen_u; decomp_dims = (1, 3), permute = Permutation(2, 3, 1))
Decomposition of 3D data
Data dimensions: (20, 10, 12)
Decomposed dimensions: (1, 3)
Data permutation: Permutation(2, 3, 1)
Array type: Array
julia> v = similar(u, pen_v);
julia> summary(v)
"20×10×12 PencilArray{Float64, 3}(::Pencil{3, 2, Permutation{(2, 3, 1), 3}, Array})"
julia> pencil(v) === pen_v
true
julia> vint = similar(u, Int, pen_v);
julia> summary(vint)
"20×10×12 PencilArray{Int64, 3}(::Pencil{3, 2, Permutation{(2, 3, 1), 3}, Array})"
julia> pencil(vint) === pen_v
true
similar(p::Pencil, [A = typeof_array(p)], [dims = size_global(p)])
Returns a Pencil
decomposition with global dimensions dims
and with underlying array type A
.
Typically, A
should be something like Array
or CuArray
(see Pencil
for details).
Index
PencilArrays.Pencils.AbstractIndexOrder
PencilArrays.Pencils.LogicalOrder
PencilArrays.Pencils.MemoryOrder
PencilArrays.Pencils.Pencil
Base.length
Base.ndims
Base.similar
Base.size
PencilArrays.Pencils.MPITopologies.get_comm
PencilArrays.Pencils.decomposition
PencilArrays.Pencils.length_global
PencilArrays.Pencils.length_local
PencilArrays.Pencils.range_local
PencilArrays.Pencils.range_remote
PencilArrays.Pencils.size_global
PencilArrays.Pencils.size_local
PencilArrays.Pencils.timer
PencilArrays.Pencils.to_local
PencilArrays.Pencils.topology
PencilArrays.Permutations.permutation
- 1More generally, an $N$-dimensional dataset is by default decomposed along its $N - 1$ last dimensions.
- 2Why would we want this? One application is to efficiently perform FFTs along $y$, which, under this permutation, would be the fastest dimension. This is used by the PencilFFTs package.