In-place transforms
Complex-to-complex and real-to-real transforms can be performed in-place, enabling important memory savings. The procedure is very similar to that of out-of-place transforms described in the tutorial. The differences are illustrated in the sections below.
Creating a domain partition
We start by partitioning a domain of dimensions $16×32×64$ along all available MPI processes.
using PencilFFTs
using MPI
MPI.Init()
dims_global = (16, 32, 64) # global dimensions
(16, 32, 64)
Such a partitioning is described by a Pencil
object. Here we choose to decompose the domain along the last two dimensions. In this case, the actual number of processes along each of these dimensions is chosen automatically.
decomp_dims = (2, 3)
comm = MPI.COMM_WORLD
pen = Pencil(dims_global, decomp_dims, comm)
Decomposition of 3D data
Data dimensions: (16, 32, 64)
Decomposed dimensions: (2, 3)
Data permutation: NoPermutation()
Array type: Array
Distributed transforms using PencilFFTs.jl require that the first dimension is not decomposed. In other words, if one wants to perform transforms, then decomp_dims
above must not contain 1
.
Creating in-place plans
To create an in-place plan, pass an in-place transform such as Transforms.FFT!
or Transforms.R2R!
to PencilFFTPlan
. For instance:
# Perform a 3D in-place complex-to-complex FFT.
transform = Transforms.FFT!()
# Note that one can also combine different types of in-place transforms.
# For instance:
# transform = (
# Transforms.R2R!(FFTW.REDFT01),
# Transforms.FFT!(),
# Transforms.R2R!(FFTW.DHT),
# )
FFT!
We can now create a distributed plan from the previously-created domain partition and the chosen transform.
plan = PencilFFTPlan(pen, transform)
Transforms: (FFT!, FFT!, FFT!)
Input type: ComplexF64
Global dimensions: (16, 32, 64) -> (16, 32, 64)
MPI topology: 2D decomposition (2×1 processes)
Note that in-place real-to-complex transforms are not currently supported. (In other words, the RFFT!
transform type is not defined.)
Allocating data
As with out-of-place plans, data should be allocated using allocate_input
. The difference is that, for in-place plans, this function returns a ManyPencilArray
object, which is a container holding multiple PencilArray
views sharing the same memory space.
# Allocate data for the plan.
# Since `plan` is in-place, this returns a `ManyPencilArray` container.
A = allocate_input(plan)
summary(A)
"ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, Pencil{3, 2, NoPermutation, Vector{UInt8}}}, PencilArray{ComplexF64, 3, Base." ⋯ 140 bytes ⋯ "tation{(2, 1, 3), 3}, Vector{UInt8}}}, PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, Vector{UInt8}}}}, Vector{ComplexF64}}"
Note that allocate_output
also works for in-place plans. In this case, it returns exactly the same thing as allocate_input
.
As shown in the next section, in-place plans must be applied on the returned ManyPencilArray
. On the other hand, one usually wants to access and modify data, and for this one needs the PencilArray
views contained in the ManyPencilArray
. The input and output array views can be obtained by calling first(::ManyPencilArray)
and last(::ManyPencilArray)
.
For instance, we can initialise the input array with some data before transforming:
using Random
u_in = first(A) # input data view
randn!(u_in)
summary(u_in)
"16×16×64 PencilArray{ComplexF64, 3}(::Pencil{3, 2, NoPermutation, Array})"
Applying plans
Like in FFTW.jl
, one can perform in-place transforms using the *
and \
operators. As mentioned above, in-place plans must be applied on the ManyPencilArray
containers returned by allocate_input
.
plan * A; # performs in-place forward transform
After performing an in-place transform, data contained in u_in
has been overwritten and has no "physical" meaning. In other words, u_in
should not be used at this point. To access the transformed data, one should retrieve the output data view using last(A)
.
For instance, to compute the global sum of the transformed data:
u_out = last(A) # output data view
sum(u_out) # sum of transformed data (note that `sum` reduces over all processes)
-406.14918822270556 - 34413.04836644857im
Finally, we can perform a backward transform and do stuff with the input view:
plan \ A; # perform in-place backward transform
At this point, the data can be once again found in the input view u_in
, while u_out
should not be accessed.
This page was generated using Literate.jl.