Measuring performance

The time spent in different sections of the distributed transforms can be measured using the TimerOutputs package. Timing adds a (very small) performance overhead, so it is disabled by default. To enable time measurements, call TimerOutputs.enable_debug_timings after loading PencilFFTs (see below for an example). For more details, see the TimerOutputs docs.

Minimal example:

using MPI
using PencilFFTs
using TimerOutputs

# Enable timing in `PencilFFTs` and in the related `PencilArrays` and `Transpositions` modules
TimerOutputs.enable_debug_timings(PencilFFTs)
TimerOutputs.enable_debug_timings(PencilArrays)
TimerOutputs.enable_debug_timings(Transpositions)

MPI.Init()

plan = PencilFFTPlan(#= args... =#)

# [do stuff with `plan`...]

# Retrieve and print timing data associated with `plan`
to = timer(plan)
print_timer(to)
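
To see in isolation what enabling debug timings does, the sketch below uses TimerOutputs alone. The module `Demo`, its function `work` and the section label are hypothetical stand-ins for an instrumented package. They are not part of PencilFFTs, and this is not a description of how PencilFFTs is instrumented internally.

```julia
using TimerOutputs

# `Demo` is a hypothetical module standing in for an instrumented package.
module Demo
using TimerOutputs

const to = TimerOutput()

# `@timeit_debug` records nothing unless debug timings are enabled for this module.
function work(n)
    @timeit_debug to "sum of squares" sum(abs2, 1:n)
end

end # module Demo

Demo.work(1000)                           # not recorded: debug timings are off
TimerOutputs.enable_debug_timings(Demo)   # timed sections in `Demo` are now active
Demo.work(1000)                           # recorded

print_timer(Demo.to)
```

Only the second call should appear in the printed table, since the first one ran while debug timings were still disabled.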

By default, each PencilFFTPlan has its own TimerOutput. If you already have a TimerOutput, you can pass it to the PencilFFTPlan constructor:

to = TimerOutput()
plan = PencilFFTPlan(..., timer=to)

# [do stuff with `plan`...]

print_timer(to)
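
One reason to supply your own TimerOutput is to collect application timings in the same report. The sketch below uses TimerOutputs alone and assumes a hypothetical function `step!` standing in for the work done with a plan. It also resets the timer after a warm-up call so that compilation time is not included in the results.

```julia
using TimerOutputs

to = TimerOutput()

# Hypothetical stand-in for the work done with a plan.
step!(x) = (x .= sqrt.(x); x)

x = rand(1000)

# The first call includes compilation time, so time it separately and discard it.
@timeit to "warm-up" step!(x)
reset_timer!(to)

# Sections recorded here end up in the same `TimerOutput` that is printed below.
for _ in 1:5
    @timeit to "user step" step!(x)
end

print_timer(to)
```

After the reset, the report should contain only the "user step" section.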