Measuring performance

It is possible to measure the time spent in different sections of the MPI data transposition routines using the TimerOutputs package. Since this has a (very small) performance overhead, it is disabled by default. To enable time measurements, call TimerOutputs.enable_debug_timings after loading PencilArrays (see the example below). For more details, see the TimerOutputs documentation.

Minimal example:

using MPI
using PencilArrays
using TimerOutputs

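# Enable timing of `PencilArrays` functions, including those of its
# `Transpositions` submodule (which implements the MPI transpositions)
TimerOutputs.enable_debug_timings(PencilArrays)
TimerOutputs.enable_debug_timings(PencilArrays.Transpositions)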

MPI.Init()

pencil = Pencil(#= args... =#)

# [do stuff with `pencil`...]

# Retrieve and print timing data associated with `pencil`
to = timer(pencil)
print_timer(to)
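Assuming the PencilArrays routines are instrumented with the @timeit_debug macro from TimerOutputs, the opt-in mechanism can be reproduced in isolation, without MPI. The sketch below uses a hypothetical module MyLib (not part of PencilArrays) whose function work! is wrapped in @timeit_debug: calls made before enable_debug_timings(MyLib) still run but are not recorded, while later calls are.

```julia
using TimerOutputs

# Hypothetical module standing in for an instrumented library such as PencilArrays.
module MyLib

using TimerOutputs

# The body always runs; it is only timed if debug timings are enabled for MyLib.
function work!(to::TimerOutput, x::Vector{Float64})
    @timeit_debug to "work!" begin
        x .= 2 .* x
    end
    return x
end

end # module MyLib

to = TimerOutput()
x = ones(4)

MyLib.work!(to, x)   # runs, but is not recorded (debug timings disabled by default)

TimerOutputs.enable_debug_timings(MyLib)
MyLib.work!(to, x)   # runs and is recorded under the label "work!"

print_timer(to)
```

Only the second call appears in the printed table, which is why the PencilArrays example enables debug timings before doing any work.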

By default, each Pencil has its own TimerOutput. If you already have a TimerOutput, you can pass it to the Pencil constructor through the timer keyword argument:

to = TimerOutput()
pencil = Pencil(#= args... =#; timer = to)

# [do stuff with `pencil`...]

print_timer(to)
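To illustrate what passing an existing TimerOutput implies, here is a sketch with a hypothetical Widget type (not part of PencilArrays) that follows the same pattern: each object owns a TimerOutput unless one is supplied through a timer keyword argument. Objects built with the same TimerOutput accumulate their timings in it, while an object built without one keeps an independent timer.

```julia
using TimerOutputs

# Hypothetical type mirroring the pattern described above: a fresh TimerOutput
# is created unless one is passed through the `timer` keyword argument.
struct Widget
    timer::TimerOutput
end
Widget(; timer::TimerOutput = TimerOutput()) = Widget(timer)

# Always-on timing (`@timeit`, not `@timeit_debug`) of some placeholder work.
step!(w::Widget) = @timeit w.timer "step!" sum(rand(100))

to = TimerOutput()
a = Widget(timer = to)   # shares `to`
b = Widget(timer = to)   # shares `to`
c = Widget()             # gets its own, independent timer

step!(a); step!(b); step!(c)

print_timer(to)   # reports the two calls made through `a` and `b`
```

Sharing one timer is convenient when several objects take part in the same computation and a single combined report is wanted.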