Parallel I/O

The PencilArrays.PencilIO module contains functions for saving PencilArrays to disk and loading them back using parallel I/O. Currently, two different output formats are supported:

  • raw binary files via the MPI-IO interface;
  • parallel HDF5 files.

In both cases, information on dataset sizes, names and other metadata is included along with the binary data.

The implemented approach consists of storing the data coming from different MPI processes in a single file. This strategy scales better in terms of the number of files, and is more convenient, than storing one file per process. However, performance is very sensitive to the configuration of the underlying file system. In distributed file systems such as Lustre, it is worth tuning parameters such as the stripe count and stripe size. For more information, see for instance the Parallel HDF5 page.

Getting started

The first step before writing PencilArrays is to choose the parallel I/O driver, which determines the format of the output data. Two different drivers are currently available:

  • MPIIODriver: parallel I/O via the MPI-IO API and the MPI.jl wrappers. This driver writes a raw binary file, along with a JSON file describing dataset metadata (name, dimensions, location in file, ...);

  • PHDF5Driver: parallel I/O via the Parallel HDF5 API and HDF5.jl. This driver requires a special set-up, as detailed in the dedicated section.

Writing data

To open a parallel file, pass the MPI communicator and an instance of the chosen driver to open. For instance, the following opens an MPI-IO file in write mode:

using MPI
using PencilArrays.PencilIO  # needed for accessing parallel I/O functionality

MPI.Init()  # required before accessing MPI.COMM_WORLD

ff = open(MPIIODriver(), "filename.bin", MPI.COMM_WORLD; write=true)

Datasets, in the form of PencilArrays, can then be written as follows:

v = PencilArray(...)
ff["velocity"] = v

This writing step may be customised via keyword arguments such as chunks and collective. These options are supported by both MPI-IO and HDF5 drivers. For instance:

ff["velocity", chunks=true, collective=false] = v

See setindex! for the meaning of these options for each driver, as well as for driver-specific options.

After datasets are written, the file should be closed as usual by doing close(ff). Note that the do-block syntax is also supported, as in

open(MPIIODriver(), "filename.bin", MPI.COMM_WORLD; write=true) do ff
    ff["velocity"] = v
end

Reading data

Data is loaded into an existing PencilArray using read!. For instance:

v = PencilArray(...)
open(MPIIODriver(), "filename.bin", MPI.COMM_WORLD; read=true) do ff
    read!(ff, v, "velocity")
end

Note that, for the MPI-IO driver, a filename.bin.json file must be present along with the filename.bin file containing the binary data. The JSON file is automatically generated when writing data with this driver.

Optional keyword arguments, such as collective, are also supported by read!.
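For instance, collective I/O may be disabled for a given read (a sketch reusing the array v from the example above):

open(MPIIODriver(), "filename.bin", MPI.COMM_WORLD; read=true) do ff
    read!(ff, v, "velocity"; collective=false)  # disable collective reads for this dataset
end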

Setting up Parallel HDF5

If using the Parallel HDF5 driver, the HDF5.jl package must be available and configured with MPI support. Note that HDF5.jl versions prior to v0.15 are not supported.

Parallel HDF5 is not enabled in the default installation of HDF5.jl. For Parallel HDF5 to work, the HDF5 C libraries wrapped by HDF5.jl must be compiled with parallel support and linked to the specific MPI implementation that will be used for parallel I/O. HDF5.jl must be explicitly instructed to use parallel-enabled HDF5 libraries available in the system. Similarly, MPI.jl must be instructed to use the corresponding MPI libraries. This is detailed in the sections below.

Parallel-enabled HDF5 libraries are usually included in computing clusters and linked to the available MPI implementations. They are also available via the package manager of a number of Linux distributions. (For instance, Fedora includes the hdf5-mpich-devel and hdf5-openmpi-devel packages, respectively linked to the MPICH and OpenMPI libraries in the Fedora repositories.)

The following step-by-step guide assumes one already has access to parallel-enabled HDF5 libraries linked to an existing MPI installation.

1. Using system-provided MPI libraries

Select the system-provided MPI backend linked to the parallel HDF5 installation following the instructions in the MPI.jl docs.
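With MPI.jl versions configured via the package build step, this typically looks like the following sketch (more recent MPI.jl versions use the MPIPreferences package instead; refer to the MPI.jl documentation for the exact procedure):

ENV["JULIA_MPI_BINARY"] = "system"  # use the system-provided MPI libraries

using Pkg
Pkg.build("MPI"; verbose=true)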

2. Using parallel HDF5 libraries

Set the JULIA_HDF5_PATH environment variable to the top-level installation directory where the HDF5 libraries compiled with parallel support are found. Then run ]build HDF5 from Julia. Note that the selected HDF5 library must be linked to the MPI library chosen in the previous section. Also note that HDF5 library versions older than 1.10.4 are not supported by HDF5.jl. For the set-up to be persistent across HDF5.jl updates, consider setting JULIA_HDF5_PATH in ~/.bashrc or similar.
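As an illustration, the configuration might look like this from within Julia (the HDF5 installation prefix below is hypothetical; replace it with the actual location of your parallel-enabled HDF5 libraries):

ENV["JULIA_HDF5_PATH"] = "/opt/hdf5-parallel"  # hypothetical installation prefix

using Pkg
Pkg.build("HDF5")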

See the HDF5.jl README for details.

3. Loading PencilIO

In the PencilIO module, the HDF5.jl package is lazy-loaded using Requires. This means that HDF5 functionality will be available after both the PencilArrays.jl and HDF5.jl packages have been loaded:

using MPI
using HDF5
using PencilArrays
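Once these packages are loaded, one can check that the HDF5 libraries in use actually support MPI-IO, for instance via HDF5.has_parallel() (see also the hdf5_has_parallel function documented below):

HDF5.has_parallel()  # returns true if the HDF5 libraries were built with parallel support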

Library

PencilArrays.PencilIO.MPIIODriver (Type)
MPIIODriver(; sequential = false, uniqueopen = false, deleteonclose = false)

MPI-IO driver using the MPI.jl package.

Keyword arguments are passed to MPI.File.open.

This driver writes binary data along with a JSON file containing metadata. When reading data, this JSON file is expected to be present along with the raw data file.
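For illustration, a minimal sketch constructing a driver with a non-default flag (deleteonclose corresponds to the MPI_MODE_DELETE_ON_CLOSE mode of MPI_File_open, which removes the file once it is closed):

driver = MPIIODriver(deleteonclose = true)  # e.g. for temporary output files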

source
PencilArrays.PencilIO.PHDF5Driver (Type)
PHDF5Driver(; fcpl = HDF5.FileCreateProperties(), fapl = HDF5.FileAccessProperties())

Parallel HDF5 driver using the HDF5.jl package.

HDF5 file creation and file access property lists may be specified via the fcpl and fapl keyword arguments respectively.

Note that the MPIO file access property list does not need to be set, as this is done automatically by this driver when the file is opened.
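As a sketch, the driver may be constructed with an explicit file access property list, to be customised following the HDF5.jl documentation (here it is left at its defaults):

fapl = HDF5.FileAccessProperties()  # customise if needed (see HDF5.jl docs)
driver = PHDF5Driver(; fapl = fapl)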

source
PencilArrays.PencilIO.MPIFile (Type)
MPIFile

Wraps an MPI.FileHandle, also including file position information and metadata.

File position is updated when reading and writing data, and is independent of the individual and shared file pointers defined by MPI.

source
Base.open (Function)
open([f::Function], driver::ParallelIODriver, filename, comm::MPI.Comm; keywords...)

Open parallel file using the chosen driver.

Keyword arguments

Supported keyword arguments include:

  • open mode arguments: read, write, create, append and truncate. These have the same behaviour and defaults as Base.open. Some of them may be ignored by the chosen driver (see driver-specific docs).

  • as in MPI.File.open, other arguments are passed via an MPI.Info object.

Note that driver-specific options (such as HDF5 property lists) must be passed to each driver's constructor.
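For instance, a sketch combining an open-mode argument with an MPI-IO hint forwarded through MPI.Info (access_style is a standard MPI-IO hint; whether it is honoured depends on the MPI implementation and file system):

ff = open(MPIIODriver(), "out.bin", MPI.COMM_WORLD; write = true, access_style = "write_once")
close(ff)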

See also open(::MPIIODriver) and open(::PHDF5Driver) below for driver-specific notes.

source
open([f::Function], driver::MPIIODriver, filename, comm::MPI.Comm; keywords...)

Open parallel file using the MPI-IO driver.

See open(::ParallelIODriver) for common options for all drivers.

Driver-specific options may be passed via the driver argument. See MPIIODriver for details.

Driver notes

  • the truncate keyword is ignored.
source
open([f::Function], driver::PHDF5Driver, filename, comm::MPI.Comm; keywords...)

Open parallel file using the Parallel HDF5 driver.

See open(::ParallelIODriver) for common options for all drivers.

Driver-specific options may be passed via the driver argument. See PHDF5Driver for details.

source
Base.setindex! (Function)
setindex!(file::MPIFile, x, name; chunks = false, collective = true, infokws...)

Write PencilArray to binary file using MPI-IO.

The input x can be a PencilArray or a tuple of PencilArrays.

Optional arguments

  • if chunks = true, data is written in contiguous blocks, with one block per process. Otherwise, each process writes to discontiguous sections of disk, using MPI.File.set_view! and custom datatypes. Note that discontiguous I/O (the default) is more convenient, as it allows the data to be read back using a different number or distribution of MPI processes.

  • if collective = true, the dataset is written collectively. This is usually recommended for performance.

  • when writing discontiguous blocks, additional keyword arguments are passed via an MPI.Info object to MPI.File.set_view!. This is ignored if chunks = true.
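For illustration, a minimal sketch writing a couple of datasets with this driver (it assumes u and v are PencilArrays, as in the HDF5 example further below; the dataset names and the choice of chunks are arbitrary):

open(MPIIODriver(), "fields.bin", get_comm(u); write = true) do ff
    ff["u"] = u                       # discontiguous layout (the default)
    ff["uv", chunks = true] = (u, v)  # one contiguous block per process
end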

source
setindex!(
    g::Union{HDF5.File, HDF5.Group}, x::MaybePencilArrayCollection,
    name::AbstractString; chunks = false, collective = true, prop_lists...,
)

Write PencilArray or PencilArrayCollection to parallel HDF5 file.

For performance reasons, the memory layout of the data is conserved. In other words, if the dimensions of a PencilArray are permuted in memory, then the data is written in permuted form.

In the case of a PencilArrayCollection, each array of the collection is written as a single component of a higher-dimensional dataset.

Optional arguments

  • if chunks = true, data is written in chunks, with roughly one chunk per MPI process. This may (or may not) improve performance in parallel filesystems.

  • if collective = true, the dataset is written collectively. This is usually recommended for performance.

  • additional property lists may be specified by key-value pairs in prop_lists, following the HDF5.jl syntax. These property lists take precedence over keyword arguments. For instance, if the dxpl_mpio = :collective option is passed, then the value of the collective argument is ignored.

Property lists

Property lists are passed to h5d_create and h5d_write. The following property types are recognised:

  • link creation properties;
  • dataset creation properties;
  • dataset access properties;
  • dataset transfer properties.

Example

Open a parallel HDF5 file and write some PencilArrays to the file:

pencil = Pencil(#= ... =#)
u = PencilArray{Float64}(undef, pencil)
v = similar(u)

# [fill the arrays with interesting values...]

comm = get_comm(u)

open(PHDF5Driver(), "filename.h5", comm, write=true) do ff
    ff["u", chunks=true] = u
    ff["uv"] = (u, v)  # this is a two-component PencilArrayCollection (assuming equal dimensions of `u` and `v`)
end
source
Base.read! (Function)
read!(file::MPIFile, x, name; collective = true, infokws...)

Read binary data from an MPI-IO stream, filling in PencilArray.

The output x can be a PencilArray or a tuple of PencilArrays.

See setindex! for details on keyword arguments.

Reading files without JSON metadata

It is also possible to read datasets from binary files in the absence of JSON metadata. This is typically the case for binary files created by a separate application.

In that case, the name argument must not be passed. If the file contains more than one dataset, one can optionally pass an offset keyword argument to manually select the offset of the dataset (in bytes) from the beginning of the file.

The signature of this metadata-less variant looks like:

read!(file::MPIFile, x; offset = 0, collective = true, infokws...)

Note that, since there is no metadata, this variant blindly assumes that the dimensions and element type of x match those of the data in the file.
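As a hedged sketch, reading such a file might look as follows (assuming the file contains a single dataset whose global dimensions and element type match those of the PencilArray u):

open(MPIIODriver(), "external_data.bin", get_comm(u); read = true) do ff
    read!(ff, u)  # no dataset name is passed, since there is no JSON metadata
end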

source
read!(g::Union{HDF5.File, HDF5.Group}, x::MaybePencilArrayCollection,
      name::AbstractString; collective=true, prop_lists...)

Read PencilArray or PencilArrayCollection from parallel HDF5 file.

See setindex! for details on optional arguments.

Property lists

Property lists are passed to h5d_open and h5d_read. The following property types are recognised:

  • dataset access properties;
  • dataset transfer properties.

Example

Open a parallel HDF5 file and read some PencilArrays:

pencil = Pencil(#= ... =#)
u = PencilArray{Float64}(undef, pencil)
v = similar(u)

comm = get_comm(u)
info = MPI.Info()

open(PHDF5Driver(), "filename.h5", comm, read=true) do ff
    read!(ff, u, "u")
    read!(ff, (u, v), "uv")
end
source
PencilArrays.PencilIO.hdf5_has_parallel (Function)
hdf5_has_parallel() -> Bool

Returns true if the loaded HDF5 libraries support MPI-IO.

This is exactly the same as HDF5.has_parallel(), and is left here for compatibility with previous versions.

source
