Parallel I/O
The PencilArrays.PencilIO module contains functions for saving and loading PencilArrays to disk using parallel I/O. Currently, two different output formats are supported:
- raw binary files via the MPI-IO interface;
- parallel HDF5 files.
In both cases, information on dataset sizes, names and other metadata is included along with the binary data.
The implemented approach stores the data coming from different MPI processes in a single file. This strategy scales better in terms of the number of files and is more convenient than storing one file per process. However, performance is very sensitive to the configuration of the underlying file system. In distributed file systems such as Lustre, it is worth tuning parameters such as the stripe count and stripe size. For more information, see for instance the Parallel HDF5 page.
Getting started
The first step before writing PencilArrays is to choose the parallel I/O driver, which determines the format of the output data. Two different drivers are currently available:
- MPIIODriver: parallel I/O via the MPI-IO API and the MPI.jl wrappers. This driver writes a raw binary file, along with a JSON file describing dataset metadata (name, dimensions, location in file, ...).
- PHDF5Driver: parallel I/O via the Parallel HDF5 API and HDF5.jl. This driver requires a special set-up, as detailed in the dedicated section below.
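For illustration, here is a minimal sketch constructing one driver of each kind with default options (HDF5.jl must be loaded for the Parallel HDF5 driver, as explained in the set-up section below):

using HDF5
using PencilArrays.PencilIO

driver_bin = MPIIODriver()  # raw binary output + JSON metadata file
driver_h5  = PHDF5Driver()  # parallel HDF5 output (requires a parallel-enabled HDF5 library)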
Writing data
To open a parallel file, pass the MPI communicator and an instance of the chosen driver to open. For instance, the following opens an MPI-IO file in write mode:
using MPI
using PencilArrays.PencilIO # needed for accessing parallel I/O functionality
MPI.Init()
ff = open(MPIIODriver(), "filename.bin", MPI.COMM_WORLD; write=true)
Datasets, in the form of PencilArrays, can then be written as follows:
v = PencilArray(...)
ff["velocity"] = v
This writing step may be customised via keyword arguments such as chunks and collective. These options are supported by both MPI-IO and HDF5 drivers. For instance:
ff["velocity", chunks=true, collective=false] = v
See setindex! for the meaning of these options for each driver, as well as for driver-specific options.
After datasets are written, the file should be closed as usual by doing close(ff). Note that the do-block syntax is also supported, as in:
open(MPIIODriver(), "filename.bin", MPI.COMM_WORLD; write=true) do ff
ff["velocity"] = v
end
Reading data
Data is loaded into an existing PencilArray using read!. For instance:
v = PencilArray(...)
open(MPIIODriver(), "filename.bin", MPI.COMM_WORLD; read=true) do ff
read!(ff, v, "velocity")
end
Note that, for the MPI-IO driver, a filename.bin.json file must be present along with the filename.bin file containing the binary data. The JSON file is automatically generated when writing data with this driver.
Optional keyword arguments, such as collective, are also supported by read!.
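For instance, a collective read may be disabled in the same way as for writes; a minimal sketch, reusing the ff and v from the example above:

read!(ff, v, "velocity"; collective=false)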
Setting up Parallel HDF5
If using the Parallel HDF5 driver, the HDF5.jl package must be available and configured with MPI support. Note that HDF5.jl versions prior to v0.15 are not supported.
Parallel HDF5 is not enabled in the default installation of HDF5.jl. For Parallel HDF5 to work, the HDF5 C libraries wrapped by HDF5.jl must be compiled with parallel support and linked to the specific MPI implementation that will be used for parallel I/O. HDF5.jl must be explicitly instructed to use parallel-enabled HDF5 libraries available in the system. Similarly, MPI.jl must be instructed to use the corresponding MPI libraries. This is detailed in the sections below.
Parallel-enabled HDF5 libraries are usually included in computing clusters and linked to the available MPI implementations. They are also available via the package manager of a number of Linux distributions. (For instance, Fedora includes the hdf5-mpich-devel and hdf5-openmpi-devel packages, respectively linked to the MPICH and OpenMPI libraries in the Fedora repositories.)
The following step-by-step guide assumes one already has access to parallel-enabled HDF5 libraries linked to an existing MPI installation.
1. Using system-provided MPI libraries
Select the system-provided MPI backend linked to the parallel HDF5 installation following the instructions in the MPI.jl docs.
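The exact mechanism depends on the MPI.jl version; with the older MPI.jl versions contemporary with the ]build HDF5 workflow described below, the selection might look like the following sketch, run from Julia (the system MPI libraries are assumed to be in the default search path):

# Select the system-provided MPI libraries (older MPI.jl versions; recent versions
# use the MPIPreferences package instead -- see the MPI.jl docs).
ENV["JULIA_MPI_BINARY"] = "system"
using Pkg
Pkg.build("MPI"; verbose=true)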
2. Using parallel HDF5 libraries
Set the JULIA_HDF5_PATH environment variable to the top-level installation directory where the HDF5 libraries compiled with parallel support are found. Then run ]build HDF5 from Julia. Note that the selected HDF5 library must be linked to the MPI library chosen in the previous section. Also note that HDF5 library versions older than 1.10.4 are not supported by HDF5.jl. For the set-up to be persistent across HDF5.jl updates, consider setting JULIA_HDF5_PATH in ~/.bashrc or similar.
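From within Julia, this step might look like the following sketch, where the installation prefix is hypothetical and system-dependent:

# Point HDF5.jl to a parallel-enabled HDF5 installation, then rebuild the package
# so that it picks up these libraries.
ENV["JULIA_HDF5_PATH"] = "/opt/hdf5-parallel"   # hypothetical installation prefix
using Pkg
Pkg.build("HDF5"; verbose=true)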
See the HDF5.jl README for details.
3. Loading PencilIO
In the PencilIO module, the HDF5.jl package is lazy-loaded using Requires. This means that HDF5 functionality will be available after both the PencilArrays.jl and HDF5.jl packages have been loaded:
using MPI
using HDF5
using PencilArrays
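After loading these packages, one may verify that the HDF5 libraries in use actually support MPI-IO, for instance via hdf5_has_parallel (documented in the Library section below):

using PencilArrays.PencilIO

PencilIO.hdf5_has_parallel()  # should return true for a parallel-enabled HDF5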
Library
PencilArrays.PencilIO.ParallelIODriver — Type
ParallelIODriver
Abstract type specifying a parallel I/O driver.
PencilArrays.PencilIO.MPIIODriver — Type
MPIIODriver(; sequential = false, uniqueopen = false, deleteonclose = false)
MPI-IO driver using the MPI.jl package.
Keyword arguments are passed to MPI.File.open.
This driver writes binary data along with a JSON file containing metadata. When reading data, this JSON file is expected to be present along with the raw data file.
PencilArrays.PencilIO.PHDF5Driver — Type
PHDF5Driver(; fcpl = HDF5.FileCreateProperties(), fapl = HDF5.FileAccessProperties())
Parallel HDF5 driver using the HDF5.jl package.
HDF5 file creation and file access property lists may be specified via the fcpl and fapl keyword arguments respectively.
Note that the MPIO file access property list does not need to be set, as this is done automatically by this driver when the file is opened.
PencilArrays.PencilIO.MPIFile — Type
MPIFile
Wraps an MPI.FileHandle, also including file position information and metadata.
File position is updated when reading and writing data, and is independent of the individual and shared file pointers defined by MPI.
Base.open — Function
open([f::Function], driver::ParallelIODriver, filename, comm::MPI.Comm; keywords...)
Open parallel file using the chosen driver.
Keyword arguments
Supported keyword arguments include:
- open mode arguments: read, write, create, append and truncate. These have the same behaviour and defaults as Base.open. Some of them may be ignored by the chosen driver (see driver-specific docs).
- as in MPI.File.open, other arguments are passed via an MPI.Info object.
Note that driver-specific options (such as HDF5 property lists) must be passed to each driver's constructor.
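As a sketch of this split, a driver-specific option goes to the driver constructor, while the generic open-mode keywords go to open itself (file name and option values are illustrative):

driver = MPIIODriver(sequential=false)            # driver-specific option
ff = open(driver, "filename.bin", MPI.COMM_WORLD; write=true, create=true)
close(ff)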
See also
- open(::MPIIODriver) for MPI-IO specific options
- open(::PHDF5Driver) for HDF5 specific options
open([f::Function], driver::MPIIODriver, filename, comm::MPI.Comm; keywords...)
Open parallel file using the MPI-IO driver.
See open(::ParallelIODriver) for common options for all drivers.
Driver-specific options may be passed via the driver argument. See MPIIODriver for details.
Driver notes
- the truncate keyword is ignored.
open([f::Function], driver::PHDF5Driver, filename, comm::MPI.Comm; keywords...)
Open parallel file using the Parallel HDF5 driver.
See open(::ParallelIODriver) for common options for all drivers.
Driver-specific options may be passed via the driver argument. See PHDF5Driver for details.
Base.setindex! — Function
setindex!(file::MPIFile, x, name; chunks = false, collective = true, infokws...)
Write PencilArray to binary file using MPI-IO.
The input x can be a PencilArray or a tuple of PencilArrays.
Optional arguments
- if chunks = true, data is written in contiguous blocks, with one block per process. Otherwise, each process writes to discontiguous sections of disk, using MPI.File.set_view! and custom datatypes. Note that discontiguous I/O (the default) is more convenient, as it allows reading back the data using a different number or distribution of MPI processes.
- if collective = true, the dataset is written collectively. This is usually recommended for performance.
- when writing discontiguous blocks, additional keyword arguments are passed via an MPI.Info object to MPI.File.set_view!. This is ignored if chunks = true.
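For illustration, a write with this driver might look like the following sketch, where u and v are assumed to be existing PencilArrays sharing a communicator and the file name is arbitrary:

open(MPIIODriver(), "fields.bin", get_comm(u); write=true) do ff
    ff["u", chunks=true] = u    # contiguous blocks, one block per process
    ff["uv"] = (u, v)           # default: discontiguous layout via MPI.File.set_view!
end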
setindex!(
    g::Union{HDF5.File, HDF5.Group}, x::MaybePencilArrayCollection,
    name::AbstractString; chunks = false, collective = true, prop_lists...,
)
Write PencilArray or PencilArrayCollection to parallel HDF5 file.
For performance reasons, the memory layout of the data is conserved. In other words, if the dimensions of a PencilArray are permuted in memory, then the data is written in permuted form.
In the case of a PencilArrayCollection, each array of the collection is written as a single component of a higher-dimension dataset.
Optional arguments
- if chunks = true, data is written in chunks, with roughly one chunk per MPI process. This may (or may not) improve performance in parallel filesystems.
- if collective = true, the dataset is written collectively. This is usually recommended for performance.
- additional property lists may be specified by key-value pairs in prop_lists, following the HDF5.jl syntax. These property lists take precedence over keyword arguments. For instance, if the dxpl_mpio = :collective option is passed, then the value of the collective argument is ignored.
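As a sketch of the last point (with ff an open parallel HDF5 file and u a PencilArray), passing a dataset transfer property directly overrides the collective keyword:

ff["u", collective=false, dxpl_mpio=:collective] = u  # dxpl_mpio takes precedence over `collective`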
Property lists
Property lists are passed to h5d_create and h5d_write. The following property types are recognised:
- link creation properties,
- dataset creation properties,
- dataset access properties,
- dataset transfer properties.
Example
Open a parallel HDF5 file and write some PencilArrays to the file:
pencil = Pencil(#= ... =#)
u = PencilArray{Float64}(undef, pencil)
v = similar(u)
# [fill the arrays with interesting values...]
comm = get_comm(u)
open(PHDF5Driver(), "filename.h5", comm, write=true) do ff
ff["u", chunks=true] = u
ff["uv"] = (u, v) # this is a two-component PencilArrayCollection (assuming equal dimensions of `u` and `v`)
end
Base.read! — Function
read!(file::MPIFile, x, name; collective = true, infokws...)
Read binary data from an MPI-IO stream, filling in the PencilArray.
The output x can be a PencilArray or a tuple of PencilArrays.
See setindex! for details on keyword arguments.
Reading files without JSON metadata
It is also possible to read datasets from binary files in the absence of JSON metadata. This is typically the case for binary files created by a separate application.
In that case, the name argument must not be passed. If the file contains more than one dataset, one can optionally pass an offset keyword argument to manually select the offset of the dataset (in bytes) from the beginning of the file.
The signature of this metadata-less variant looks like:
read!(file::MPIFile, x; offset = 0, collective = true, infokws...)
Note that, since there is no metadata, this variant blindly assumes that the dimensions and element type of x correspond to those in the file.
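For instance, reading a raw file written by an external application might look like the following sketch (hypothetical file name; u must already have the matching dimensions and element type):

open(MPIIODriver(), "external_data.bin", get_comm(u); read=true) do ff
    read!(ff, u; offset=0)   # offset in bytes from the beginning of the file
end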
read!(g::Union{HDF5.File, HDF5.Group}, x::MaybePencilArrayCollection,
      name::AbstractString; collective=true, prop_lists...)
Read PencilArray or PencilArrayCollection from parallel HDF5 file.
See setindex! for details on optional arguments.
Property lists
Property lists are passed to h5d_open and h5d_read. The following property types are recognised:
- dataset access properties,
- dataset transfer properties.
Example
Open a parallel HDF5 file and read some PencilArrays:
pencil = Pencil(#= ... =#)
u = PencilArray{Float64}(undef, pencil)
v = similar(u)
comm = get_comm(u)
info = MPI.Info()
open(PHDF5Driver(), "filename.h5", comm, read=true) do ff
read!(ff, u, "u")
read!(ff, (u, v), "uv")
end
PencilArrays.PencilIO.hdf5_has_parallel — Function
hdf5_has_parallel() -> Bool
Returns true if the loaded HDF5 libraries support MPI-IO.
This is exactly the same as HDF5.has_parallel(), and is left here for compatibility with previous versions.