API

Creating plans

NonuniformFFTs.PlanNUFFTType
PlanNUFFT([T = ComplexF64], dims::Dims; ntransforms = Val(1), backend = CPU(), kwargs...)

Construct a plan for performing non-uniform FFTs (NUFFTs).

The created plan contains all data needed to perform NUFFTs for non-uniform data of type T (ComplexF64 by default) and uniform data with dimensions dims.

Optional keyword arguments

  • ntransforms = Val(1): the number of simultaneous transforms to perform. This is useful if one wants to transform multiple scalar quantities at the same non-uniform points.

  • backend::KernelAbstractions.Backend = CPU(): corresponds to the device type where everything will be executed. This could be e.g. CUDABackend() if CUDA.jl is loaded.

NUFFT parameters

The following parameters control transform accuracy. The default values give a relative accuracy of the order of $10^{-7}$ for Float64 or ComplexF64 data.

  • m = HalfSupport(4): the half-support of the convolution kernels. Large values increase accuracy at the cost of performance.

  • σ = 2.0: NUFFT oversampling factor. Typical values are 2.0 (more accurate) and 1.25 (faster), but other values such as 1.5 should also work.

  • kernel::AbstractKernel = BackwardsKaiserBesselKernel(): convolution kernel used for NUFFTs.

Main performance parameters

  • kernel_evalmode: method used for kernel evaluation. The default is FastApproximation on CPU, which will attempt to use a fast approximation method which greatly speeds up kernel evaluation. On GPUs the default is Direct, as the fast approximation method is not necessarily faster.

  • block_size: the block size (in number of elements) when using block partitioning or when sorting is enabled. This enables spatial sorting of points, even when sort_points = False() (which actually permutes point data for possibly faster memory accesses). The block size can be tuned for maximal performance. It can either be passed as an Int (linear block size) or as a tuple (B₁, …, Bₙ) to specify the block size in each Cartesian direction. The current default is 4096 on the CPU and around 1024 on the GPU (but depends on the number of dimensions). These may change in the future or even depend on the actual computing device. On the CPU, using block partitioning is required for running with multiple threads. On the GPU, this option is ignored if gpu_method = :shared_memory. Blocking / spatial sorting can be completely disabled by passing block_size = nothing (but this is generally slower).

  • gpu_method: allows to select between different implementations of GPU transforms. Possible options are:

    • :global_memory (default): directly read and write onto arrays in global memory in spreading (type-1) and interpolation (type-2) operations;

    • :shared_memory: copy data between global memory and shared memory (local to each GPU workgroup) and perform most operations in the latter, which is faster and can help avoid some atomic operations in type-1 transforms. We try to use as much shared memory as is typically available on current GPUs (which is typically 48 KiB on CUDA and 64 KiB on AMDGPU). Note that this method completely ignores the block_size parameter, as the actual block size is adjusted to maximise shared memory usage. When this method is enabled, one can play with the gpu_batch_size parameter (see below) to further tune performance.

    For highly dense problems (number of non-uniform points comparable to the total grid size), the :shared_memory method can be much faster, especially when the HalfSupport is 4 or less (accuracies up to 1e-7 for σ = 2).

  • fftw_flags = FFTW.MEASURE: parameters passed to the FFTW planner when backend = CPU().

Other performance parameters

These are more advanced performance parameters which may disappear or whose behaviour may change in the future.

  • sort_points = False(): whether to internally permute the order of the non-uniform points. This can be enabled by passing sort_points = True(). Ignored when block_size = nothing (which disables spatial sorting). In this case, more time will be spent in set_points! and less time on the actual transforms. This can improve performance if executing multiple transforms on the same non-uniform points. Note that, even when enabled, this does not modify the points argument passed to set_points!.

  • gpu_batch_size = Val(Np): minimum batch size used in type-1 transforms when gpu_method = :shared_memory. The idea is that, to avoid inefficient atomic operations on shared-memory arrays, we process non-uniform points in batches of Np points. The actual value of Np will typically be larger than the input one in order to maximise shared memory usage within each GPU workgroup. Note that larger Np also means that less shared memory space is available for local blocks, meaning that the effective block size can get smaller (which is not necessarily bad for performance, and can actually be beneficial). When tuning performance, it is helpful to print the plan (as in println(plan)) to see the actual block and batch sizes.

Other parameters

  • fftshift = false: determines the order of wavenumbers in uniform space. If false (default), the same order used by FFTW is used, with positive wavenumbers first ([0, 1, 2, …, N÷2-1] for even-size transforms) and negative ones afterwards ([-N÷2, …, -1]). Otherwise, wavenumbers are expected to be in increasing order ([-N÷2, -kmax, …, -1, 0, 1, …, N÷2-1]), which is compatible with the output in NFFT.jl and corresponds to applying the AbstractFFTs.fftshift function to the data. This option also corresponds to the modeord parameter in FINUFFT. This only affects complex-to-complex transforms.

  • timer = TimerOutput(): allows to specify a TimerOutput (from the TimerOutputs.jl package) where timing information will be written to. By default the plan creates its own timer. One can visualise the time spent on different parts of the NUFFT computation using p.timer.

  • synchronise = false: if true, add synchronisation barrier between calls to GPU kernels. Enabling this is needed for accurate timings in p.timer when computing on a GPU, but may result in reduced performance.

FFT size and performance

For performance reasons, when doing FFTs one usually wants the size of the input along each dimension to be a power of 2 (ideally), or the product of powers of small prime numbers (2, 3, 5, …). The problem is that, with the NUFFT, one does not directly control the FFT size due to the oversampling factor $σ$, which may be any real number $σ > 1$. That is, for an input of size $N$, FFTs are performed on an oversampled grid of size $Ñ ≈ σN$. Note that $σN$ is generally not an integer, hence the $≈$.

The aim of this section is to clarify how $Ñ$ is actually chosen, so that one can predict its value for given inputs $N$ and $σ$. This may be better understood in actual code:

Ñ = nextprod((2, 3, 5), floor(Int, σ * N))

Basically, we truncate $σN$ to an integer, and then we choose $Ñ$ as the next integer that can be written as the product of powers of 2, 3 and 5 (see nextprod). Most often, the result will be greater than or equal to $σN$.

Using real non-uniform data

In some applications, the non-uniform data to be transformed is purely real. In this case, one may pass Float64 or Float32 as the first argument. This may be faster than converting data to complex types, in particular because the real-to-complex FFTs from FFTW will be used to compute the transforms. Note that, in this case, the dimensions of the uniform data arrays is not exactly dims, since the size of the first dimension is divided roughly by 2 (taking advantage of Hermitian symmetry). For convenience, one can call size(::PlanNUFFT) on the constructed plan to know in advance the dimensions of the uniform data arrays.

source
NonuniformFFTs.NFFTPlanType
NonuniformFFTs.NFFTPlan(xp::AbstractMatrix{T}, dims::Dims{D}; kwargs...) -> plan

Create a NUFFT plan which is compatible with the AbstractNFFTs.jl interface.

The plan follows different rules from plans created by PlanNUFFT. In particular:

  • points are assumed to be in $[-1/2, 1/2)$ instead of $[0, 2π)$;
  • the opposite Fourier sign convention is used (e.g. $e^{-i k x_j}$ becomes $e^{+2π i k x_j}$);
  • uniform data is in increasing order by default, with frequencies $k = -N/2, …, -1, 0, 1, …, N/2-1$, as opposed to preserving the order used by FFTW (which starts at $k = 0$);
  • it only supports complex non-uniform data.

This constructor requires passing the non-uniform locations xp as the first argument. These should be given as a matrix of dimensions (D, Np), where D is the spatial dimension and Np the number of non-uniform points.

The second argument is simply the size (N₁, N₂, …) of the uniform data arrays.

Most keyword arguments from PlanNUFFT are also accepted here. Moreover, for compatibility reasons, most keyword arguments from the NFFT.jl package are also accepted as detailed below.

This type of plan can also be created via the AbstractNFFTs.plan_nfft function.

This constructor creates a plan which assumes complex-valued non-uniform data. For real-valued data, the PlanNUFFT constructor should be used instead.

Compatibility with NFFT.jl

Most of the parameters supported by the NFFT.jl package are also supported by this constructor. The currently supported parameters are reltol, m, σ, window, blocking, sortNodes and fftflags.

Moreover, unlike PlanNUFFT, this constructor sets fftshift = true by default (but can be overridden) so that the uniform data ordering is the same as in NFFT.jl.

Type instability

Explicitly passing some of these parameters may result in type-unstable code, since the exact type of the returned plan cannot be inferred. This is because, in NonuniformFFTs.jl, parameters such as the kernel size (m) or the convolution window (window) are included in the plan type (they are compile-time constants).

GPU usage

To create a GPU-compatible plan, simply pass the locations xp as a GPU array (e.g. a CuArray in CUDA). Unlike PlanNUFFT, the backend argument is not needed here and will be simply ignored.

source

Setting non-uniform points

NonuniformFFTs.set_points!Function
set_points!(p::PlanNUFFT, points)

Set non-uniform points before executing a NUFFT.

In one dimension, points is simply a vector of real values (the non-uniform locations).

In multiple dimensions, points may be passed as:

  • a tuple of vectors (xs::AbstractVector, ys::AbstractVector, …);
  • a vector [x⃗₁, x⃗₂, x⃗₃, x⃗₄, …] of tuples or static vectors (typically SVectors from the StaticArrays.jl package);
  • a matrix of size (d, Np) where d is the spatial dimension and Np the number of non-uniform points.

The points are allowed to be outside of the periodic cell $[0, 2π)^d$, in which case they will be "folded" to that domain.

source

Executing plans

NonuniformFFTs.exec_type1!Function
exec_type1!(ûs::AbstractArray{<:Complex}, p::PlanNUFFT, vp::AbstractVector{<:Number})
exec_type1!(ûs::NTuple{N, AbstractArray{<:Complex}}, p::PlanNUFFT, vp::NTuple{N, AbstractVector{<:Number}})

Perform type-1 NUFFT (from non-uniform points to uniform grid).

Here vp contains the input values at non-uniform points. The result of the transform is written into ûs.

One first needs to set the non-uniform points using set_points!.

To perform multiple transforms at once, both vp and ûs should be a tuple of arrays (second variant above). Note that this requires a plan initialised with ntransforms = Val(N) (see PlanNUFFT).

See also exec_type2!.

source
NonuniformFFTs.exec_type2!Function
exec_type2!(vp::AbstractVector{<:Number}, p::PlanNUFFT, ûs::AbstractArray{<:Complex})
exec_type2!(vp::NTuple{N, AbstractVector{<:Number}}, p::PlanNUFFT, ûs::NTuple{N, AbstractArray{<:Complex}})

Perform type-2 NUFFT (from uniform grid to non-uniform points).

Here ûs contains the input coefficients in the uniform grid. The result of the transform at non-uniform points is written into vp.

One first needs to set the non-uniform points using set_points!.

To perform multiple transforms at once, both vp and ûs should be a tuple of arrays (second variant above). Note that this requires a plan initialised with ntransforms = Val(N) (see PlanNUFFT).

See also exec_type1!.

source

Other functions

Base.sizeMethod
size(p::PlanNUFFT) -> (N₁, N₂, ...)

Return the dimensions of arrays containing uniform values.

This corresponds to the number of Fourier modes in each direction (in the non-oversampled grid).

source

Available spreading kernels

NonuniformFFTs.Kernels.KaiserBesselKernelType
KaiserBesselKernel <: AbstractKernel
KaiserBesselKernel([β])

Represents a Kaiser–Bessel spreading kernel.

Definition

\[ϕ(x) = I₀ \left(β \sqrt{1 - x²} \right) \quad \text{ for } |x| ≤ 1\]

where $I₀$ is the zeroth-order modified Bessel function of the first kind and $β$ is a shape factor.

Fourier transform

\[ϕ̂(k) = 2 \frac{\sinh\left( \sqrt{β² - k²} \right)}{\sqrt{β² - k²}}\]

Parameter selection

By default, the shape parameter is chosen to be [1]

\[β = γ M π \left( 2 - \frac{1}{σ} \right)\]

where $M$ is the kernel half-width and $σ$ the oversampling factor. Moreover, $γ = 0.980$ is an empirical "safety factor", similarly to the one used by FINUFFT [2], which slightly improves accuracy.

This default value can be overriden by explicitly passing a $β$ value.

Implementation details

Since the evaluation of Bessel functions can be costly, this kernel is efficiently evaluated via an accurate piecewise polynomial approximation. We use the same method originally proposed for FINUFFT [2] and later discussed by Shamshirgar et al. [3].

[1] Potts & Steidl, SIAM J. Sci. Comput. 24, 2013 (2003)
[2] Barnett, Magland & af Klinteberg, SIAM J. Sci. Comput. 41, C479 (2019)
[3] Shamshirgar, Bagge & Tornberg, J. Chem. Phys. 154, 164109 (2021)

source
NonuniformFFTs.Kernels.BackwardsKaiserBesselKernelType
BackwardsKaiserBesselKernel <: AbstractKernel
BackwardsKaiserBesselKernel([β])

Represents a backwards Kaiser–Bessel (KB) spreading kernel.

This kernel basically results from swapping the KaiserBesselKernel spreading function and its Fourier transform. It has very similar properties to the Kaiser–Bessel kernel.

Definition

\[ϕ(x) = \frac{\sinh \left(β \sqrt{1 - x²} \right)}{π \sqrt{1 - x²}} \quad \text{ for } |x| ≤ 1\]

where $β$ is a shape factor.

Fourier transform

\[ϕ̂(k) = I₀ \left( \sqrt{β² - k²} \right)\]

where $I₀$ is the zeroth-order modified Bessel function of the first kind.

Parameter selection

By default, the shape parameter is chosen to be [1]

\[β = γ M π \left( 2 - \frac{1}{σ} \right)\]

where $M$ is the kernel half-width and $σ$ the oversampling factor. Moreover, $γ = 0.995$ is an empirical "safety factor", similarly to the one used by FINUFFT [2], which slightly improves accuracy.

This default value can be overriden by explicitly passing a $β$ value.

Implementation details

Since the evaluation of the hyperbolic sine functions can be costly, this kernel is efficiently evaluated via an accurate piecewise polynomial approximation. We use the same method originally proposed for FINUFFT [2] and later discussed by Shamshirgar et al. [3].

[1] Potts & Steidl, SIAM J. Sci. Comput. 24, 2013 (2003)
[2] Barnett, Magland & af Klinteberg, SIAM J. Sci. Comput. 41, C479 (2019)
[3] Shamshirgar, Bagge & Tornberg, J. Chem. Phys. 154, 164109 (2021)

source
NonuniformFFTs.Kernels.GaussianKernelType
GaussianKernel <: AbstractKernel
GaussianKernel([ℓ/Δx])

Represents a truncated Gaussian spreading kernel.

Definition

\[ϕ(x) = e^{-x² / 2ℓ²}\]

where $ℓ$ is the characteristic width of the kernel.

Fourier transform

\[ϕ̂(k) = \sqrt{2πℓ²} e^{-ℓ² k² / 2}\]

Parameter selection

By default, given a kernel half-width $M$, an oversampling factor $σ$ and the oversampling grid spacing $Δx$, the characteristic width $ℓ$ is chosen as [1]

\[ℓ² = Δx² \frac{σ}{2σ - 1} \frac{M}{π}\]

This default value can be overriden by explicitly passing a value of the wanted normalised width $ℓ/Δx$.

Implementation details

In the implementation, this kernel is efficiently evaluated using the fast Gaussian gridding method proposed by Greengard & Lee [2].

[1] Potts & Steidl, SIAM J. Sci. Comput. 24, 2013 (2003)
[2] Greengard & Lee, SIAM Rev. 46, 443 (2004)

source
NonuniformFFTs.Kernels.BSplineKernelType
BSplineKernel <: AbstractKernel
BSplineKernel()

Represents a B-spline spreading kernel.

The order $n$ of the B-spline is directly related to the kernel half-width $M$ by $n = 2M$ (the polynomial degree is $n - 1$).

Definition

A B-spline of order $n$ may be defined via its Fourier transform:

\[ϕ̂(k) = Δx \operatorname{sinc}^n \left( \frac{k Δx}{2} \right)\]

where $Δx$ is the spacing of the oversampled grid and $\operatorname{sinc}$ is the unnormalised sinc function.

Implementation details

In the implementation, this kernel is evaluated using de Boor's algorithm.

source

Kernel evaluation methods

NonuniformFFTs.Kernels.FastApproximationType
FastApproximation <: Kernels.EvaluationMode

Use a fast approximation or algorithm to efficiently evaluate a kernel.

The evaluation type depends on the actual kernel:

In the first two cases, this is generally faster than Direct evaluation for CPU transforms.

On GPUs this is not necessarily the case, and results may depend on the actual GPU used. For example, on an Nvidia A100, Direct evaluation of the KaiserBesselKernel (which involves Bessel functions) seems to beat polynomial approximation, and is also a bit faster than the BackwardsKaiserBesselKernel (involving sinh functions).

source

Index