Using Ports of Call
Ports of call is a header-only library that provides a bit of flexibility for performance portability. At the moment it mainly provides a one-header abstraction to enable or disable Kokkos in a code. However other backends can be added. (If you’re interested in adding a backend, please let us know!)
To include Ports of Call in your project, simply include the directory (e.g., as a submodule) in your include path.
PORTABLE_FUNCTION
: decorators necessary for compiling a kernel functionPORTABLE_INLINE_FUNCTION
: ditto, but for when functions ought to be inlinedPORTABLE_FORCEINLINE_FUNCTION
: forces the compiler to inlinePORTABLE_LAMBDA
: Resolves to aKOKKOS_LAMBDA
or to[=]
depending on context_WITH_KOKKOS_
: Defined if Kokkos is enabled._WITH_CUDA_
: Defined when Cuda is enabledReal
: a typedef to double (default) or float (if you defineSINGLE_PRECISION_ENABLED
)PORTABLE_MALLOC()
,PORTABLE_FREE()
: A wrapper for kokkos_malloc or cudaMalloc, or raw malloc and equivalent free.
At compile time, you define
PORTABILITY_STRATEGY_{KOKKOS,CUDA,NONE}
(if you don’t define it,
it defaults to NONE). The above macros then behave as expected. In
particular, PORTABLE_FUNCTION
and friends resolve to __host__
__device__
decorators as appropriate.
There are several headers in this library, for different use cases.
portability.hpp
portability.hpp
provides the above-mentioned macros for decorating
functions. Also provides loop abstractions that can be leveraged by a
code. These loop abstractions are of the form:
where Function
is a template parameter and should be set to a
functor that takes one index, e.g., an index in an array. For example:
portableFor("Example", 0, 5,
PORTABLE_LAMBDA(int i) {
printf("hello from thread %d\n", i);
});
start
is inclusive, stop
is exclusive. Up to five-dimensional
portableFor
loops are available. For example:
template <typename Function>
void portableFor(const char *name, int startb, int stopb, int starta, int stopa,
int startz, int stopz, int starty, int stopy, int startx,
int stopx, Function function) {
We also provide portableReduce
, however the functionality is very
limited. The syntax is:
template <typename Function, typename T>
void portableReduce(const char *name, int starta, int stopa, int startz,
int stopz, int starty, int stopy, int startx, int stopx,
Function function, T &reduced) {
where Function
now takes as many indices are required and
reduced
as arguments.
Also provided are host to device and device to host memory transfers of the form:
-
void portableCopyToHost(T *const to, T const *const from, size_t const size_bytes)
-
void portableCopyToDevice(T *const to, T const *const from, size_t const size_bytes)
with to being the target location, from being the source location, and size_bytes is the size of the transfer in bytes. This has implemenatations for kokkos and none portability strategies.
It may be useful to query the execution space, for example to know where memory needs to be copied. To this end, a compile-time constant boolean can be queried:
- PortsOfCall::EXECUTION_IS_HOST
which is true if the host execution space can trivially access device memory space. For example, for PORTABILITY_STRATEGY_CUDA, PortsOfCall::EXECUTION_IS_HOST == false.
portable_errors.hpp
portable_errors.hpp
provides error handling that works with
different portability backends, such as Kokkos. We provide several
useful macros. All the macros in this file will print the file and
line number where the macro was called, enabling easier debugging.
The following macros are disabled automaticaly for production
builds (e.g., when the NDEBUG
preprocessor macro is defined):
PORTABLE_REQUIRE(condition, message)
prints an error message and aborts the program (without throwing an exception) if compiled in debug mode andcondition
is not satisfied.PORTABLE_ABORT(message)
prints an error message and aborts the program when compiled in debug mode.PORTABLE_WARN(message)
prints a warning message if compiled in debug mode.PORTABLE_THROW_OR_ABORT(message)
prints an error message and then raises a runtime error ifPORTABILITY_STRATEGY
isNONE
and otherwise aborts the program without an exception. This macro is disabled in production.
Each of the above macros is disabled and becomes a no-op for most builds and only enabled for Debug
builds. However, for each of the above macros there is an equivalent PORTABLE_ALWAYS_*
macro, which always functions and is never a no-op:
PORTABLE_ALWAYS_REQUIRE(condition, message)
prints an error message and aborts the program (without throwing an exception) ifcondition
is not satisfied.PORTABLE_ALWAYS_ABORT(message)
prints an error message and aborts the program.PORTABLE_ALWAYS_WARN(message)
prints a warning message.PORTABLE_ALWAYS_THROW_OR_ABORT(message)
prints an error message and then raises a runtime error ifPORTABILITY_STRATEGY
isNONE
and otherwise aborts the program without an exception.
Additionally the macro
PORTABLE_ERROR_MESSAGE(message, output)
fills an outputchar*
with a useful error message containing the filename and line number where the macro is called. Note there is no bounds checking so you must provide the macro with a sufficiently largechar*
array.
The message
parameter in the above macros can be char*
arrays and string literals on device and additionally accepts std::string
and std::stringstream
on host.
Please note that none of these functions are thread or MPI aware. In a parallel program, the same message may be called many times. Therefore caution should be used with this machinery and you may wish to hide these macros in if statements, for example,
if (rank == 0) PORTABLE_REQUIRE(my_condition, my_message);
as appropriate.
robust_utils.hpp
robust_utils.hpp
contains small utility functions for numerical
robustness, especially around floating point numbers. The available
functionality is contained in the namespace PortsOfCall::Robust
and includes:
constexpr auto SMALL<T>()
returns a small number of typeT
.constexpr auto EPS<T>()
returns a value of typeT
close to machine epsilon.constexpr auto min_exp_arg<T>()
returns the smallest safe value of typeT
to pass into an exponent.constexpr auto max_exp_exp_arg<T>()
returns the max safe value of typeT
to pass into an exponent.auto make_positive(const T val)
makes the argument of typeT
positive.
where here all functionality is templated on type T
and marked
with PORTABLE_INLINE_FUNCTION
. The default type T
is always
Real
.
The function
template <typename T>
PORTABLE_FORCEINLINE_FUNCTION
Real make_bounded(const T val, const T vmin, const T vmax);
bounds val
between vmin
and vmax
, exclusive. Note this is
slightly different than std::clamp
, which uses inclusive bounds.
The function
template <typename T>
PORTABLE_FORCEINLINE_FUNCTION int sgn(const T &val);
returns the sign of a quantity val
.
Note
Note this implementation never returns zero. It always returns \(\pm 1\).
The function
template <typename A, typename B>
PORTABLE_FORCEINLINE_FUNCTION auto ratio(const A &a, const B &b)
computes the ratio \(A/B\) but in a way robust to 0/0 errors. If both \(A\) and \(B\) are zero, this function will return 0. If \(|A| > 0\) and \(B=0\), then it will return a very large, possibly (but not guaranteed to be) infinite number.
The function
template <typename T>
PORTABLE_FORCEINLINE_FUNCTION T safe_arg_exp(const T &x)
returns exponentiation in such a way that avoids floating point
exceptions. For very large negative inputs, it returns 0. For very
large positive ones, it returns
std::numeric_limits<T>::infinity()
.
The function
template <typename T>
PORTABLE_FUNCTION constexpr bool check_nonnegative(const T t)
checks if the value is non-negative (\(t \geq 0\)). There are two
versions: one for signed values (performs the check and returns the result) and
one for unsigned values (simply returns true, since unsigned values can never
be negative). This is typically used in generic code where a value must be
non-negative, but the type is unknown and therefore may be either signed or
unsigned. Simply using t >= 0
can cause undesirable warnings about
unsigned integer comparisons, so check_nonnegative
is provided.
math_utils.hpp
math_utils.hpp
contains math operations intended to be both performant and
portable to GPUs.
The function
template <typename base_t, typename exp_t>
PORTABLE_FUNCTION constexpr inline base_t int_power(base_t base, exp_t exp)
is equivalent to std::pow
except that the exponent is required to be an
integer. For small integer powers, int_power
is faster than std::pow
.
For sufficiently large integer powers, std::pow
may be faster, but testing
indicates int_power
is significantly faster (roughly a factor of two or
better) up to power of at least 100.
The function
template <
typename IterB,
typename IterE,
typename Value,
typename Op = singe::util::plus<Value>>
PORTABLE_FUNCTION constexpr Value accumulate(
IterB begin,
IterE end,
Value accum,
Op && op = singe::util::plus<Value>{})
is a simple constexpr
implementation of std::accumulate
from the STL.
Ports-of-Call also provides a constexpr
implementation of std::plus
(which is the default operator for accumulate
).
macros_arrays.hpp
portable_arrays.hpp
provides a wrapper class, PortableMDArray
,
around a contiguous block of host or device memory that knows stride
and layout, enabling one to mock up multidimensional arrays from a
pointer to memory. The design is heavily inspired by the
AthenaArray
class from Athena++.
One constructs a PortableMDArray
by passing it a pointer to
underlying data and a shape. For example:
#include <portability.hpp>
#include <portable_arrays.hpp>
constexpr int NX = 2;
constexpr int NY = 3;
constexpr int NZ = 4;
Real *data = (Real*)PORTABLE_MALLOC(NX*NY*NZ*sizeof(Real));
PortableMDArray<Real> my_3d_array(data, NZ, NY, NX);
Note
PortableMDArray
is templated on underlying data
type.
Note
PortableMDArray
is column-major-ordered. The
slowest moving index is z
and the fastest is x
.
You can then set or access an element by reference as:
// z = 3, y = 2, x = 1
my_3d_array(3,2,1) = 5.0;
You can always access the “flat” array by simply using the 1D accessor:
my_3d_array(6) = 2.0;
By default PortableMDArray
has reference-semantics. In
other words, copies are shallow.
You can assign new data and a new shape to a PortableMDArray
with
the NewPortableMDArray
function. For example:
my_3d_array.NewPortableArray(new_data, 9, 8, 7);
would reshape my_3d_array
to be of shape 7x8x9 and point it at the
new_data
pointer.
PortableMDArray
also provides a few useful methods:
-
size_t PortableMDArray::GetRank()
provides the number of dimensions of the array.
-
int PortableMDArray::GetDim(size_t i)
returns the size of a given dimension (indexed from 1, not 0).
-
int PortableMDArray::GetSize()
returns the size of the flattened array.
-
size_t PortableMDArray::GetSizeInBytes()
returns the size of the flattened array in bytes.
-
bool PortableMDArray::IsEmpty()
returns true if the array is empty and false otherwise.
-
T *PortableMDArray::data()
returns the underlying pointer. The begin()
and end()
functions return pointers to the beginning and end of the array.
-
void PortableMDArray::Reshape(int nx3, int nx2, int nx1)
resets the shape of the array without pointing to a new underlying data pointer. It accepts anywhere between 1 and 6 sizes.
PortableMDArray
also supports some simple boolean comparitors,
such as ==
and arithmetic such as +
, and -
.
array.hpp
PortsOfCall::array
is intended to be a drop-in replacement for std::array
, with the
exception that it works on GPUs. As of C++17, std::array::fill
and std::array::swap
are
not yet constexpr
, so even with the “relaxed constexpr
” compilation mode std::array
is
not feature-complete on GPUs. This will change when those member functions become constexpr
in
C++20.
span.hpp
PortsOfCall::span
is implements std::span
for C++17 (uses native implmentation in C++20)
as a view over contiguous data. span
may have compile-time static extent, or a dynamic extent.
span
provides iterator functions similar to containers.
int arr[] = {1, 2, 3};
auto s = span{arr};
for(auto & i : s)
{
i -= 1;
}
span::subspan
returns a span over a subrange. Element access uses span::operator[]
. For
more information, see C++ reference page.
static_vector.hpp
PortsOfCall::static_vector
is a GPU-compatible data structure that provides a
std::vector
-like interface, but uses std::array
-like backing storage. That means that the
size is variable, but the capacity is fixed at runtime. This allows the creation of a data
structure of non-default-constructible objects like with a std::vector
. This also allows the
type to be self-contained: no pointers, so a PortsOfCall::static_vector
can be memcopied
between CPU and GPU. It is related to a `proposed data structure
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p0843r8.html`_ that may be included in a
future C++ standard.