[llvm-dev] [OPENMP] USM pragma, more than a safety net, it's an operational mode.
Gregory Rodgers via llvm-dev
llvm-dev at lists.llvm.org
Fri Sep 3 09:13:10 PDT 2021
OpenMP 5 has a "requires unified_shared_memory" (USM) pragma. The C and
C++ syntax is "#pragma omp requires unified_shared_memory". At a minimum,
this USM pragma tells the compiler that the intended offloading target must
be capable of unified shared memory. If the target does not have this
capability, compilation and/or the runtime should fail gracefully. Either
every compilation unit in an application must have the USM pragma or none
may. Unified shared memory makes map clauses optional, so if your program
does not have a complete set of map clauses, this safety net is important.
How does runtime operation differ in USM mode? How should map clauses be
handled: ignored, implemented with the same copy semantics as on a
discrete-memory GPU, or used to optimize memory management? In OpenMP
these decisions are up to the implementation. In LLVM, USM is more than
just a compilation and runtime safety net. USM defines a compilation and
runtime-aware mode of operation. In default mode, the map-derived GPU copy
semantics are executed even if the GPU supports USM. In USM mode, the copy
semantics are not executed.
Should programmers delete or avoid the use of map clauses for a USM
application? Absolutely NOT! There are two reasons to keep using map
clauses in USM mode: performance and portability.
From a performance perspective, the map clauses may trigger more than the
implied copy semantics. They provide important information to the compiler
and runtime regarding if and how variables are accessed in the target
region (on the GPU). This information allows the compiler or runtime to
allocate and/or manage device memory in an optimal fashion.
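As a minimal sketch of the kind of information a map clause carries, consider a SAXPY kernel. The function name and signature are my own illustration, not anything from the LLVM runtime. The map types say that x is only read in the target region while y is both read and written:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper (hypothetical name): y = a*x + y.
 * map(to: ...)     : x is only read in the target region.
 * map(tofrom: ...) : y is read and written, so both sides need it.
 * Without OpenMP enabled the pragma is ignored and the loop runs on the host. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

How an implementation uses this access information is up to the implementation; the point is only that the direction of each map is visible to the compiler and runtime.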
There are two portability motivations for map clauses. The first is that
the application is portable to accelerators/GPUs that do not provide
unified shared memory. The second is the ability to build and run your
application in default mode on the same USM-capable GPU.
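One way to keep both builds available from a single source, sketched under the assumption that you control a build macro of your own: USE_USM below is a hypothetical name, not something LLVM defines. The map clause stays in place in both builds:

```c
#include <assert.h>
#include <stddef.h>

/* USE_USM is a hypothetical, user-chosen build macro.
 * Compile with -DUSE_USM for USM mode; omit it for default mode.
 * Every compilation unit of the application must make the same choice. */
#ifdef USE_USM
#pragma omp requires unified_shared_memory
#endif

/* Returns 1 if this file was compiled for USM mode, 0 for default mode. */
int built_for_usm(void)
{
#ifdef USE_USM
    return 1;
#else
    return 0;
#endif
}

/* Scale v in place. The map clause is kept in both builds, so the same
 * code is also correct on a GPU without unified shared memory. */
void scale(int n, double factor, double *v)
{
    assert(n >= 0 && v != NULL);

    #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
    for (int i = 0; i < n; ++i)
        v[i] *= factor;
}
```

Building the application once with the macro and once without it, on the same USM-capable GPU, gives a direct comparison of the two modes.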
GPU page migration is not necessary for correct OpenMP applications in
default mode. Some GPUs may provide optimizations when GPU page migration
is disabled. Furthermore, the runtime copy semantics of OpenMP map clauses
may be more efficient than automatic page migration in USM mode. This
could include directed prefetch constructs that may be more difficult to
implement with the USM page subsystem.
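To illustrate explicit copy semantics from the programmer's side, here is a sketch that wraps two kernels in one target data region. The function name is hypothetical. In default mode, I would expect the array to be copied to the device once on entry and back once on exit, with the inner map clauses finding the data already present:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper (hypothetical name): v = v * factor + shift,
 * done as two kernels that share one device copy of v. */
void scale_then_shift(int n, double factor, double shift, double *v)
{
    assert(n >= 0 && v != NULL);

    /* Entry to the data region: v is mapped for the whole region. */
    #pragma omp target data map(tofrom: v[0:n])
    {
        /* v is already present, so this map should not copy again. */
        #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
        for (int i = 0; i < n; ++i)
            v[i] *= factor;

        #pragma omp target teams distribute parallel for map(tofrom: v[0:n])
        for (int i = 0; i < n; ++i)
            v[i] += shift;
    }
    /* Exit from the data region: the host copy of v is consistent again. */
}
```

Whether this beats page migration on a given GPU is something to measure, not assume.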
Summary: In default OpenMP mode, host consistency is established at the
boundaries of target regions via the copy semantics implied by map clauses.
In USM mode, host consistency is established with page migration, which
could be less efficient. A colleague reminded me of an old adage that I
like to repeat: “In HPC, all paging is bad paging.” So for performance
and portability, I strongly recommend OpenMP programmers continue to use
map clauses and test their applications in default mode (without the USM
pragma).
Greg Rodgers
Opinions are my own, not my employer's.