[Openmp-commits] [openmp] c9f3240 - [OpenMP][Documentation] Add OpenMPOpt optimization section
Joseph Huber via Openmp-commits
openmp-commits at lists.llvm.org
Mon Jun 28 14:05:16 PDT 2021
Author: Joseph Huber
Date: 2021-06-28T17:05:03-04:00
New Revision: c9f3240c9d94f8128371358887e8b9e5c10c122a
URL: https://github.com/llvm/llvm-project/commit/c9f3240c9d94f8128371358887e8b9e5c10c122a
DIFF: https://github.com/llvm/llvm-project/commit/c9f3240c9d94f8128371358887e8b9e5c10c122a.diff
LOG: [OpenMP][Documentation] Add OpenMPOpt optimization section
Add some information about the optimizations currently provided by
OpenMPOpt. Every optimization performed should eventually be listed
here.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105050
Added:
Modified:
openmp/docs/optimizations/OpenMPOpt.rst
Removed:
################################################################################
diff --git a/openmp/docs/optimizations/OpenMPOpt.rst b/openmp/docs/optimizations/OpenMPOpt.rst
index 6606cb9ccd230..6fc942f2c79c1 100644
--- a/openmp/docs/optimizations/OpenMPOpt.rst
+++ b/openmp/docs/optimizations/OpenMPOpt.rst
@@ -1,13 +1,109 @@
+==========================
OpenMP-Aware Optimizations
==========================
+LLVM, since `version 11 <https://releases.llvm.org/download.html#11.0.0>`_ (12
+Oct 2020), supports an :ref:`OpenMP-Aware optimization pass <OpenMPOpt>`. This
+optimization pass will attempt to optimize the module with OpenMP-specific
+domain-knowledge. This pass is enabled by default at high optimization levels
+(O2 / O3) if compiling with OpenMP support enabled.
+
+.. _OpenMPOpt:
+
OpenMPOpt
----------
+=========
+
+.. contents::
+ :local:
+ :depth: 1
+
+OpenMPOpt contains several OpenMP-Aware optimizations. This pass is run early on
+the entire Module, and later on the entire call graph. Most optimizations done
+by OpenMPOpt support remarks. Optimization remarks can be enabled by compiling
+with the following flags.
+
+.. code-block:: console
+
+ $ clang -Rpass=openmp-opt -Rpass-missed=openmp-opt -Rpass-analysis=openmp-opt
+
+OpenMP Runtime Call Deduplication
+---------------------------------
+
+The OpenMP runtime library contains several functions used to implement features
+of the OpenMP standard. Several of the runtime calls are constant within a
+parallel region. A common optimization is to replace invariant code with a
+single reference, but in this case the compiler will only see an opaque call
+into the runtime library. To get around this, OpenMPOpt maintains a list of
+OpenMP runtime functions that are constant and will manually deduplicate them.
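
As a minimal sketch of the kind of code this targets, the function below queries the team size twice inside one parallel region. The helper name ``count_team_size_mismatches`` is hypothetical, and the assumption that ``omp_get_num_threads`` is on OpenMPOpt's deduplication list should be checked against the LLVM version in use. The serial fallback is only there so the sketch also builds without OpenMP support.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback (assumption for illustration) so the sketch also builds
// when OpenMP support is not enabled.
static int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper: counts threads that observe two different team sizes
// within the same parallel region. The result is expected to be zero.
int count_team_size_mismatches() {
  int mismatches = 0;
  #pragma omp parallel reduction(+ : mismatches)
  {
    // Both calls return the same value inside one parallel region, so an
    // OpenMP-aware pass may reuse the first result instead of calling the
    // runtime library a second time.
    int first = omp_get_num_threads();
    int second = omp_get_num_threads();
    if (first != second)
      ++mismatches;
  }
  return mismatches;
}
```

Without OpenMP-specific knowledge the compiler has to treat both calls as opaque and keep them; with it, the second call is a candidate for removal.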
+
+Globalization
+-------------
+
+The OpenMP standard requires that data can be shared between different threads.
+This requirement poses a unique challenge when offloading to GPU accelerators.
+Data cannot be shared between the threads in a GPU by default; to do so, it
+must be placed in either global or shared memory. This needs to be done
+every time a variable may potentially be shared in order to create correct
+OpenMP programs. Unfortunately, this has significant performance implications
+and is not needed in the majority of cases. For example, when Clang is
+generating code for this offloading region, it will see that the variable `x`
+escapes and is potentially shared. This will require globalizing the variable,
+which means it cannot reside in the registers on the device.
+
+.. code-block:: c++
+
+ void use(void *) { }
+
+ void foo() {
+ int x;
+ use(&x);
+ }
+
+ int main() {
+ #pragma omp target parallel
+ foo();
+ }
+
+In many cases, this transformation is not actually necessary but still carries a
+significant performance penalty. Because of this, OpenMPOpt can perform an
+inter-procedural optimization that scans each known use of the globalized
+variable and determines whether it is potentially captured and shared by another
+thread. If it is not actually captured, it can safely be moved back to fast
+register memory.
+
+Another case is memory that is intentionally shared between the threads, but is
+shared from one thread to all the others. Such variables can be moved to shared
+memory when compiled without needing to go through the runtime library. This
+allows for users to confidently declare shared memory on the device without
+needing to use custom OpenMP allocators or rely on the runtime.
+
+
+.. code-block:: c++
+
+ static void share(void *);
+
+ static void foo() {
+ int x[64];
+ #pragma omp parallel
+ share(x);
+ }
+
+ int main() {
+ #pragma omp target
+ foo();
+ }
+These optimizations can have very large performance implications. Both of these
+optimizations rely heavily on inter-procedural analysis. Because of this,
+offloading applications should ideally be contained in a single translation unit
+and functions should not be externally visible unless needed. OpenMPOpt will
+inform the user if any globalization calls remain if remarks are enabled. This
+should be treated as a defect in the program.
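
The advice on visibility can be sketched as follows. This is an illustration under stated assumptions, not a recommended layout from the commit: the names ``square`` and ``sum_of_squares`` are hypothetical. The helper called from the offloaded region has internal linkage and lives in the same translation unit, so every caller is visible to inter-procedural analysis.

```cpp
// Hypothetical helper with internal linkage: `static` keeps it private to
// this translation unit, so the optimizer can see all of its callers.
static int square(int value) { return value * value; }

// Hypothetical entry point: sums the squares of 0 .. n-1 in an offloaded
// region. Without OpenMP support the pragma is ignored and the loop runs
// serially on the host with the same result.
int sum_of_squares(int n) {
  int total = 0;
  #pragma omp target parallel for reduction(+ : total) map(tofrom : total)
  for (int i = 0; i < n; ++i)
    total += square(i);
  return total;
}
```

Had ``square`` been declared in another translation unit, the analysis would have to assume that any pointer passed to it may be shared.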
Resources
----------
+=========
+- 2021 OpenMP Webinar: "A Compiler's View of OpenMP" https://youtu.be/eIMpgez61r4
- 2020 LLVM Developers’ Meeting: "(OpenMP) Parallelism-Aware Optimizations" https://youtu.be/gtxWkeLCxmU
- 2019 EuroLLVM Developers’ Meeting: "Compiler Optimizations for (OpenMP) Target Offloading to GPUs" https://youtu.be/3AbS82C3X30