[Openmp-commits] [openmp] dead50d - [OpenMP][NFC] Fix a few typos in OpenMP documentation

Joseph Huber via Openmp-commits openmp-commits at lists.llvm.org
Mon Jul 26 13:04:25 PDT 2021


Author: Joseph Huber
Date: 2021-07-26T16:03:47-04:00
New Revision: dead50d4427cbdd5f41c02c5441270822f702730

URL: https://github.com/llvm/llvm-project/commit/dead50d4427cbdd5f41c02c5441270822f702730
DIFF: https://github.com/llvm/llvm-project/commit/dead50d4427cbdd5f41c02c5441270822f702730.diff

LOG: [OpenMP][NFC] Fix a few typos in OpenMP documentation

Summary:
Fixes some typos in the OpenMP documentation.

Added: 
    

Modified: 
    openmp/docs/remarks/OMP110.rst
    openmp/docs/remarks/OMP111.rst
    openmp/docs/remarks/OMP112.rst
    openmp/docs/remarks/OMP113.rst
    openmp/docs/remarks/OMP120.rst
    openmp/docs/remarks/OMP131.rst
    openmp/docs/remarks/OMP160.rst

Removed: 
    


################################################################################
diff  --git a/openmp/docs/remarks/OMP110.rst b/openmp/docs/remarks/OMP110.rst
index eb1eeba5d8b02..30a67d82e0797 100644
--- a/openmp/docs/remarks/OMP110.rst
+++ b/openmp/docs/remarks/OMP110.rst
@@ -5,11 +5,11 @@ Moving globalized variable to the stack. [OMP110]
 
 This optimization remark indicates that a globalized variable was moved back to
 thread-local stack memory on the device. This occurs when the optimization pass
-can determine that a globalized variable is not possibly be shared between
-threads and globalization was unnecessary. Using stack memory is the best-case
-scenario for data globalization as the variable can now be stored in fast
-register files on the device. This optimization requires full visibility of each
-variable.
+can determine that a globalized variable cannot possibly be shared between
+threads and globalization was ultimately unnecessary. Using stack memory is the
+best-case scenario for data globalization as the variable can now be stored in
+fast register files on the device. This optimization requires full visibility of
+each variable.
 
 Globalization typically occurs when a pointer to a thread-local variable escapes
 the current scope. The compiler needs to be pessimistic and assume that the

diff  --git a/openmp/docs/remarks/OMP111.rst b/openmp/docs/remarks/OMP111.rst
index f6f31fae117b1..127aa5ecb9b63 100644
--- a/openmp/docs/remarks/OMP111.rst
+++ b/openmp/docs/remarks/OMP111.rst
@@ -4,20 +4,21 @@ Replaced globalized variable with X bytes of shared memory. [OMP111]
 ====================================================================
 
 This optimization occurs when a globalized variable's data is shared between
-multiple threads, but requires a static amount of memory that can be determined
-at compile time. This is the case when only a single thread creates the memory
-and then shares is between every thread. The memory can then be pushed to a
-static buffer of shared memory on the device. This optimization allows users to
-declare shared memory on the device without using OpenMP's allocators.
+multiple threads, but requires a constant amount of memory that can be
+determined at compile time. This is the case when only a single thread creates
+the memory, which is then shared between every thread. The memory can then be
+pushed to a static buffer of shared memory on the device. This optimization
+allows users to declare shared memory on the device without using OpenMP's
+custom allocators.
 
-Globalization normally occurs when a pointer to a thread-local variables escapes
-the current scope. If a single thread is responsible for creating and sharing
-the data it can instead be mapped directly to shared memory on the target
-device. Checking if only a single thread can execute an instruction requires
+Globalization occurs when a pointer to a thread-local variable escapes the
+current scope. If a single thread is known to be responsible for creating and
+sharing the data, it can instead be mapped directly to the device's shared
+memory. Checking if only a single thread can execute an instruction requires
 that the parent functions have internal linkage. Otherwise, an external caller
 could invalidate this analysis by having multiple threads call that function.
-The optimization pass can automatically make internal copied of each function,
-but it is still recommended to mark them as internal using keywords like
+The optimization pass will make internal copies of each function to use for this
+reason, but it is still recommended to mark them as internal using keywords like
 ``static`` whenever possible.
 
 Example
@@ -37,12 +38,12 @@ stencil calculation shows how this can be used.
       for (int j0 = 0; j0 < N; j0 += NC) {
         double sX[MC][NC];
   
-  #pragma omp parallel for collapse(2) default(firstprivate)
+  #pragma omp parallel for collapse(2) shared(sX) default(firstprivate)
         for (int i1 = 0; i1 < MC; ++i1)
           for (int j1 = 0; j1 < NC; ++j1)
             sX[i1][j1] = X[(i0 + i1) * N + (j0 + j1)];
   
-  #pragma omp parallel for collapse(2) default(firstprivate)
+  #pragma omp parallel for collapse(2) shared(sX) default(firstprivate)
         for (int i1 = 1; i1 < MC - 1; ++i1)
           for (int j1 = 1; j1 < NC - 1; ++j1)
             Y[(i0 + i1) * N + j0 * j1] = (sX[i1 + 1][j1] + sX[i1 - 1][j1] +
@@ -60,6 +61,13 @@ stencil calculation shows how this can be used.
       double sX[MC][NC];
              ^
 
+The default mapping for variables captured in an OpenMP parallel region is
+``shared``. This means taking a pointer to the object will ultimately result in
+globalization that will be mapped to shared memory when it could have
+been placed in registers. To avoid this, make sure each variable that can be
+copied into the region is marked ``firstprivate`` either explicitly or using the
+OpenMP 5.1 feature ``default(firstprivate)``.
+
 Diagnostic Scope
 ----------------
 

diff  --git a/openmp/docs/remarks/OMP112.rst b/openmp/docs/remarks/OMP112.rst
index b671e1807f9b6..870d24bf13c04 100644
--- a/openmp/docs/remarks/OMP112.rst
+++ b/openmp/docs/remarks/OMP112.rst
@@ -6,11 +6,11 @@ Found thread data sharing on the GPU. Expect degraded performance due to data gl
 This missed remark indicates that a globalized value was found on the target
 device that was not either replaced with stack memory by :ref:`OMP110 <omp110>`
 or shared memory by :ref:`OMP111 <omp111>`. Globalization that has not been
-removed will need to be handled by the runtime and will significantly hurt
+removed will need to be handled by the runtime and will significantly impact
 performance. 
 
-The OpenMP standard expects that threads can always share their data between
-each-other.  However, this is not true by default when offloading to a target
+The OpenMP standard requires that threads are able to share their data between
+each other. However, this is not true by default when offloading to a target
 device such as a GPU. Threads on a GPU cannot share their data unless it is
 first placed in global or shared memory. In order to create standards compliant
 code, the Clang compiler will globalize any variables that could potentially be

diff  --git a/openmp/docs/remarks/OMP113.rst b/openmp/docs/remarks/OMP113.rst
index 2bd900cb509b3..b599837425809 100644
--- a/openmp/docs/remarks/OMP113.rst
+++ b/openmp/docs/remarks/OMP113.rst
@@ -10,10 +10,10 @@ its pointer cannot be stored. Otherwise it is considered captured and could
 potentially be shared between the threads. This can be overridden using a
 parameter level attribute as suggested in the remark text.
 
-Globalization will occur when a pointer to a thread-local variable escapes
-the current scope. In most cases it can be determined that the variable cannot
-be shared if a copy of its pointer is never made. However, this remark indicates
-a copy of the variable either is present, or is possible because it is used
+Globalization will occur when a pointer to a thread-local variable escapes the
+current scope. In most cases it can be determined that the variable cannot be
+shared if a copy of its pointer is never made. However, this remark indicates a
+copy of the pointer is present or that sharing is possible because it is used
 outside the current translation unit.
 
 Examples

diff  --git a/openmp/docs/remarks/OMP120.rst b/openmp/docs/remarks/OMP120.rst
index 1dc2edaaeed1b..1c3c45f8702e7 100644
--- a/openmp/docs/remarks/OMP120.rst
+++ b/openmp/docs/remarks/OMP120.rst
@@ -4,13 +4,14 @@ Transformed generic-mode kernel to SPMD-mode [OMP120]
 =====================================================
 
 This optimization remark indicates that the execution strategy for the OpenMP
-target offloading kernel was changed. Generic-mode kernels execute by using a
-single thread to schedule parallel worker threads using a state machine. This
+target offloading kernel was changed. Generic-mode kernels are executed by a
+single thread that schedules parallel worker threads using a state machine. This
 code transformation can move a kernel that was initially generated in generic
 mode to SPMD-mode where all threads are active at the same time with no state
-machine. This is only possible if the instructions previously executed by a
-single thread have no side-effects or can be guarded. If the instructions have
-no side-effects they are simply recomputed by each thread.
+machine. This execution strategy is closer to how the threads are actually
+executed on a GPU target. This is only possible if the instructions previously
+executed by a single thread have no side-effects or can be guarded. If the
+instructions have no side-effects they are simply recomputed by each thread.
 
 Generic-mode is often considerably slower than SPMD-mode because of the extra
 overhead required to separately schedule worker threads and pass data between

diff  --git a/openmp/docs/remarks/OMP131.rst b/openmp/docs/remarks/OMP131.rst
index 0864fc8cd7f1a..8b7b999d85369 100644
--- a/openmp/docs/remarks/OMP131.rst
+++ b/openmp/docs/remarks/OMP131.rst
@@ -7,13 +7,16 @@ This optimization remark indicates that a generic-mode kernel on the device was
 specialized for the given target region. When offloading in generic-mode, a
 state machine is required to schedule the work between the parallel worker
 threads. This optimization specializes the state machine in cases where there is
-a known number of parallel regions inside the kernel.
+a known number of parallel regions inside the kernel. A much simpler state
+machine can be used if it is known that there is no nested parallelism and the
+number of regions to schedule is known statically.
 
 Examples
 --------
 
 This optimization should occur on any generic-mode kernel that has visibility on
-all parallel regions, but cannot be moved to SPMD-mode.
+all parallel regions, but cannot be moved to SPMD-mode and has no nested
+parallelism.
 
 .. code-block:: c++
 

diff  --git a/openmp/docs/remarks/OMP160.rst b/openmp/docs/remarks/OMP160.rst
index 0f5b4eaa00e3f..5012910b18cc3 100644
--- a/openmp/docs/remarks/OMP160.rst
+++ b/openmp/docs/remarks/OMP160.rst
@@ -8,7 +8,7 @@ was not found to have any side-effects. This can occur if the region does not
 write any of its results to memory visible outside the region. This optimization
 is necessary because the barrier between sequential and parallel code typically
 prevents dead code elimination from completely removing the region. Otherwise
-there will still be overhead to fork and merge the threads.
+there will still be overhead to fork and merge the threads with no work done.
 
 Example
 -------

