r338049 - [OPENMP] What's new for OpenMP in clang.

Sun Jul 29 09:41:21 PDT 2018

I just noticed that UsersManual says: "Clang supports all OpenMP 3.1 
directives and clauses." Maybe this should link to OpenMPSupport?

On 2018-07-26 19:53, Alexey Bataev via cfe-commits wrote:
> Author: abataev
> Date: Thu Jul 26 10:53:45 2018
> New Revision: 338049
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=338049&view=rev
> Log:
> [OPENMP] What's new for OpenMP in clang.
> 
> Updated ReleaseNotes + Status of the OpenMP support in clang.
> 
> Modified:
>     cfe/trunk/docs/OpenMPSupport.rst
>     cfe/trunk/docs/ReleaseNotes.rst
> 
> Modified: cfe/trunk/docs/OpenMPSupport.rst
> URL:
> http://llvm.org/viewvc/llvm-project/cfe/trunk/docs/OpenMPSupport.rst?rev=338049&r1=338048&r2=338049&view=diff
> ==============================================================================
> --- cfe/trunk/docs/OpenMPSupport.rst (original)
> +++ cfe/trunk/docs/OpenMPSupport.rst Thu Jul 26 10:53:45 2018
> @@ -10,13 +10,15 @@
>  .. role:: partial
>  .. role:: good
> 
> +.. contents::
> +   :local:
> +
>  ==================
>  OpenMP Support
>  ==================
> 
> -Clang fully supports OpenMP 3.1 + some elements of OpenMP 4.5. Clang supports offloading to X86_64, AArch64 and PPC64[LE] devices.
> -Support for Cuda devices is not ready yet.
> -The status of major OpenMP 4.5 features support in Clang.
> +Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
> +PPC64[LE] and has `basic support for Cuda devices`_.
> 
>  Standalone directives
>  =====================
> @@ -35,7 +37,7 @@ Standalone directives
> 
>  * #pragma omp target: :good:`Complete`.
> 
> -* #pragma omp declare target: :partial:`Partial`.  No full codegen support.
> +* #pragma omp declare target: :good:`Complete`.
> 
>  * #pragma omp teams: :good:`Complete`.
> 
> @@ -64,5 +66,66 @@ Combined directives
> 
>  * #pragma omp target teams distribute parallel for [simd]: :good:`Complete`.
> 
> -Clang does not support any constructs/updates from upcoming OpenMP 5.0 except for `reduction`-based clauses in the `task` and `target`-based directives.
> -In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and mac OS.
> +Clang does not support any constructs/updates from upcoming OpenMP 5.0 except
> +for `reduction`-based clauses in the `task` and `target`-based directives.
> +
> +In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
> +Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and mac OS.
> +
> +.. _basic support for Cuda devices:
> +
> +Cuda devices support
> +====================
> +
> +Directives execution modes
> +--------------------------
> +
> +Clang code generation for target regions supports two modes: the SPMD and
> +non-SPMD modes. Clang chooses one of these two modes automatically based on the
> +way directives and clauses on those directives are used. The SPMD mode uses a
> +simplified set of runtime functions, thus increasing performance at the cost of
> +not supporting some OpenMP features. The non-SPMD mode is the most generic mode and
> +supports all currently available OpenMP features. The compiler will always
> +attempt to use the SPMD mode wherever possible. SPMD mode will not be used if:
> +
> +   - The target region contains an `if()` clause that refers to a `parallel`
> +     directive.
> +
> +   - The target region contains a `parallel` directive with a `num_threads()`
> +     clause.
> +
> +   - The target region contains user code (other than OpenMP-specific
> +     directives) in between the `target` and the `parallel` directives.
> +
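To make the SPMD conditions above concrete, here is a minimal sketch of how I read them. The function names `scale_spmd` and `scale_generic` are hypothetical, and which mode Clang actually picks for each is my assumption based on the list above, not something I have verified in the generated code.

```c
/* Hypothetical illustration of the SPMD conditions quoted above. */

/* Combined directive, no user code between `target` and `parallel`,
 * no if() or num_threads() clause: presumably eligible for SPMD mode. */
void scale_spmd(float *a, int n, float k) {
  #pragma omp target teams distribute parallel for map(tofrom: a[0:n])
  for (int i = 0; i < n; ++i)
    a[i] *= k;
}

/* User code sits between `target` and `parallel`, so by the third
 * condition SPMD mode would not be used (non-SPMD mode instead). */
void scale_generic(float *a, int n, float k) {
  #pragma omp target map(tofrom: a[0:n])
  {
    float factor = k * 1.0f; /* user code before the parallel region */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
      a[i] *= factor;
  }
}
```

Both functions compute the same result; only the shape of the directives differs, which is what the mode selection is described as depending on.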
> +Data-sharing modes
> +------------------
> +
> +Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
> +modes. The default mode is `Generic`. `Cuda` mode can give additional
> +performance and can be activated using the `-fopenmp-cuda-mode` flag. In
> +`Generic` mode all local variables that can be shared in the parallel regions
> +are stored in the global memory. In `Cuda` mode local variables are not shared
> +between the threads and it is the user's responsibility to share the required data
> +between the threads in the parallel regions.
> +
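A small sketch of the kind of code the data-sharing paragraph seems to be about, assuming I understand it correctly. The function name `sum_with_shared_local` is hypothetical. The variable `offset` is local to the target region but read inside the parallel region, so it is the sort of variable that `Generic` mode would place in global memory.

```c
/* Hypothetical example: a target-region local that the parallel region reads. */
int sum_with_shared_local(const int *v, int n) {
  int total = 0;
  #pragma omp target map(to: v[0:n]) map(tofrom: total)
  {
    int offset = 1; /* local variable shared with the parallel region below */
    #pragma omp parallel for reduction(+: total)
    for (int i = 0; i < n; ++i)
      total += v[i] + offset;
  }
  return total;
}
```

Under `-fopenmp-cuda-mode` the text says such locals are not shared automatically. Passing the value explicitly, for example with a `firstprivate(offset)` clause on the `parallel for`, would be one way to do that, but this is my guess rather than something the documentation states.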
> +Features not supported or with limited support for Cuda devices
> +---------------------------------------------------------------
> +
> +- Reductions across the teams are not supported yet.
> +
> +- Cancellation constructs are not supported.
> +
> +- Doacross loop nest is not supported.
> +
> +- User-defined reductions are supported only for trivial types.
> +
> +- Nested parallelism: inner parallel regions are executed sequentially.
> +
> +- Static linking of libraries containing device code is not supported yet.
> +
> +- Automatic translation of math functions in target regions to device-specific
> +  math functions is not implemented yet.
> +
> +- Debug information for OpenMP target regions is not supported yet.
> +
> 
> Modified: cfe/trunk/docs/ReleaseNotes.rst
> URL:
> http://llvm.org/viewvc/llvm-project/cfe/trunk/docs/ReleaseNotes.rst?rev=338049&r1=338048&r2=338049&view=diff
> ==============================================================================
> --- cfe/trunk/docs/ReleaseNotes.rst (original)
> +++ cfe/trunk/docs/ReleaseNotes.rst Thu Jul 26 10:53:45 2018
> @@ -216,7 +216,21 @@ OpenCL C Language Changes in Clang
>  OpenMP Support in Clang
>  ----------------------------------
> 
> -- ...
> +- Clang gained basic support for OpenMP 4.5 offloading for the NVPTX target.
> +   To compile your program for the NVPTX target use the following options:
> +   `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda` for 64-bit platforms or
> +   `-fopenmp -fopenmp-targets=nvptx-nvidia-cuda` for 32-bit platforms.
> +
> +- Passing options to the OpenMP device offloading toolchain can be done using
> +  the `-Xopenmp-target=<triple> -opt=val` flag. In this way the `-opt=val`
> +  option will be forwarded to the respective OpenMP device offloading toolchain
> +  described by the triple. For example, passing the compute capability to
> +  the OpenMP NVPTX offloading toolchain can be done as follows:
> +  `-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_60`. When only one
> +  target offload toolchain is specified under the `-fopenmp-targets=<triples>`
> +  option, the triple can be skipped: `-Xopenmp-target -march=sm_60`.
> +
> +- Other bugfixes.
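For anyone wanting to try the options above, a small probe along these lines might help check whether a target region was actually offloaded. The function name `ran_on_host` and the file name `probe.c` are hypothetical; `omp_is_initial_device()` is the standard OpenMP 4.x runtime query. The `sm_60` value is just the one from the release note, so adjust it for your GPU.

```c
/* Possible build line, using only the flags quoted in the release note:
 *   clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
 *         -Xopenmp-target -march=sm_60 probe.c
 */
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns 1 if the target region ran on the host, 0 if it ran on a device. */
int ran_on_host(void) {
  int on_host = 1;
  #pragma omp target map(from: on_host)
  {
#ifdef _OPENMP
    on_host = omp_is_initial_device() ? 1 : 0;
#else
    on_host = 1; /* built without OpenMP: the region is plain host code */
#endif
  }
  return on_host;
}
```

A result of 1 would suggest the region fell back to the host, for example because no device or offload toolchain was available.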
> 
>  CUDA Support in Clang
>  ---------------------
> 
> 
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits