[llvm-dev] LLVM GPU News Issue #4, January 22 2021

Fri Jan 22 09:00:39 PST 2021

Hi folks,

The fourth issue of LLVM GPU News, a bi-weekly newsletter on all the GPU
things under the LLVM umbrella, is now available at:
https://llvm-gpu-news.github.io/2021/01/22/issue-4.html

This release a new section dedicated to OpenMP Target Offloading,
maintained by Johannes Doerfert, while Lei Zhang took on reporting
MLIR/SPIR-V news.

I'm also pasting the content below, in case you prefer to read in your
email client.

-Jakub

======================================================================

# LLVM GPU News Issue #4, January 22 2021

Welcome to the fourth issue of LLVM GPU News, a bi-weekly newsletter on all
the GPU things under the LLVM umbrella.
This issue covers the period from January 8 to January 21 2020.

LLVM GPU News gained a new section: OpenMP Target Offloading, maintained by
Johannes Doerfert.

We welcome your feedback and suggestions. Let us know if we missed anything
interesting, or want us to bring attention to your (sub)project, revisions
under review, or proposals. Please see the bottom of the page for details
on how to submit suggestions and contribute.

## Industry News and Conference Talks

*  Dmitrii Tolmachev published a blog post on [real-time image registration
on GPU with VkFFT](
https://towardsdatascience.com/real-time-image-registration-on-gpu-with-vkfft-library-c4e47f8050a0)
-- a self-made [Vulkan Fast Fourier Transform library](
https://github.com/DTolm/VkFFT). Image registration is the problem of
determining what coordinate system transformation to apply to an image in
order to match it against a different image of the same object. Using a
highly-optimized FFT implementation on a commodity GPU (Nvidia 1660Ti)
allowed Dimitri to run the image registration algorithm in real time.
Matching a pair of 1024x1024 screenshots from Cyberpunk 2077 took around
3ms. The readme on Dimitrii's GitHub mentions that they are looking for a
PhD position or a job.

##  LLVM and Clang

### Discussions

*  The AMDGPU backend is no longer the blocker for switching to the New
Pass Manager. The last failing test was [pinned to use the Legacy Pass
Manager](https://reviews.llvm.org/D95051), while the work on making
Divergence Analysis work with the New Pass Manager [is still in progress](
https://lists.llvm.org/pipermail/llvm-dev/2021-January/147946.html).
*  Burlen Loring asked about [Clang/LLVM and CUDA version compatibility on
Fedora](https://lists.llvm.org/pipermail/cfe-dev/2021-January/067532.html).
There are no replies as of writing.

### Commits

*  (In-review) Add [AMDGPU lower function LDS pass](
https://reviews.llvm.org/D94648). The strategy is to create a new struct
with a field for each LDS variable and allocate that struct at the same
address for every kernel. This allows some OpenMP kernels for AMDGPU to
work with the deviceRTL runtime library that uses CUDA shared variables
from functions that cannot be inlined.
*  `AMDGPUSubtargets.h` was split into two subtargets: [`R600Subtarget.h`
and `GCNSubtarget.h`](https://reviews.llvm.org/D95036). This reduces
include dependencies and improves LLVM build times.
*  (In-review) [Implement HIP codegen support for the `__managed__`
attribute.](https://reviews.llvm.org/D94814) This attribute can be applied
to global variables. Managed variables can be used by both device and host
code. The ROCm programming guide [mentions managed variables as not
supported](
https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-GUIDE.html#variable-type-qualifiers)
and does not describe their semantics yet.

## MLIR

### Discussions

*  Lenny Guo [expressed intention](
https://llvm.discourse.group/t/generate-spirv-binary-from-mlir-dialect-kernels-to-run-it-on-ocl-runtime/2501/4)
to work on OpenCL conversions via SPIR-V and bring up an mlir-opencl-runner.

### Commits

*  The SPIR-V dialect now knows traits like [`SignedOp`](
https://reviews.llvm.org/D94896), [`UnsignedOp`](
https://reviews.llvm.org/D94068), and [`UsableInSpecConstantOp`](
https://reviews.llvm.org/D94288) to process ops in these categories
uniformly.
*  `spv.SpecConstantOperation` is fully supported now, including
serialization and deserialization.

## OpenMP (Target Offloading)

### Discussions

*  Discussions usually happen on the mailing list (openmp-dev at lists.llvm.org)
or in our weekly ["OpenMP in LLVM" meeting](
https://docs.google.com/document/d/1Tz8WFN13n7yJ-SCE0Qjqf9LmjGUw0dWO9Ts1ss4YOdg/edit?usp=sharing),
feel free to join in!
*  The LLVM/OpenMP documentation has been [online](
https://openmp.llvm.org/docs/index.html) for a few weeks. Initial content
is there but the [FAQ](https://openmp.llvm.org/docs/SupportAndFAQ.html) and
other sections are still very much empty; we are looking for volunteers and
topics.
*  The memory management APIs proposed for OpenMP 6.0, i.a., to allocate
managed memory, are discussed for an (potentially opt-in) inclusion into
the LLVM runtime very soon.

### Commits

*  The `declare mapper` API was the last part of data mapping that did not
transfer source information to the runtime (location and name of the mapped
objects). This was [changed now](https://reviews.llvm.org/D94806) which
will cause [`LIBOMPTARGET_INFO`](
https://openmp.llvm.org/docs/design/Runtimes.html#libomptarget-info) to
display plenty of useful information about mapped objects.
*  The [second patch](https://reviews.llvm.org/D94855) for the OpenMP
target profiling allows us to trace multiple threads that are offloading
concurrently. See the documentation for [`LIBOMPTARGET_PROFILE`](
https://openmp.llvm.org/docs/design/Runtimes.html#libomptarget-profile).
*  Support for an PTX device runtime [has been dropped](
https://reviews.llvm.org/D94725) in favor of the superior way, using an
LLVM-IR device runtime. The latter is now easy to build, simply move
`openmp` from the enabled projects to the enabled runtimes (see [how to
build an OpenMP offloading capable compiler](
https://openmp.llvm.org/docs/SupportAndFAQ.html#q-how-to-build-an-openmp-offload-capable-compiler)
).
*  The `nowait` support for `omp target` directives via ["hidden helper
threads"](https://tianshilei.me/wp-content/uploads/concurrent-lcpc2020.pdf)
was [upstreamed](https://reviews.llvm.org/D77609). Given some problems
encountered afterwards it might need to be refined slightly and might not
make it for LLVM 12 after all.
*  (In-review) [Driver support for OpenMP offloading onto AMD GPUs.](
https://reviews.llvm.org/D94961)
*  (In-review) A series of patches is underway to allow building the device
runtime in pure OpenMP + C++. An overview of the effort can be found [here](
https://reviews.llvm.org/D94745).
*  (In-review) A patch to [build the CUDA plugin without having CUDA
installed](https://reviews.llvm.org/D95155) on the build machine. Together
with a [CUDA free device runtime](https://reviews.llvm.org/D94745) and a
pre-build selection of device runtimes for various architectures, this will
allow us to enable OpenMP offloading in LLVM releases, e.g., via Linux
distributions.

## External Compilers

### LLPC

Lu Jiao added [support for compiling SPIR-V shaders with the linkage
capability](https://github.com/GPUOpen-Drivers/llpc/pull/1110). Such
library SPIR-V shaders do not have an entry function, so it is required to
create a dummy entry function.

### Mesa

### SYCL

-- 
Jakub Kuderski
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210122/1462541a/attachment-0001.html>