[llvm-dev] LLVM GPU News #25, December 17 2021

Sun Dec 19 22:09:54 PST 2021

Hi folks,

The 25th issue of LLVM GPU News, a bi-weekly newsletter on all the GPU
things under the LLVM umbrella, is out:
<https://llvm-gpu-news.github.io/2021/12/17/issue-25.html>.
I have also pasted the content below, in case you prefer to read it in your
email client.

LLVM GPU News is taking a holiday break, so we will skip the next December
issue. See you next year!

-Jakub

======================================================================

# LLVM GPU News #25, December 17 2021
Authors: Jakub Kuderski, Alexey Bader, Lei Zhang

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things
under the LLVM umbrella.
This issue covers the period from December 3 to December 16 2021.

We welcome your feedback and suggestions. Let us know if we missed anything
interesting, or if you want us to bring attention to your (sub)project,
revisions under review, or proposals. Please see the bottom of the page for
details on how to submit suggestions and contribute.

## Industry News and Conferences

## LLVM and Clang

### Discussions

*  Intel, ARM, and Khronos [proposed to start the integration](
https://lists.llvm.org/pipermail/llvm-dev/2021-December/154270.html) of
the [SPIR-V backend](https://github.com/KhronosGroup/LLVM-SPIRV-Backend)
into LLVM. The code has been refactored so that it does not require extra
changes to GlobalISel or other LLVM target-independent infrastructure.
Because some SPIR-V functionality is still missing, the new code is proposed
to land as an experimental backend. There are no replies to the proposal at
the time of writing.

### Commits

*  The HIP toolchain is being refactored with SPIR-V generation in mind.
The old toolchain was renamed to `HIPAMDToolChain`, and a new one,
`HIPSPVToolChain`, was added. The new toolchain uses the
SPIRV-LLVM-Translator because the SPIR-V LLVM backend is not ready yet.
Patches currently in review will allow emitting SPIR-V binaries using `llc`.
[D110549](https://reviews.llvm.org/D110549), [D110618](
https://reviews.llvm.org/D110618), [D110622](
https://reviews.llvm.org/D110622), [D110685](
https://reviews.llvm.org/D110685)
*  Documentation was added for the DWARF location descriptions, as part
of the ['DWARF Extensions For Heterogeneous Debugging'](
https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html)
used by AMDGPU. [D115587](https://reviews.llvm.org/D115587)

## MLIR

### Discussions

*  Thomas Raoux is working on adding [async copy from global to shared
memory to the GPU dialect](
https://llvm.discourse.group/t/modeling-gpu-async-copy-ampere-feature/4924):
'async copy is a new feature in Nvidia Ampere GPUs, but other GPUs are
likely to support similar feature in the future. It works like a DMA
operation that copies data from global to shared memory without going
through register and without blocking the execution unit. The big advantage
is that it saves register usage and it allows aggressive software
pipelining to hide latency.' For more details, see [the blog post](
https://developer.nvidia.com/blog/controlling-data-movement-to-boost-performance-on-ampere-architecture/).
Thomas is seeking feedback on how async copy is modeled before sending out
patches for review.
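
The software pipelining pattern mentioned in the quote can be sketched on the
host as double buffering. This is only an analogy and not GPU or MLIR code: a
background worker stands in for the asynchronous copy engine, and the helper
names `async_copy` and `pipelined_sum` are hypothetical, not taken from the
proposal.

```python
from concurrent.futures import ThreadPoolExecutor


def async_copy(src, start, tile):
    """Hypothetical stand-in for the async copy: stage one tile in a new buffer."""
    return list(src[start:start + tile])


def pipelined_sum(data, tile=4):
    """Sum `data` tile by tile, prefetching tile i+1 while tile i is reduced."""
    if tile <= 0:
        raise ValueError("tile must be positive")
    total = 0
    # A single worker plays the role of the copy engine.
    with ThreadPoolExecutor(max_workers=1) as copier:
        # Prologue: issue the copy for the first tile before any compute.
        pending = copier.submit(async_copy, data, 0, tile)
        for start in range(0, len(data), tile):
            current = pending.result()  # wait until the staged tile is ready
            nxt = start + tile
            if nxt < len(data):
                # Issue the next copy before computing, so the two can overlap.
                pending = copier.submit(async_copy, data, nxt, tile)
            total += sum(current)  # compute on the tile that was staged earlier
    return total


if __name__ == "__main__":
    print(pipelined_sum(list(range(10)), tile=4))
```

The loop always waits on a copy that was issued one iteration earlier, which is
what lets the copy latency hide behind the compute step.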

### Commits

## OpenMP (Target Offloading)

### Discussions

### Commits

*  The new device runtime is now enabled by default for AMDGPU offloading.
[D115157](https://reviews.llvm.org/D115157)

## External Compilers

### LLPC

*  The standalone compiler tool `amdllpc` can now compile multiple shader
stages together without having to provide a `.pipe` file. [LLPC#1591](
https://github.com/GPUOpen-Drivers/llpc/pull/1591)

### oneAPI DPC++

#### CUDA/HIP support

*  Added HIP backend support to the [filter selector](
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/FilterSelector/FilterSelector.adoc)
extension.
*  Fixed an infinite loop when the `parallel_for` range exceeds `INT_MAX`.
*  Fixed the platform query for USM allocation information.
*  Improved performance on the CUDA backend by reducing runtime overhead on
event handling and using a more efficient implementation for barrier and
other synchronization built-in functions.

#### SYCL 2020 support

*  Switched to blocking USM free for OpenCL GPU devices.
*  Fixed support for classes implicitly converted from items in
`parallel_for`.

#### Non-standard extensions

*  Added support for the HW threads per EU device query.
*  Added an implementation of the `discard_events` extension, enabling
optimization of queue operations.
*  Added an experimental FPGA latency control feature.
*  ESIMD: Added support for an arbitrary number of elements to
`simd::copy_from/to`.
*  Fixed `bfloat16` construction from integer.

#### Misc

*  Improved runtime library performance by reducing the overhead of XPTI
instrumentation.
*  Made initialization of the host task thread pool thread-safe.
*  Multiple GitHub Actions continuous integration system improvements:
  *  libclc testing added to regular validation.
  *  Build of HIP and CUDA back-ends enabled in nightly Docker containers.
*  Improved runtime error diagnostics by handling
`ZE_RESULT_ERROR_INVALID_ARGUMENT` and out-of-memory Level Zero error codes.