[llvm-dev] LLVM GPU News #22, October 29 2021

Sun Oct 31 14:26:30 PDT 2021

Hi folks,

The 22nd issue of LLVM GPU News, a bi-weekly newsletter on all the GPU
things under the LLVM umbrella, is out: <
https://llvm-gpu-news.github.io/2021/10/29/issue-22.html>.

I also paste the content below, in case you prefer to read in your email
client.

-Jakub

======================================================================

# LLVM GPU News #22, October 29 2021
Authors: Jakub Kuderski, Alexey Bader, Lei Zhang, Joseph Huber

Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things
under the LLVM umbrella.
This issue covers the period from October 15 to October 28 2021.

This issue brings news from a new external project, [oneAPI DPC++](
https://www.intel.com/content/www/us/en/developer/tools/oneapi/data-parallel-c-plus-plus.html),
contributed by Alexey Bader.

We welcome your feedback and suggestions. Let us know if we missed anything
interesting, or want us to bring attention to your (sub)project, revisions
under review, or proposals. Please see the bottom of the page for details
on how to submit suggestions and contribute.

## Industry News and Conferences

##  LLVM and Clang

### Discussions

*  Nimit Singhania [proposed to add two new static analyses to LLVM](
https://lists.llvm.org/pipermail/llvm-dev/2021-October/153412.html) to
detect performance issues in GPU programs, developed as [their PhD thesis](
http://nimitsinghania.com/phd-thesis.pdf). [The first analysis](
https://www.cis.upenn.edu/~alur/Cav17.pdf) detects memory congestion issues
across GPU threads, while [the second](
https://www.cis.upenn.edu/~alur/SAS18.pdf) tells if the block-size
parameter can be tweaked without affecting program correctness. The code is
available on the [GPU Drano project GitHub](
https://github.com/upenn-acg/gpudrano-static-analysis_v1.0). There are no
replies at the time of writing.
*  Jon Chesterfield [observed that "`AMDGPUOpenMP.cpp` in
`Driver/ToolChains` currently spawns an instance of llvm-link to stitch
multiple input files together and splice in ~libm at the same time"](
https://lists.llvm.org/pipermail/cfe-dev/2021-October/069180.html) and is
looking for a solution that would avoid calling llvm-link from the driver.
There are no replies at the time of writing.

### Commits

*  NVPTX now runs a late SROA pass to optimize away more `alloca`s.
[D111471](https://reviews.llvm.org/D111471)
*  AMDGPU now allows the use of a whole register file on gfx90a for VGPRs
(Vector General Purpose Registers) with kernels that do not use AGPRs
([Vector Accumulation Registers](
https://llvm.org/docs/AMDGPUUsage.html#register-identifier)). [D111764](
https://reviews.llvm.org/D111764)

## MLIR

### Discussions

### Commits

*  GPU WMMA ops to NVVM conversion is relaxed to support 64-bit indices.
[D112479](https://reviews.llvm.org/D112479)
*  SPIR-V utility scripts support automatically pulling in OpenCL
definitions from the spec, and a few OpenCL ops were defined. [D111886](
https://reviews.llvm.org/D111886), [D111884](
https://reviews.llvm.org/D111884)

## OpenMP (Target Offloading)

### Discussions

### Commits

*  Improved debugging in the new device runtime and documentation on
enabling it [D112010](https://reviews.llvm.org/D112010). [D112002](
https://reviews.llvm.org/D112002)
*  Fixes and improvements to the new device runtime in preparation for it
to become the default runtime in [D111946](https://reviews.llvm.org/D111946),
[D112144](https://reviews.llvm.org/D112144), and [D112544](
https://reviews.llvm.org/D112544).
*  New device runtime libraries now built for AMDGPU targets in [D112227](
https://reviews.llvm.org/D112227) and [D111987](
https://reviews.llvm.org/D111987).
*  The DeviceRTL library is now built for AMDGPU. [
https://reviews.llvm.org/D112227](https://reviews.llvm.org/D112227)

## External Compilers

### LLPC

*  The New Pass Manager is enabled by default for the frontend passes.
[LLPC#1419](https://github.com/GPUOpen-Drivers/llpc/pull/1419)

### oneAPI DPC++

#### CUDA/HIP support

*  Added Windows platform support for CUDA backend.
*  Fixed `mul_hi` and `frexp` math functions implementation for CUDA
backend.
*  Improved compiler diagnostics for missing libspirv library for CUDA and
HIP backends.
*  Add `get_sub_group_local_id()` to HIP backend.

#### SYCL 2020 support

*  Improved diagnostics for invalid kernel names.
*  Added definitions for missing feature test macros.
*  Fixed a few bugs in specialization constants implementation.
*  Remove program class and related APIs and `half` type declared in the
global namespace.

#### Non-standard extensions

*  Improved `printf` support on devices w/o doubles support.
*  Added support for using `std::tuple` on Intel devices.
*  Added support for `EXT_ONEAPI_max_work_groups` extension adding new
device information descriptors: `max_global_work_groups` and
`max_work_groups`.
*  A number of improvements for Explicit SIMD feature for Intel GPU device
including fixes for SLM gather/scatter, adding support for
`__esimd_svm_block_ld` intrinsic and more.

#### Upstream contributions to LLVM

*  Added support for `sycl_special_class` attribute to address comments
from [D71016](https://reviews.llvm.org/D71016).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20211031/619d84e7/attachment.html>