<div dir="ltr">Hi folks,<br><br>The 22nd issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella, is out: <<a href="https://llvm-gpu-news.github.io/2021/10/29/issue-22.html" target="_blank">https://llvm-gpu-news.github.io/2021/10/29/issue-22.html</a>>.<br><br>I have also pasted the content below, in case you prefer to read it in your email client.<br><br>-Jakub<br><br>======================================================================<br><br># LLVM GPU News #22, October 29 2021<br>Authors: Jakub Kuderski, Alexey Bader, Lei Zhang, Joseph Huber<br><br>Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella.<br>This issue covers the period from October 15 to October 28 2021.<br><br>This issue brings news from a new external project, [oneAPI DPC++](<a href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/data-parallel-c-plus-plus.html">https://www.intel.com/content/www/us/en/developer/tools/oneapi/data-parallel-c-plus-plus.html</a>), contributed by Alexey Bader.<br><br>We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.<br><br><br>## Industry News and Conferences<br><br><br>## LLVM and Clang<br><br>### Discussions<br><br>*  Nimit Singhania [proposed to add two new static analyses to LLVM](<a href="https://lists.llvm.org/pipermail/llvm-dev/2021-October/153412.html">https://lists.llvm.org/pipermail/llvm-dev/2021-October/153412.html</a>) to detect performance issues in GPU programs, developed as [their PhD thesis](<a href="http://nimitsinghania.com/phd-thesis.pdf">http://nimitsinghania.com/phd-thesis.pdf</a>). 
[The first analysis](<a href="https://www.cis.upenn.edu/~alur/Cav17.pdf">https://www.cis.upenn.edu/~alur/Cav17.pdf</a>) detects memory congestion issues across GPU threads, while [the second](<a href="https://www.cis.upenn.edu/~alur/SAS18.pdf">https://www.cis.upenn.edu/~alur/SAS18.pdf</a>) tells if the block-size parameter can be tweaked without affecting program correctness. The code is available on the [GPU Drano project GitHub](<a href="https://github.com/upenn-acg/gpudrano-static-analysis_v1.0">https://github.com/upenn-acg/gpudrano-static-analysis_v1.0</a>). There are no replies at the time of writing.<br>*  Jon Chesterfield [observed that "`AMDGPUOpenMP.cpp` in `Driver/ToolChains` currently spawns an instance of llvm-link to stitch multiple input files together and splice in ~libm at the same time"](<a href="https://lists.llvm.org/pipermail/cfe-dev/2021-October/069180.html">https://lists.llvm.org/pipermail/cfe-dev/2021-October/069180.html</a>) and is looking for a solution that would avoid calling llvm-link from the driver. There are no replies at the time of writing.<br><br>### Commits<br><br>*  NVPTX now runs a late SROA pass to optimize away more `alloca`s. [D111471](<a href="https://reviews.llvm.org/D111471">https://reviews.llvm.org/D111471</a>)<br>*  AMDGPU now allows the use of a whole register file on gfx90a for VGPRs (Vector General Purpose Registers) with kernels that do not use AGPRs ([Vector Accumulation Registers](<a href="https://llvm.org/docs/AMDGPUUsage.html#register-identifier">https://llvm.org/docs/AMDGPUUsage.html#register-identifier</a>)). [D111764](<a href="https://reviews.llvm.org/D111764">https://reviews.llvm.org/D111764</a>)<br><br><br>## MLIR<br><br>### Discussions<br><br>### Commits<br><br>*  GPU WMMA ops to NVVM conversion is relaxed to support 64-bit indices. 
[D112479](<a href="https://reviews.llvm.org/D112479">https://reviews.llvm.org/D112479</a>)<br>*  SPIR-V utility scripts support automatically pulling in OpenCL definitions from the spec, and a few OpenCL ops were defined. [D111886](<a href="https://reviews.llvm.org/D111886">https://reviews.llvm.org/D111886</a>), [D111884](<a href="https://reviews.llvm.org/D111884">https://reviews.llvm.org/D111884</a>)<br><br><br>## OpenMP (Target Offloading)<br><br>### Discussions<br><br>### Commits<br><br>*  Improved debugging in the new device runtime and documentation on enabling it. [D112010](<a href="https://reviews.llvm.org/D112010">https://reviews.llvm.org/D112010</a>), [D112002](<a href="https://reviews.llvm.org/D112002">https://reviews.llvm.org/D112002</a>)<br>*  Fixes and improvements to the new device runtime in preparation for it to become the default runtime in [D111946](<a href="https://reviews.llvm.org/D111946">https://reviews.llvm.org/D111946</a>), [D112144](<a href="https://reviews.llvm.org/D112144">https://reviews.llvm.org/D112144</a>), and [D112544](<a href="https://reviews.llvm.org/D112544">https://reviews.llvm.org/D112544</a>).<br>*  New device runtime libraries are now built for AMDGPU targets in [D112227](<a href="https://reviews.llvm.org/D112227">https://reviews.llvm.org/D112227</a>) and [D111987](<a href="https://reviews.llvm.org/D111987">https://reviews.llvm.org/D111987</a>).<br>*  The DeviceRTL library is now built for AMDGPU. [D112227](<a href="https://reviews.llvm.org/D112227">https://reviews.llvm.org/D112227</a>)<br><br><br>## External Compilers<br><br>### LLPC<br><br>*  The New Pass Manager is enabled by default for the frontend passes. 
[LLPC#1419](<a href="https://github.com/GPUOpen-Drivers/llpc/pull/1419">https://github.com/GPUOpen-Drivers/llpc/pull/1419</a>)<br><br>### oneAPI DPC++<br><br>#### CUDA/HIP support<br><br>*  Added Windows platform support for the CUDA backend.<br>*  Fixed the implementation of the `mul_hi` and `frexp` math functions for the CUDA backend.<br>*  Improved compiler diagnostics for a missing libspirv library for the CUDA and HIP backends.<br>*  Added `get_sub_group_local_id()` to the HIP backend.<br><br>#### SYCL 2020 support<br><br>*  Improved diagnostics for invalid kernel names.<br>*  Added definitions for missing feature test macros.<br>*  Fixed a few bugs in the specialization constants implementation.<br>*  Removed the program class and related APIs, as well as the `half` type declared in the global namespace.<br><br>#### Non-standard extensions<br><br>*  Improved `printf` support on devices without support for doubles.<br>*  Added support for using `std::tuple` on Intel devices.<br>*  Added support for the `EXT_ONEAPI_max_work_groups` extension, which adds new device information descriptors: `max_global_work_groups` and `max_work_groups`.<br>*  Made a number of improvements to the Explicit SIMD feature for Intel GPU devices, including fixes for SLM gather/scatter, support for the `__esimd_svm_block_ld` intrinsic, and more.<br><br>#### Upstream contributions to LLVM<br><br>*  Added support for the `sycl_special_class` attribute to address comments from [D71016](<a href="https://reviews.llvm.org/D71016">https://reviews.llvm.org/D71016</a>).<br><br></div>