<div dir="ltr">Hi folks,<br><br>The 26th issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella, is out: <<a href="https://llvm-gpu-news.github.io/2022/01/14/issue-26.html" target="_blank">https://llvm-gpu-news.github.io/2022/01/14/issue-26.html</a>>.<br>I am also pasting the content below in case you prefer to read it in your email client.<div><br>-Jakub<br><br>======================================================================<br><br># LLVM GPU News #26, January 14, 2022<br>Authors: Jakub Kuderski, Alexey Bader, Joseph Huber, Lei Zhang<br><br>Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella.<br>This issue covers the period from December 17, 2021 to January 13, 2022.<br><br>We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.<br><br><br>## Industry News and Conferences<br><br>*  The first LLVM GPU Working Group Meeting was held on January 14. Everyone interested in GPU/offloading-related development in LLVM is welcome to join. 
See the [document with the meeting scheduling information, agenda, and notes](<a href="https://docs.google.com/document/d/1m_oSe1HwtWdQ2JUmMRTAVHbUS7Dv4MRsqptiYcgK6iI">https://docs.google.com/document/d/1m_oSe1HwtWdQ2JUmMRTAVHbUS7Dv4MRsqptiYcgK6iI</a>) for more details.<br>   *  We are running polls on the meeting time and frequency.<br><br><br>##  LLVM and Clang<br><br>### Discussions<br><br>*  Discussion on how to improve math optimizations on GPUs as part of the first [LLVM GPU working group meeting](<a href="https://docs.google.com/document/d/1m_oSe1HwtWdQ2JUmMRTAVHbUS7Dv4MRsqptiYcgK6iI/edit#heading=h.k8ma1s401a1">https://docs.google.com/document/d/1m_oSe1HwtWdQ2JUmMRTAVHbUS7Dv4MRsqptiYcgK6iI/edit#heading=h.k8ma1s401a1</a>).<br>*  Anastasia Stulova posted an [RFC: 'Add linking of separate translation units using `spirv-link`'](<a href="https://lists.llvm.org/pipermail/cfe-dev/2022-January/069658.html">https://lists.llvm.org/pipermail/cfe-dev/2022-January/069658.html</a>). `spirv-link` is an external tool provided by the [SPIRV-Tools project](<a href="https://github.com/KhronosGroup/SPIRV-Tools#linker">https://github.com/KhronosGroup/SPIRV-Tools#linker</a>). This landed as [D116266](<a href="https://reviews.llvm.org/D116266">https://reviews.llvm.org/D116266</a>).<br>*  'huoshanl' asked [why the AMDGPU backend dropped the `GCNRegBankReassign` pass](<a href="https://llvm.discourse.group/t/gcnregbankreassign-pass-issue/5303">https://llvm.discourse.group/t/gcnregbankreassign-pass-issue/5303</a>). There are no replies at the time of writing.<br><br>### Commits<br><br>*  A SPIR-V toolchain was added to Clang. SPIR-V code is generated by the external SPIRV-LLVM translator tool `llvm-spirv`, as a temporary solution until a SPIR-V backend lands in LLVM. [D112410](<a href="https://reviews.llvm.org/D112410">https://reviews.llvm.org/D112410</a>)<br>*  The OpenCL documentation was updated with C++ for OpenCL 2021 support in Clang. 
[D116271](<a href="https://reviews.llvm.org/D116271">https://reviews.llvm.org/D116271</a>)<br>*  CUDA/HIP now allow `__int128` on the host side, even if the target device does not support it. [D111047](<a href="https://reviews.llvm.org/D111047">https://reviews.llvm.org/D111047</a>)<br>*  NVPTX intrinsics and builtins for the PTX `cvt` instructions were added for `sm_80` and newer architectures. [D116673](<a href="https://reviews.llvm.org/D116673">https://reviews.llvm.org/D116673</a>)<br>*  Enabled AMDGPU divergence predicates for `not`, `min`/`max`, and `ctlz`/`cttz`. [D115884](<a href="https://reviews.llvm.org/D115884">https://reviews.llvm.org/D115884</a>), [D115954](<a href="https://reviews.llvm.org/D115954">https://reviews.llvm.org/D115954</a>), [D116044](<a href="https://reviews.llvm.org/D116044">https://reviews.llvm.org/D116044</a>)<br><br><br>## MLIR<br><br>### Discussions<br><br>### Commits<br><br>*  A `gpu.printf` op was added to support debugging. [D110448](<a href="https://reviews.llvm.org/D110448">https://reviews.llvm.org/D110448</a>)<br>*  GPU kernel outlining now supports the data layout spec attribute (`dlspec`). [D115722](<a href="https://reviews.llvm.org/D115722">https://reviews.llvm.org/D115722</a>)<br>*  A few SPIR-V serialization bugs involving nested control flow were fixed. [D115560](<a href="https://reviews.llvm.org/D115560">https://reviews.llvm.org/D115560</a>), [D115582](<a href="https://reviews.llvm.org/D115582">https://reviews.llvm.org/D115582</a>)<br>*  SPIR-V serialization now allows explicit control over debug information emission. 
[D115531](<a href="https://reviews.llvm.org/D115531">https://reviews.llvm.org/D115531</a>)<br><br><br>## OpenMP (Target Offloading)<br><br>### Discussions<br><br>*  The implementation of the new OpenMP offloading driver was discussed; see the [slides](<a href="https://docs.google.com/presentation/d/1QXKSdBWhLaUHyrI-dgd2yHMux3w_q2EF2sROyO0u52k">https://docs.google.com/presentation/d/1QXKSdBWhLaUHyrI-dgd2yHMux3w_q2EF2sROyO0u52k</a>).<br><br>### Commits<br><br>*  With optimizations enabled, 100% of OvO tests pass on AMDGPU after applying [D116906](<a href="https://reviews.llvm.org/D116906">https://reviews.llvm.org/D116906</a>).<br>*  The new OpenMP offloading driver is up for review at [D116541](<a href="https://reviews.llvm.org/D116541">https://reviews.llvm.org/D116541</a>) (click 'Stack' to see the full list of revisions) and can be pulled from [here](<a href="https://github.com/jhuber6/llvm-project/tree/NewDriver">https://github.com/jhuber6/llvm-project/tree/NewDriver</a>). It currently supports the following features:<br>   *  AMDGPU and NVPTX offloading targets.<br>   *  Offloading object files compatible with the host linker.<br>   *  Functional static linking using archive libraries.<br>   *  Device-side LTO for offloading applications.<br>   *  Embedding LLVM IR for future JIT functionality.<br><br><br>## External Compilers<br><br>### LLPC<br><br>*  Added multi-threaded compilation support to the standalone compiler tool `amdllpc`. The main goal is to exercise threading in the compiler in CI, without having to run the full AMDVLK Vulkan driver on a machine with a GPU. [LLPC#1601](<a href="https://github.com/GPUOpen-Drivers/llpc/pull/1601">https://github.com/GPUOpen-Drivers/llpc/pull/1601</a>)<br>*  A new GitHub Actions workflow was added to automatically produce code coverage reports. Links to the reports are posted as pull request comments. 
[See a sample report.](<a href="https://github.com/GPUOpen-Drivers/llpc/pull/1620#issuecomment-1010621514">https://github.com/GPUOpen-Drivers/llpc/pull/1620#issuecomment-1010621514</a>) [LLPC#1627](<a href="https://github.com/GPUOpen-Drivers/llpc/pull/1627">https://github.com/GPUOpen-Drivers/llpc/pull/1627</a>), [LLPC#1629](<a href="https://github.com/GPUOpen-Drivers/llpc/pull/1629">https://github.com/GPUOpen-Drivers/llpc/pull/1629</a>)<br><br>### oneAPI DPC++<br><br>#### CUDA/HIP support<br><br>*  Added group collective functions (reduce, scan, broadcast) for HIP. [#5202](<a href="https://github.com/intel/llvm/pull/5202">https://github.com/intel/llvm/pull/5202</a>)<br>*  Added HIP backend support to the filter selector extension. [#5171](<a href="https://github.com/intel/llvm/pull/5171">https://github.com/intel/llvm/pull/5171</a>), [#5176](<a href="https://github.com/intel/llvm/pull/5176">https://github.com/intel/llvm/pull/5176</a>)<br>*  Improved queue barrier support on the HIP backend. [#4975](<a href="https://github.com/intel/llvm/pull/4975">https://github.com/intel/llvm/pull/4975</a>)<br>*  Made a number of small functional fixes improving device information, `stream` support, hierarchical parallelism, etc. [#4951](<a href="https://github.com/intel/llvm/pull/4951">https://github.com/intel/llvm/pull/4951</a>), [#5168](<a href="https://github.com/intel/llvm/pull/5168">https://github.com/intel/llvm/pull/5168</a>), [#5293](<a href="https://github.com/intel/llvm/pull/5293">https://github.com/intel/llvm/pull/5293</a>), [#5115](<a href="https://github.com/intel/llvm/pull/5115">https://github.com/intel/llvm/pull/5115</a>)<br>*  Added the `-fcuda-prec-sqrt` flag, which enables correctly rounded results for the `sqrt` function on the CUDA backend (equivalent to nvcc's `-prec-sqrt`). [#5141](<a href="https://github.com/intel/llvm/pull/5141">https://github.com/intel/llvm/pull/5141</a>)<br>*  Enabled support for in-kernel asserts on the CUDA backend. 
[#5174](<a href="https://github.com/intel/llvm/pull/5174">https://github.com/intel/llvm/pull/5174</a>)<br>*  Fixed the `nan()` builtin for `double` on the CUDA backend. [#5173](<a href="https://github.com/intel/llvm/pull/5173">https://github.com/intel/llvm/pull/5173</a>)<br>*  Fixed out-of-bounds behavior of `read_image` with the `none` addressing mode on the CUDA backend. [#5204](<a href="https://github.com/intel/llvm/pull/5204">https://github.com/intel/llvm/pull/5204</a>)<br><br>#### SYCL 2020 support<br><br>*  Added basic support for the `generic_space` address space. [#5148](<a href="https://github.com/intel/llvm/pull/5148">https://github.com/intel/llvm/pull/5148</a>)<br>*  Added property list support to the `stream` class. [#4898](<a href="https://github.com/intel/llvm/pull/4898">https://github.com/intel/llvm/pull/4898</a>)<br><br>#### Non-standard extensions<br><br>*  ESIMD: Added infrastructure to support non-standard C++ types such as `sycl::half` and `sycl::bfloat`, along with support for basic operations on `sycl::half`. [#5123](<a href="https://github.com/intel/llvm/pull/5123">https://github.com/intel/llvm/pull/5123</a>)<br>*  ESIMD: Enabled a number of math and conversion intrinsics for `sycl::half`. 
[#5271](<a href="https://github.com/intel/llvm/pull/5271">https://github.com/intel/llvm/pull/5271</a>)<br>*  Matrix: Enabled `joint_matrix_fill` ([#4994](<a href="https://github.com/intel/llvm/pull/4994">https://github.com/intel/llvm/pull/4994</a>) and [#5277](<a href="https://github.com/intel/llvm/pull/5277">https://github.com/intel/llvm/pull/5277</a>)) and `wi_slice` ([#4979](<a href="https://github.com/intel/llvm/pull/4979">https://github.com/intel/llvm/pull/4979</a>)) operations for `joint_matrix`.<br>*  Added documentation for new SPIR-V specification extensions supporting "module" debug information ([#3976](<a href="https://github.com/intel/llvm/pull/3976">https://github.com/intel/llvm/pull/3976</a>)) and composite types as joint matrix elements ([#5228](<a href="https://github.com/intel/llvm/pull/5228">https://github.com/intel/llvm/pull/5228</a>)).<br>*  Added documentation for new SYCL specification extensions supporting "device global" objects ([#4686](<a href="https://github.com/intel/llvm/pull/4686">https://github.com/intel/llvm/pull/4686</a>)), compile-time properties ([#4937](<a href="https://github.com/intel/llvm/pull/4937">https://github.com/intel/llvm/pull/4937</a>)), and the `std::complex` data type in group collective algorithms ([#5108](<a href="https://github.com/intel/llvm/pull/5108">https://github.com/intel/llvm/pull/5108</a>)).<br>*  Added initial support for format strings in non-constant address spaces for the `printf` function. [#5069](<a href="https://github.com/intel/llvm/pull/5069">https://github.com/intel/llvm/pull/5069</a>)<br>*  Moved the group sort extension to the experimental namespace. 
[#5169](<a href="https://github.com/intel/llvm/pull/5169">https://github.com/intel/llvm/pull/5169</a>)<br><br>#### Misc<br><br>*  Enhanced SYCL accessor ([#5249](<a href="https://github.com/intel/llvm/pull/5249">https://github.com/intel/llvm/pull/5249</a>)) and buffer ([#5161](<a href="https://github.com/intel/llvm/pull/5161">https://github.com/intel/llvm/pull/5161</a>)) instrumentation with XPTI and reduced XPTI instrumentation overhead ([#5158](<a href="https://github.com/intel/llvm/pull/5158">https://github.com/intel/llvm/pull/5158</a>)).<br>*  Turned on the `-fsycl-dead-args-optimization` flag by default. [#3004](<a href="https://github.com/intel/llvm/pull/3004">https://github.com/intel/llvm/pull/3004</a>)<br>*  Enabled the SPIR-V device image format for fat objects in the driver. [#4608](<a href="https://github.com/intel/llvm/pull/4608">https://github.com/intel/llvm/pull/4608</a>), [#5251](<a href="https://github.com/intel/llvm/pull/5251">https://github.com/intel/llvm/pull/5251</a>), [#4683](<a href="https://github.com/intel/llvm/pull/4683">https://github.com/intel/llvm/pull/4683</a>)<br>*  Made a number of performance improvements in the runtime library and the Level Zero plug-in, such as caching command lists ([#5197](<a href="https://github.com/intel/llvm/pull/5197">https://github.com/intel/llvm/pull/5197</a>)) and batching copy commands ([#5155](<a href="https://github.com/intel/llvm/pull/5155">https://github.com/intel/llvm/pull/5155</a>)).<br>*  Added static linking of device code to the Level Zero backend. [#5266](<a href="https://github.com/intel/llvm/pull/5266">https://github.com/intel/llvm/pull/5266</a>), [#5267](<a href="https://github.com/intel/llvm/pull/5267">https://github.com/intel/llvm/pull/5267</a>)<br><br></div></div>