<div dir="ltr">Hi folks,<br><br>The 17th issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella, is out:<br><<a href="https://llvm-gpu-news.github.io/2021/08/06/issue-17.html">https://llvm-gpu-news.github.io/2021/08/06/issue-17.html</a>>.<br><br>I also pasted the content below, in case you prefer to read it in your email client.<br><br>-Jakub<br><br>======================================================================<br><br># LLVM GPU News #17, Aug 6 2021<br>Authors: Jakub Kuderski, Lei Zhang<br clear="all"><div><br>Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella.<br>This issue covers the period from July 23 to August 5 2021.<br><br>We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.<br><br><br>## Industry News and Conferences<br><br>*  The talk schedule for the [2021 X.Org Developers Conference](<a href="https://indico.freedesktop.org/event/1/page/1-overview">https://indico.freedesktop.org/event/1/page/1-overview</a>) is available. This year, the conference is virtual and takes place on September 15-17. The compiler/GPU-related talks include:<br>   -  [SSA-based Register Allocation for GPU Architectures](<a href="https://indico.freedesktop.org/event/1/contributions/7/">https://indico.freedesktop.org/event/1/contributions/7/</a>). 
The talk will be followed by an [in-depth workshop on GPU register allocation](<a href="https://indico.freedesktop.org/event/1/contributions/1/">https://indico.freedesktop.org/event/1/contributions/1/</a>).<br>   -  [The Occult and the Apple GPU](<a href="https://indico.freedesktop.org/event/1/contributions/10/">https://indico.freedesktop.org/event/1/contributions/10/</a>)<br>   -  [Compiling Vulkan shaders in the browser: A tale of control flow graphs and WebAssembly](<a href="https://indico.freedesktop.org/event/1/contributions/22/">https://indico.freedesktop.org/event/1/contributions/22/</a>)<br>   -  [Ray-tracing in Vulkan pt. 2: Implementation](<a href="https://indico.freedesktop.org/event/1/contributions/17/">https://indico.freedesktop.org/event/1/contributions/17/</a>)<br>   -  [Redefining the Future of Accelerator Computing with Level Zero](<a href="https://indico.freedesktop.org/event/1/contributions/13/">https://indico.freedesktop.org/event/1/contributions/13/</a>)<br><br><br>##  LLVM and Clang<br><br>### Discussions<br><br>*  Luke Kenneth Casson Leighton posted an RFC: ["Vector/SIMD ISA Context Abstraction"](<a href="https://lists.llvm.org/pipermail/llvm-dev/2021-July/152008.html">https://lists.llvm.org/pipermail/llvm-dev/2021-July/152008.html</a>). Luke is working on SVP64 Cray-like Vector Extensions for the Power ISA, which is being designed for Hybrid CPU, VPU and 3D GPU workloads. One of the problems mentioned is that some ISA designs may lead to combinatorial explosion in the number of intrinsics, which can be avoided by "separating out 'scalar base' from 'augmentation' throughout the IR". [Renato Golin replied](<a href="https://lists.llvm.org/pipermail/llvm-dev/2021-August/152025.html">https://lists.llvm.org/pipermail/llvm-dev/2021-August/152025.html</a>) that, historically, LLVM tried to keep as many instructions as native IR as possible to avoid the explosion of intrinsics. 
However, intrinsics tend to reduce the number of program instructions, so there needs to be some balance.<br>*  席致寧 asked about a quick way to [add a new instruction to generated PTX files](<a href="https://lists.llvm.org/pipermail/llvm-dev/2021-August/152047.html">https://lists.llvm.org/pipermail/llvm-dev/2021-August/152047.html</a>), without having to implement full support for the new instruction in the backend. There are no replies as of writing.<br><br>### Commits<br><br>*  HIP switched to using DWARF version 5 by default. [D107190](<a href="https://reviews.llvm.org/D107190">https://reviews.llvm.org/D107190</a>)<br>*  It is now possible to force-enable `MemCpyOpt` with a new LLVM flag `-enable-memcpyopt-without-libcalls`. For now, only the CUDA frontend opts into it, to better exercise this optimization. [D106401](<a href="https://reviews.llvm.org/D106401">https://reviews.llvm.org/D106401</a>)<br>*  A new Attributor pass for deducing AMDGPU-specific attributes was added. [D104997](<a href="https://reviews.llvm.org/D104997">https://reviews.llvm.org/D104997</a>)<br>*  The NVPTX matrix operation intrinsics were extended with the [`ldmatrix.sync.aligned` warp-level matrix load instructions](<a href="https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-ldmatrix">https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-ldmatrix</a>) introduced in PTX 6.5. [D107046](<a href="https://reviews.llvm.org/D107046">https://reviews.llvm.org/D107046</a>)<br>*  Clang learned to preserve ASAN library functions when targeting HIP. [D106315](<a href="https://reviews.llvm.org/D106315">https://reviews.llvm.org/D106315</a>)<br>*  AMDGPU received a number of `GlobalISel` enhancements.<br><br><br>## MLIR<br><br>### Discussions<br><br>### Commits<br><br>*  A `populateGpuToLLVMConversionPatterns` entry point was added for collecting all GPU to LLVM conversion patterns. 
[D107218](<a href="https://reviews.llvm.org/D107218">https://reviews.llvm.org/D107218</a>) <br>*  Two boolean [loading](<a href="https://reviews.llvm.org/D107119">https://reviews.llvm.org/D107119</a>)/[storing](<a href="https://reviews.llvm.org/D107114">https://reviews.llvm.org/D107114</a>) issues were fixed in SPIR-V conversion.<br>*  A few issues in the SPIR-V module combiner were fixed. [D106886](<a href="https://reviews.llvm.org/D106886">https://reviews.llvm.org/D106886</a>)<br>*  [MemRef](<a href="https://reviews.llvm.org/D107094">https://reviews.llvm.org/D107094</a>)/[Math](<a href="https://reviews.llvm.org/D107093">https://reviews.llvm.org/D107093</a>) to SPIR-V conversions were split into their own directories and files. <br><br><br>## OpenMP (Target Offloading)<br><br>### Discussions<br><br>*  Andrew Marshall is having [issues with building LLVM 12 for OpenMP](<a href="https://llvm.discourse.group/t/problem-compiling-openmp-project-for-llvm-12/4034">https://llvm.discourse.group/t/problem-compiling-openmp-project-for-llvm-12/4034</a>). There are no replies as of writing.<br><br>### Commits<br><br>*  Users can now enable the [new experimental device runtime](<a href="https://lists.llvm.org/pipermail/openmp-dev/2021-August/004073.html">https://lists.llvm.org/pipermail/openmp-dev/2021-August/004073.html</a>) library by passing the `-fopenmp-target-new-runtime` flag. [D106793](<a href="https://reviews.llvm.org/D106793">https://reviews.llvm.org/D106793</a>)<br>*  Linking of math libraries is now supported for AMDGPU when `-lm` is specified. 
[D104904](<a href="https://reviews.llvm.org/D104904">https://reviews.llvm.org/D104904</a>), [D105981](<a href="https://reviews.llvm.org/D105981">https://reviews.llvm.org/D105981</a>)<br><br><br>## External Compilers<br><br>### LLPC<br><br>### Mesa<br>*  LLVMpipe gained a [linear rasterizer optimized for 2D rendering](<a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11969">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11969</a>). The changes yield a 2x to 3x performance improvement for 2D workloads.<br><br><br></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div>Jakub Kuderski</div></div></div>