<div dir="ltr">Hi folks,<br><br>The second issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella, is now available at: <a href="https://llvm-gpu-news.github.io/2020/12/25/issue-2.html">https://llvm-gpu-news.github.io/2020/12/25/issue-2.html</a>.<div><br>I'm also pasting the content below, in case you prefer to read it in your email client.<br><br>Happy holidays,<br>Jakub<br><br>=====================================================================<br><br># LLVM GPU News Issue #2, December 25 2020<br><br>Welcome to the second issue of LLVM GPU News, a bi-weekly newsletter on all<br>the GPU things under the LLVM umbrella. This issue covers the period from<br>December 11 to December 24 2020.<br><br>We welcome your feedback and suggestions. Let us know if we missed anything<br>interesting, or if you want us to bring attention to your (sub)project, revisions<br>under review, or proposals. Please see the bottom of the page for details<br>on how to submit suggestions and contribute.<br><br>## Industry News and Conference Talks<br><br>*  [Portable Computing Language (PoCL) v1.6 was released](<a href="http://portablecl.org/pocl-1.6.html">http://portablecl.org/pocl-1.6.html</a>).<br>   The release provides Clang/LLVM 11 support, in addition to the existing<br>   compatibility down to LLVM 6.0. The CUDA backend gained several performance<br>   optimizations, including use of 32-bit pointer arithmetic for local<br>   memory, use of static CUDA memory blocks for `__local` blocks and<br>   function arguments, and loop unrolling heuristic tweaks.<br>*  [AMD released ROCm v4.0](<a href="https://github.com/RadeonOpenCompute/ROCm/blob/master/AMD_ROCm_Release_Notes_v4.0.pdf">https://github.com/RadeonOpenCompute/ROCm/blob/master/AMD_ROCm_Release_Notes_v4.0.pdf</a>)<br>   with support for the new MI100 accelerator (HPC GPU) and its new CDNA<br>   architecture. 
CDNA is based on the previous GCN architecture and<br>   backward-compatible with it.<br><br><br>##  LLVM and Clang<br><br>### Discussions<br><br>*  Nicolai Hähnle posted a proposal on<br>   [abstractions over SSA form IRs to implement generic analyses](<a href="http://lists.llvm.org/pipermail/llvm-dev/2020-December/147433.html">http://lists.llvm.org/pipermail/llvm-dev/2020-December/147433.html</a>).<br>   This is an attempt to restart the discussion over the previously<br>   [proposed CfgTraits abstraction](<a href="http://lists.llvm.org/pipermail/llvm-dev/2020-October/145945.html">http://lists.llvm.org/pipermail/llvm-dev/2020-October/145945.html</a>)<br>   that was [reverted in late October 2020](<a href="https://github.com/llvm/llvm-project/commit/e025d09b216dc2239e1b502f4f277abb6fb4648a">https://github.com/llvm/llvm-project/commit/e025d09b216dc2239e1b502f4f277abb6fb4648a</a>).<br>   The main motivation is to facilitate writing IR-generic analyses that<br>   operate on both CFG and SSA values, e.g., divergence analysis. The RFC<br>   comes with a [detailed overview of the C++ generic programming techniques](<a href="https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?usp=sharing">https://docs.google.com/document/d/1sbeGw5uNGFV0ZPVk6h8Q5_dRhk4qFnKHa-uZ-O3c4UY/edit?usp=sharing</a>)<br>   and thoughts on how each would fit in the existing LLVM code. 
Nicolai<br>   implemented his proposed generic abstractions in a few Phabricator<br>   revisions:<br>   *  [D92924: Introduce opaque handles for type erasure.](<a href="https://reviews.llvm.org/D92924">https://reviews.llvm.org/D92924</a>)<br>   *  [D83089: Based on the handle infrastructure, refactor the dominator tree with type-erased base classes.](<a href="https://reviews.llvm.org/D83089">https://reviews.llvm.org/D83089</a>)<br>   *  [D92925: Introduce an SsaContext context class concept for static polymorphism.](<a href="https://reviews.llvm.org/D92925">https://reviews.llvm.org/D92925</a>)<br>   *  [D92926: Introduce an ISsaContext "global" interface class for dynamic polymorphism.](<a href="https://reviews.llvm.org/D92926">https://reviews.llvm.org/D92926</a>)<br>   *  [D83094: Implement a new analysis (cycle info) written generically as non-template code.](<a href="https://reviews.llvm.org/D83094">https://reviews.llvm.org/D83094</a>)<br><br>   There are no replies under the thread as of writing.<br><br>*  Yaxun (Sam) Liu sent an RFC for<br>   [unified offloading option for CUDA/HIP/OpenMP](<a href="http://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html">http://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html</a>).<br>   The proposal is to make the clang offloading options more concise by<br>   representing the offloading kind, target, and device architecture with a<br>   new triple, e.g., `-offload=omp-amd-gfx900`, `-offload=hip-amd-gfx906`.<br>   Artem Belevich pointed out that making the naming consistent is<br>   difficult because each offload instance may require an arbitrarily<br>   complex set of options, and options per individual offload instance may<br>   differ minimally. 
Ben Boeckel is concerned about shell quoting rules<br>   being difficult to get correct with a more complex argument parsing<br>   model.<br><br>### Commits<br><br>*  [Implement SYCL address space attributes handling.](<a href="https://reviews.llvm.org/D89909">https://reviews.llvm.org/D89909</a>)<br>   This addresses issues with pointers that have the same address space in<br>   the Clang AST, but different ones when lowered to LLVM IR.<br>*  (In review) [Introduce an OpenMP assumption to ignore possible external function callers](<a href="https://reviews.llvm.org/D93079">https://reviews.llvm.org/D93079</a>),<br>   `omp_no_external_caller_in_target_region`. This function attribute is<br>   handled by the existing OpenMP target state machine optimization.<br><br><br>## MLIR<br><br>### Discussions<br><br>* George Mitenkov posted an RFC on<br>  [converting multi-threaded SPIR-V to the LLVM dialect](<a href="https://llvm.discourse.group/t/rfc-converting-multi-threaded-spir-v-to-llvm-dialect-overview/2463">https://llvm.discourse.group/t/rfc-converting-multi-threaded-spir-v-to-llvm-dialect-overview/2463</a>).<br>  After the [initial work during the GSoC project](<a href="https://github.com/georgemitenkov/GSoC-2020">https://github.com/georgemitenkov/GSoC-2020</a>)<br>  that focused on single-threaded code, the next step is to handle<br>  multi-threaded GPU kernels with synchronization so that they can be<br>  compiled for CPU execution. George proposes to map each workgroup to a<br>  CPU thread and subgroups/invocations to SIMD vector operations.<br>  There are no replies under the thread as of writing.<br><br>### Commits<br><br>*  [The SPIR-V dialect learned to convert `select+cmp` into GLSL clamp.](<a href="https://reviews.llvm.org/D93618">https://reviews.llvm.org/D93618</a>)<br>*  SPIR-V (de-)serialization refactoring. 
The two were<br>   [split into separate libraries](<a href="https://reviews.llvm.org/D91548">https://reviews.llvm.org/D91548</a>) and<br>   de-templated: [\[1\]](<a href="https://reviews.llvm.org/D93535">https://reviews.llvm.org/D93535</a>), [\[2\]](<a href="https://reviews.llvm.org/D93504">https://reviews.llvm.org/D93504</a>).<br><br><br>## External Compilers<br><br>Please submit pointers to your mailing lists, forums, or newsletters if you<br>want your LLVM- or MLIR-based GPU compiler project to be covered in future<br>LLVM GPU News issues.<br><br>### CUDA<br><br>### JuliaGPU<br><br>*  Simone Azeglio and Ayaoyao214 asked about<br>   [overcoming slow scalar operations on GPU arrays](<a href="https://discourse.julialang.org/t/overcoming-slow-scalar-operations-on-gpu-arrays/49554">https://discourse.julialang.org/t/overcoming-slow-scalar-operations-on-gpu-arrays/49554</a>).<br><br>### LLPC<br><br>*  Steven Perron added [support for multi-table descriptor sets](<a href="https://github.com/GPUOpen-Drivers/llpc/pull/1074">https://github.com/GPUOpen-Drivers/llpc/pull/1074</a>).<br>   This allows for experimentation with more flexible binding models than<br>   in the Vulkan API that currently guarantees that all resources in the<br>   same descriptor set will be in the same descriptor table. 
The patch<br>   lifts this assumption.<br><br>### Mesa<br><br>* Michael Tang [committed initial code for a new `spirv_to_dxil` library](<a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8043">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8043</a>).<br>  The plan is to use the Mesa code for WebGPU shader compilation for<br>  DirectX, using the following translation path: WGSL (WebGPU Shading<br>  Language) -> SPIR-V (IR from Khronos) -> NIR (Mesa's IR) -> DXIL (DirectX<br>  Intermediate Language).<br><br>### SYCL<br></div><div><div><br></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div>Jakub Kuderski</div></div></div></div>