<div dir="ltr">Hi folks,<br><br>The 8th issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella, is now available at <<a href="https://llvm-gpu-news.github.io/2021/03/19/issue-8.html">https://llvm-gpu-news.github.io/2021/03/19/issue-8.html</a>>.<div><br>I'm also pasting the content below, in case you prefer to read in your email client.<br><div><br>-Jakub<br><br>======================================================================<br><br># LLVM GPU News Issue #8, March 19 2021<br>Authors: Jakub Kuderski, Johannes Doerfert, Lei Zhang</div></div><div><br>Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella.<br>This issue covers the period from March 5 to March 18 2021.<br><br>We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.<br><br><br>## Industry News and Conference Talks<br><br><br>##  LLVM and Clang<br><br>### Discussions<br>*  Johannes Doerfert asked about [NVPTX support for llvm math functions](<a href="https://lists.llvm.org/pipermail/llvm-dev/2021-March/149117.html">https://lists.llvm.org/pipermail/llvm-dev/2021-March/149117.html</a>) (e.g., `llvm.sin`). NVPTX does not provide `libc` and `libm`, though some math functions are implemented through the `libdevice` bitcode module. A solution would be to teach Clang or the LLVM middleend how to match `__nv_*` functions to the LLVM ones. 
Johannes [implemented a prototype](<a href="https://reviews.llvm.org/D98516">https://reviews.llvm.org/D98516</a>) that adds such function mapping support through LLVM IR attributes.<br>*  Jay Foad expressed interest in [using llvm-mca for AMDGPU](<a href="https://lists.llvm.org/pipermail/llvm-dev/2021-March/149068.html">https://lists.llvm.org/pipermail/llvm-dev/2021-March/149068.html</a>) and asked about the difference between `MicroOpBufferSize=0/1`. Based on the response from Andrew Trick, Jay implemented a patch that adds [llvm-mca support for in-order CPUs](<a href="https://reviews.llvm.org/D98356">https://reviews.llvm.org/D98356</a>).<br>*  Konrad Trifunovic [summarized the discussion on upstreaming a SPIR-V backend](<a href="https://lists.llvm.org/pipermail/llvm-dev/2021-March/149175.html">https://lists.llvm.org/pipermail/llvm-dev/2021-March/149175.html</a>) and shared a rough plan with short and long-term objectives.<br>*  Anastasia Stulova summarized the discussion on [a new file extension for C++ OpenCL sources](<a href="https://lists.llvm.org/pipermail/cfe-dev/2021-March/067936.html">https://lists.llvm.org/pipermail/cfe-dev/2021-March/067936.html</a>). The default would be `.clcpp` now, matching the  compiler option `-cl-std=clc++`. The [Phabricator Clang patch](<a href="https://reviews.llvm.org/D96771">https://reviews.llvm.org/D96771</a>) is awaiting any last feedback before committing.<br><br>### Commits<br><br>*  AMDGPU switched from using individual cache operands (GLC, SLC, DLC) to a [single `cache_policy` bitmask operand](<a href="https://reviews.llvm.org/D96469">https://reviews.llvm.org/D96469</a>). 
This reduces the amount of Machine IR code.<br>*  Fixes for the GFX90a AMDGPU target:<br>   -  [disable lds_direct](<a href="https://reviews.llvm.org/D96469">https://reviews.llvm.org/D96469</a>),<br>   -  [SCC support on buffer atomics](<a href="https://reviews.llvm.org/D98731">https://reviews.llvm.org/D98731</a>).<br>*  Split some of the AMDGPU instructions predicated on the `dot2-insts` target feature [into a new `dot7-insts`](<a href="https://reviews.llvm.org/D98717">https://reviews.llvm.org/D98717</a>), in preparation for subtargets that have some but not all of these instructions.<br>*  SYCL driver options were reworked. A [new language option (`SYCLIsHost`)](<a href="https://reviews.llvm.org/D97717">https://reviews.llvm.org/D97717</a>) is used to identify host executions. `-fsycl` and `-fno-sycl` became driver-only options rejected when passed to `-cc1`.<br>*  (In-review) HIP diagnostic for [aggregate arguments containing half-precision types](<a href="https://reviews.llvm.org/D98143">https://reviews.llvm.org/D98143</a>). 
GCC and Clang do not have a consistent ABI for half-precision types, so passing these between the two compilers may result in Undefined Behavior.<br><br><br>## MLIR<br><br>### Discussions<br><br>### Commits<br><br>*  CUDA/ROCDL kernel to blob conversion is now in [a pass](<a href="https://reviews.llvm.org/D98279">https://reviews.llvm.org/D98279</a>) registered to `mlir-opt`.<br>*  [`mlir-cuda-runner`](<a href="https://reviews.llvm.org/D98396">https://reviews.llvm.org/D98396</a>) and [`mlir-rocm-runner`](<a href="https://reviews.llvm.org/D98447">https://reviews.llvm.org/D98447</a>) are gone; integration tests now use `mlir-opt` and `mlir-cpu-runner`.<br>*  The SPIR-V dialect sees more ops for Vulkan graphics: [`spv.Image`](<a href="https://reviews.llvm.org/D98270">https://reviews.llvm.org/D98270</a>).<br>*  A few more patches landed into the SPIR-V dialect to improve op naming consistency.<br><br><br>## OpenMP (Target Offloading)<br><br>### Discussions<br> <br> * The redesign of the [memory globalization for GPUs](<a href="https://reviews.llvm.org/D97680">https://reviews.llvm.org/D97680</a>) is making progress. The [original patch](<a href="https://reviews.llvm.org/D90670">https://reviews.llvm.org/D90670</a>) has been refined and entered the testing stage. On its own it will regress performance significantly, but it opens up the possibility of optimizing the code further. The [first optimization](<a href="https://reviews.llvm.org/D97818">https://reviews.llvm.org/D97818</a>) has been approved.<br> * A [redesign of the device runtime](<a href="https://github.com/jdoerfert/llvm-project/tree/feature/openmp_no_dynamic_device_schedule">https://github.com/jdoerfert/llvm-project/tree/feature/openmp_no_dynamic_device_schedule</a>) has been started, based on earlier, smaller patches ([\[1\]](<a href="https://reviews.llvm.org/D98349">https://reviews.llvm.org/D98349</a>), [\[2\]](<a href="https://reviews.llvm.org/D98678">https://reviews.llvm.org/D98678</a>)). 
The overall process is not done, but various bugs in our OpenMP handling have already been found: [\[3\]](<a href="https://bugs.llvm.org/show_bug.cgi?id=49649">https://bugs.llvm.org/show_bug.cgi?id=49649</a>), [\[4\]](<a href="https://bugs.llvm.org/show_bug.cgi?id=49636">https://bugs.llvm.org/show_bug.cgi?id=49636</a>), [\[5\]](<a href="https://bugs.llvm.org/show_bug.cgi?id=49468">https://bugs.llvm.org/show_bug.cgi?id=49468</a>).<br><br>### Commits<br><br>*  Initial [support for the OpenMP 5.1 `interop` directive](<a href="https://reviews.llvm.org/D98558">https://reviews.llvm.org/D98558</a>) has been committed. This adds basic parsing/sema/serialization support for [`#pragma omp interop`](<a href="https://www.openmp.org/spec-html/5.1/openmpsu71.html">https://www.openmp.org/spec-html/5.1/openmpsu71.html</a>).<br>*  Only [build one bitcode library for each SM](<a href="https://reviews.llvm.org/D97198">https://reviews.llvm.org/D97198</a>) on NVPTX targets.<br>*  The AMDGPU host plugin is now [built by default](<a href="https://reviews.llvm.org/D98654">https://reviews.llvm.org/D98654</a>).<br>*  The AMDGPU device runtime was briefly [built by default](<a href="https://reviews.llvm.org/D98658">https://reviews.llvm.org/D98658</a>), but there are issues if the AMDGPU target is not available, and the patch has been reverted until those are cleared.<br>*  As an intermediate step in the device runtime redesign, we [removed 20% of the memory allocated to support dynamic scheduling](<a href="https://reviews.llvm.org/D98678">https://reviews.llvm.org/D98678</a>) in favor of dynamic allocations. You will only notice the memory savings if you do not run dynamic schedules on the device (which you probably should not).<br><br><br>## External Compilers<br><br>### LLPC<br>*  LLPC switched to using the [upstream LLVM implementation of demote to helper](<a href="https://github.com/GPUOpen-Drivers/llpc/pull/1184">https://github.com/GPUOpen-Drivers/llpc/pull/1184</a>). 
This is used by the discard-to-demote transformation, which lets a shader invocation that executes the `OpKill` SPIR-V instruction continue as a helper invocation ([see `OpDemoteToHelperInvocationEXT`](<a href="https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_EXT_shader_demote_to_helper_invocation.html">https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_EXT_shader_demote_to_helper_invocation.html</a>)) instead of terminating the thread.<br><br>### Mesa<br>*  Initial [support for GFX90A](<a href="https://cgit.freedesktop.org/mesa/mesa/log/?qt=grep&q=aldebaran">https://cgit.freedesktop.org/mesa/mesa/log/?qt=grep&q=aldebaran</a>) AMDGPU landed.<br><br>### SYCL<br><br></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div>Jakub Kuderski</div></div></div>