<div dir="ltr">Hi folks,<br><br>The 11th issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella, is now available at <<a href="https://llvm-gpu-news.github.io/2021/04/30/issue-11.html" target="_blank">https://llvm-gpu-news.github.io/2021/04/30/issue-11.html</a>>.<div><br>I also pasted the content below, in case you prefer to read in your email client.<br><div><br>-Jakub</div></div><div><br></div><div>======================================================================<br><br># LLVM GPU News Issue #11, April 30 2021<br>Authors: Jakub Kuderski, Johannes Doerfert, Lei Zhang<br><br>Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella.<br>This issue covers the period from April 16 to April 29 2021.<br><br>We welcome your feedback and suggestions. Let us know if we missed anything interesting, or if you want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute.<br><br><br>## Industry News and Conference Talks<br><br>*  [CuPy v9 has been released.](<a href="https://medium.com/cupy-team/cupy-v9-is-here-27e9cbfbf7e5">https://medium.com/cupy-team/cupy-v9-is-here-27e9cbfbf7e5</a>) CuPy is a NumPy-compatible array library accelerated by CUDA. The main highlights are:<br>    - New JIT API for defining CUDA kernels with Python code.<br>    - NVIDIA cuSPARSELt Python bindings to accelerate sparse matrix multiplication on NVIDIA Ampere GPUs.<br>    - AMD ROCm platform improvements, including a binary package for ROCm 4.0.<br>*  The IWOCL and SYCLcon 2021 conferences happened this week. These conferences focus on OpenCL and SYCL, respectively. 
[Video recordings and presentation slides are already available publicly.](<a href="https://www.iwocl.org/iwocl-2021/conference-program/">https://www.iwocl.org/iwocl-2021/conference-program/</a>)<br><br>##  LLVM and Clang<br><br>### Discussions<br><br>*  The [discussion on how to allow math functions and intrinsics (and friends)](<a href="https://lists.llvm.org/pipermail/llvm-dev/2021-April/150265.html">https://lists.llvm.org/pipermail/llvm-dev/2021-April/150265.html</a>) when compiling for GPUs has been revived.<br><br>### Commits<br><br>*  New SYCL documentation has been added: ["SYCL Compiler and Runtime architecture design"](<a href="https://reviews.llvm.org/D99488">https://reviews.llvm.org/D99488</a>). The initial version of the document covers address space handling.<br>*  Global Dead Code Elimination [is now scheduled to run before the Internalization pass](<a href="https://reviews.llvm.org/D98783">https://reviews.llvm.org/D98783</a>) in the AMDGPU pass pipeline. This is so that unused global variables, whose only users are dead, can be internalized.<br>*  HIP gained a new option, `-fgpu-inline-threshold`, that controls the [inlining threshold for device compilation only](<a href="https://reviews.llvm.org/D99233">https://reviews.llvm.org/D99233</a>).<br><br>## MLIR<br><br>### Discussions<br><br>### Commits<br><br>*  Some basic Python support was [added](<a href="https://reviews.llvm.org/D101449">https://reviews.llvm.org/D101449</a>) to the GPU dialect and passes.<br>*  Boolean `std.xor` to SPIR-V conversion and `vector<1xT>` `vector.extract` to SPIR-V conversion were added.<br><br><br>## OpenMP (Target Offloading)<br><br>### Discussions<br><br>*  Pierre Kestener is facing [issues with building NVPTX targets](<a href="https://lists.llvm.org/pipermail/llvm-dev/2021-April/150275.html">https://lists.llvm.org/pipermail/llvm-dev/2021-April/150275.html</a>) after upgrading to OpenMP 12. 
The suggested solution is to install `gcc-multilib`.<br>*  The amdgpu device runtime builds by default and no longer requires LLVM to have the amdgpu target enabled.<br>*  Simple amdgpu offloading (i.e., code that does not use any libc) now works out of the box on systems with ROCr (runtime for ROCm) installed.<br>*  Various initial bugs have been found in the Clang driver; early adopters beware.<br><br>### Commits<br><br>*  A new clang [tool to list AMD GPUs installed](<a href="https://reviews.llvm.org/D99949">https://reviews.llvm.org/D99949</a>), `amdgpu-arch`, was committed. The output is used to fill `-march` when the latter is not explicitly provided in `-Xopenmp-target`. This tool is built only if HSA is installed.<br>*  [Simplified clang codegen for parallel regions in OpenMP GPU target offloading](<a href="https://reviews.llvm.org/D95976">https://reviews.llvm.org/D95976</a>) and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `__kmpc_parallel_51` runtime entry point for parallel regions.<br>*  A [new runtime function `__tgt_set_info_flag`](<a href="https://reviews.llvm.org/D100774">https://reviews.llvm.org/D100774</a>) allows the user to set the information level at runtime without using the environment variable.<br><br><br>## External Compilers<br><br>### LLPC<br><br>*  It is now possible to [build the `amdllpc` compiler as a standalone tool](<a href="https://github.com/GPUOpen-Drivers/llpc/pull/1217">https://github.com/GPUOpen-Drivers/llpc/pull/1217</a>), i.e., without the rest of the AMDVLK driver.<br><br>### Mesa<br><br></div><div><br></div>-- <br><div dir="ltr" data-smartmail="gmail_signature"><div>Jakub Kuderski</div></div></div>