[llvm-dev] LLVM GPU News Issue #11, April 30 2021
Jakub (Kuba) Kuderski via llvm-dev
llvm-dev at lists.llvm.org
Fri Apr 30 09:07:13 PDT 2021
The 11th issue of LLVM GPU News, a bi-weekly newsletter on all
the GPU things under the LLVM umbrella, is now available at <
I also pasted the content below, in case you prefer to read in your email
# LLVM GPU News Issue #11, April 30 2021
Authors: Jakub Kuderski, Johannes Doerfert, Lei Zhang
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things
under the LLVM umbrella.
This issue covers the period from April 16 to April 29 2021.
We welcome your feedback and suggestions. Let us know if we missed anything
interesting, or want us to bring attention to your (sub)project, revisions
under review, or proposals. Please see the bottom of the page for details
on how to submit suggestions and contribute.
## Industry News and Conference Talks
* [CuPy v9 has been released.](
https://medium.com/cupy-team/cupy-v9-is-here-27e9cbfbf7e5) CuPy is a
NumPy-compatible array library accelerated by CUDA. The main highlights are:
- New JIT API for defining CUDA kernels with Python code.
- NVIDIA cuSPARSELt Python bindings to accelerate sparse matrix
multiplication on Nvidia Ampere GPUs.
- AMD ROCm platform improvements, including a binary package for ROCm
* The IWOCL and SYCLcon 2021 conferences happened this week. These
conferences focus on OpenCL and SYCL, respectively. [Video recordings and
presentation slides are already available publicly.](
## LLVM and Clang
* The [discussion on how to allow math functions and intrinsics (and
when compiling for GPUs has been revived.
* New SYCL documentation has been added: ["SYCL Compiler and Runtime
architecture design"](https://reviews.llvm.org/D99488). The initial version
of the document covers address space handling.
* Global Dead Code Elimination [is now scheduled to run before the
Internalization pass](https://reviews.llvm.org/D98783) in the AMDGPU pass
pipeline. This is so that unused global variables, whose only users are
dead, can be internalized.
* HIP gained a new option, `-fgpu-inline-threshold`, that controls the
[inlining threshold for device compilation only](
* Some basic Python support was [added](https://reviews.llvm.org/D101449)
to the GPU dialect and passes.
* Boolean `std.xor` to SPIR-V conversion and `vector<1xT>`
`vector.extract` to SPIR-V conversion were added.
## OpenMP (Target Offloading)
* Pierre Kestener is facing [issues with building NVPTX targets](
upgrading to OpenMP 12. The suggested solution is to install `gcc-multilib`.
* The amdgpu device runtime builds by default and no longer requires LLVM
to have the amdgpu target enabled.
* Simple amdgpu offloading (i.e. if it does not use any libc) now works
out of the box on systems with ROCr (runtime for ROCm) installed.
* Various initial bugs found in the Clang driver, early adopters beware.
* A new clang [tool to list AMD GPUs installed](
https://reviews.llvm.org/D99949), `amdgpu-arch`, was committed. The output
is used to fill `-march` when the latter is not explicitly provided in
`-Xopenmp-target`. This tool is built only if HSA is installed.
* [Simplified clang codegen for parallel regions in OpenMP GPU target
offloading](https://reviews.llvm.org/D95976) and corresponding changes in
libomptarget: SPMD/non-SPMD parallel calls are unified under a single
`kmpc_parallel_51` runtime entry point for parallel regions.
* A [new runtime function `__tgt_set_info_flag`](
https://reviews.llvm.org/D100774) that allows the user to set the
information level at runtime without using the environment variable.
## External Compilers
* It is now possible to [build the `amdllpc` compiler as a standalone
tool](https://github.com/GPUOpen-Drivers/llpc/pull/1217), i.e., without the
rest of the AMDLVK driver.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev