[llvm-dev] LLVM GPU News Issue #9, April 2 2021
Jakub (Kuba) Kuderski via llvm-dev
llvm-dev at lists.llvm.org
Fri Apr 2 07:58:33 PDT 2021
The 9th issue of LLVM GPU News, a bi-weekly newsletter on all
the GPU things under the LLVM umbrella, is now available at <
I also pasted the content below, in case you prefer to read in your email
# LLVM GPU News Issue #9, April 2 2021
Authors: Jakub Kuderski, Johannes Doerfert, Lei Zhang
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things
under the LLVM umbrella.
This issue covers the period from March 19 to April 1 2021.
We welcome your feedback and suggestions. Let us know if we missed anything
interesting, or if you want us to bring attention to your (sub)project, revisions
under review, or proposals. Please see the bottom of the page for details
on how to submit suggestions and contribute.
## Industry News and Conference Talks
* [AMD ROCm](https://github.com/RadeonOpenCompute/ROCm) v4.1 has been
## LLVM and Clang
* Discussion on the 'Abstracting over SSA form IRs to implement generic
analyses' RFC has seen some new activity. Sameer Sahasrabuddhe [shared
identified that the main issue is that LLVM IR/MIR basic blocks do not
explicitly track their successors and predecessors. Nicolai Hähnle
[clarified what the most important decisions are](
https://lists.llvm.org/pipermail/llvm-dev/2021-March/149560.html) to move
the proposal forward. In addition, Nicolai noted that changing the
in-memory representation of basic blocks to contain predecessor and
successor vectors would allow terminator instructions to refer to those, and
potentially result in reduced memory usage.
* [AMDGPU PAL usage documentation was updated.](
* (In-review) AMDGPU Machine IR optimization to [remove unnecessary cache
* Conversion to NVVM/ROCDL now [uses](https://reviews.llvm.org/D98937) a
data layout entry to specify the bitwidth for index type.
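
To make the basic block discussion in the first item of this section more concrete, here is a minimal sketch of blocks that explicitly store predecessor and successor vectors, with a terminator that refers to its targets by position in the parent's successor vector. All names (`Block`, `CondBranch`, `addEdge`) are hypothetical and are not LLVM classes. Using small indices in place of block pointers is only one assumed way such a layout might save memory; it is not taken from the RFC.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; these are not LLVM classes.
struct Block {
  std::vector<Block *> Preds; // explicit predecessor list
  std::vector<Block *> Succs; // explicit successor list
};

// A conditional branch that names its targets by position in the parent
// block's successor vector instead of holding two block pointers itself.
struct CondBranch {
  Block *Parent;
  std::uint8_t TrueIdx;
  std::uint8_t FalseIdx;

  Block *trueTarget() const { return Parent->Succs[TrueIdx]; }
  Block *falseTarget() const { return Parent->Succs[FalseIdx]; }
};

// Adds the edge From -> To, keeps both vectors in sync, and returns the
// index of To within From's successor vector.
inline std::uint8_t addEdge(Block &From, Block &To) {
  From.Succs.push_back(&To);
  To.Preds.push_back(&From);
  return static_cast<std::uint8_t>(From.Succs.size() - 1);
}
```

With this layout, a generic analysis can walk `Preds` and `Succs` directly without inspecting the terminator, which is the property the discussion identified as missing from today's basic blocks.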
## OpenMP (Target Offloading)
* Nader Al Awar asked about using the [`-fembed-bitcode` Clang option with
OpenMP target offload for CUDA](
are no replies as of writing.
* [Asynchronous offloading bugs](
https://bugs.llvm.org/show_bug.cgi?id=49816) were discovered and are being
discussed on the mailing list and the bugtracker.
* The device runtime for LLVM 12 shows performance regressions, [\[1\]](
https://bugs.llvm.org/show_bug.cgi?id=49752) and [\[2\]](
https://bugs.llvm.org/show_bug.cgi?id=49764), that will be addressed in the
* A rewrite of the device runtime is being tested right now. The first
results look promising with regard to performance and memory usage.
* Issues with Clang's device code generation were detected: [\[1\]](
https://bugs.llvm.org/show_bug.cgi?id=49777), and will be resolved soon.
* OpenMP declare mapper will now pass variable names to the runtime for
* Asynchronous errors reported by the device runtime will be [less
* Failed offloading will not cause an [assertion error](
* Optimization for variable globalization on the device is [already
available](https://reviews.llvm.org/D97818) while we prepare to [switch to
the new system](https://reviews.llvm.org/D97680).
## External Compilers
* [Dave Airlie reported](
that they found lavapipe, Mesa's CPU-based Vulkan implementation, to be
faster than [SwiftShader](https://github.com/google/swiftshader), a
CPU-based Vulkan implementation from Google. This is based on a set of
randomly picked [Vulkan samples from Sascha Willems](