[llvm-dev] LLVM GPU News Issue #15, July 02 2021
Jakub (Kuba) Kuderski via llvm-dev
llvm-dev at lists.llvm.org
Sun Jul 4 11:34:18 PDT 2021
The 15th issue of LLVM GPU News, a bi-weekly newsletter on all
the GPU things under the LLVM umbrella, is out:
I also pasted the content below, in case you prefer to read it in your email client.
# LLVM GPU News Issue #15, July 02 2021
Authors: Jakub Kuderski
Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things
under the LLVM umbrella.
This issue covers the period from June 18 to July 1 2021.
We welcome your feedback and suggestions. Let us know if we missed anything
interesting, or if you want us to bring attention to your (sub)project, revisions
under review, or proposals. Please see the bottom of the page for details
on how to submit suggestions and contribute.
## Industry News and Conferences
## LLVM and Clang
* New NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 matrix
operations: `wmma.load`, `wmma.store`, `wmma.mma`, and `mma`. [D104847](
* AMDGPU learned to optimize VGPR live-ranges in simple divergent if-else
* New AMDGPU target: gfx1035. [D104804](https://reviews.llvm.org/D104804)
* AMDGPU gfx90a memory model has been updated. [D105137](
* New 224-bit vector types for AMDGPU. These map to `v7i32`/`v7f32`, while
the existing 192-bit types map to the newly added `v3i64`/`v3f64`/`v6i32`/`v6f32`.
* The `ReplaceLDS` AMDGPU pass is now disabled by default in preparation
for removing the code later. [D104962](https://reviews.llvm.org/D104962)
## MLIR
* New NVPTX ops for warp synchronous matrix operations for the GPU and
NVVM dialects. [D95330](https://reviews.llvm.org/D95330), [D95331](
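
For readers less familiar with the matrix operations mentioned above, here is a scalar C++ sketch of the arithmetic that a tile-level multiply-accumulate performs: `D = A * B + C` on small fixed-shape tiles. The helper name `tile_mma`, the row-major layout, and the `float` element type are assumptions made for illustration. The real instructions operate on fragments distributed across the threads of a warp, with specific shapes and element types that this sketch does not model.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, not an LLVM, MLIR, or CUDA API: a scalar reference for
// the tile-level multiply-accumulate D = A * B + C.
// A is M x K, B is K x N, C and D are M x N, all stored row-major.
template <std::size_t M, std::size_t N, std::size_t K>
std::array<float, M * N> tile_mma(const std::array<float, M * K> &a,
                                  const std::array<float, K * N> &b,
                                  const std::array<float, M * N> &c) {
  std::array<float, M * N> d = c; // start from the accumulator tile
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j)
      for (std::size_t k = 0; k < K; ++k)
        d[i * N + j] += a[i * K + k] * b[k * N + j];
  return d;
}
```

A scalar reference like this is mainly useful for checking the results of a hardware-accelerated kernel on small inputs.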
## OpenMP (Target Offloading)
* Multiple globalization improvements:
    * GPU memory globalization was simplified. The old implementation in
      the frontend that emulated standard CPU stack sharing is now replaced with
      a single allocation command, mimicking an `alloca` instruction for
      variables that must be shared between threads. [D97680](
    * OpenMP device routines will be internalized to facilitate
      interprocedural optimizations. [D102824](https://reviews.llvm.org/D102824)
    * The number of Attributor iterations is doubled from 64 to 128 on the
      GPU target. [D104920](https://reviews.llvm.org/D104920)
    * Remaining globalization optimizations will be reported as missed
      remarks instead of analysis remarks. [D104735](
* `clang-offload-bundler` can now unbundle archives containing bundled
object files into device-specific archives. [D93525](
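
As a rough illustration of the kind of variable the globalization items above are concerned with, the sketch below declares a local variable inside a `target` region and then updates it from a nested `parallel` region. The function and variable names are hypothetical. On a GPU, such a variable cannot live in thread-private stack memory, because other threads need to reach it. How a particular compiler version actually allocates it is not shown here.

```cpp
// Counts how many threads executed the nested parallel region.
// Builds with or without OpenMP: if the pragmas are ignored, the body runs
// once on the host and the function returns 1.
int count_threads() {
  int total = 0;
#pragma omp target map(tofrom : total)
  {
    // Declared by the initial thread of the target region...
    int shared_counter = 0;
#pragma omp parallel
    {
      // ...but updated by every thread of the parallel region, so on a GPU it
      // must be placed in memory that all of those threads can access.
#pragma omp atomic
      shared_counter += 1;
    }
    total = shared_counter;
  }
  return total;
}
```

If `shared_counter` were only used by the initial thread, it could stay private, which is the case the optimizations above try to detect.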
## External Compilers
* LLPC can now generate out-of-bounds checks for scratch accesses (stack
* New utilities for iterating over enums using C++ iterators and ranges.
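
To give a sense of what iterating over an enum with C++ iterators and ranges can look like, here is a minimal self-contained sketch. All names (`EnumRange`, `AddressSpace`, `count_address_spaces`) are hypothetical and do not reproduce the utilities referenced above. The sketch also assumes that the enumerators have contiguous values.

```cpp
#include <type_traits>

// Hypothetical range type, not LLVM's API: iterates over the half-open
// interval [first, last) of an enum with contiguous enumerator values.
template <typename E> class EnumRange {
  static_assert(std::is_enum<E>::value, "EnumRange requires an enum type");
  using U = typename std::underlying_type<E>::type;

public:
  class iterator {
  public:
    explicit iterator(U v) : value(v) {}
    E operator*() const { return static_cast<E>(value); }
    iterator &operator++() {
      ++value;
      return *this;
    }
    bool operator!=(const iterator &other) const {
      return value != other.value;
    }

  private:
    U value;
  };

  // `last` acts as an end marker and is not visited.
  EnumRange(E first, E last)
      : lo(static_cast<U>(first)), hi(static_cast<U>(last)) {}
  iterator begin() const { return iterator(lo); }
  iterator end() const { return iterator(hi); }

private:
  U lo, hi;
};

// Illustrative enum; `End` is a sentinel, not a real value.
enum class AddressSpace { Global, Shared, Constant, Local, End };

// Visits every enumerator before the sentinel with a range-based for loop.
inline int count_address_spaces() {
  int n = 0;
  for (AddressSpace as :
       EnumRange<AddressSpace>(AddressSpace::Global, AddressSpace::End)) {
    (void)as; // a real caller would inspect `as` here
    ++n;
  }
  return n;
}
```

The benefit over a hand-written loop with casts is that the conversions between the enum and its underlying type are confined to one place.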