[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions
Steffen Larsen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 13 02:29:25 PDT 2021
steffenlarsen added a comment.
In D100124#2684303 <https://reviews.llvm.org/D100124#2684303>, @JonChesterfield wrote:
> Interesting. Reduction across lanes in warp? If so, this is probably a way to handle the last step reduction for openmp reductions
It is! I can imagine that it would be useful for OpenMP reductions, though it is limited to a few, albeit common, operators on 32-bit integers.
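To illustrate what "reduction across lanes in a warp" means here, below is a rough host-side C++ sketch of the semantics, not the builtins from this patch. The names `ReduxOp`, `combine`, and `warp_redux` are hypothetical and exist only for this illustration. It assumes a 32-lane warp, models only the unsigned flavour of min/max, and covers the operator set that PTX `redux.sync` provides (add, min, max, and, or, xor).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of a warp-wide reduction. This is NOT the
// builtin API added by the patch; it only mirrors the intended semantics.
constexpr int kWarpSize = 32;

enum class ReduxOp { Add, Min, Max, And, Or, Xor };

// Combines two 32-bit values with the selected operator.
inline uint32_t combine(ReduxOp op, uint32_t a, uint32_t b) {
  switch (op) {
    case ReduxOp::Add: return a + b;           // wraps modulo 2^32
    case ReduxOp::Min: return std::min(a, b);  // unsigned comparison only
    case ReduxOp::Max: return std::max(a, b);  // unsigned comparison only
    case ReduxOp::And: return a & b;
    case ReduxOp::Or:  return a | b;
    case ReduxOp::Xor: return a ^ b;
  }
  return a;
}

// Reduces the values held by the lanes whose bit is set in member_mask.
// In the model, every participating lane would observe this same result.
inline uint32_t warp_redux(ReduxOp op,
                           const std::array<uint32_t, kWarpSize>& lanes,
                           uint32_t member_mask) {
  // Assumption of this sketch: at least one lane participates.
  assert(member_mask != 0 && "at least one lane must participate");
  bool seen = false;
  uint32_t acc = 0;
  for (int lane = 0; lane < kWarpSize; ++lane) {
    if (((member_mask >> lane) & 1u) == 0) continue;  // lane not in mask
    acc = seen ? combine(op, acc, lanes[lane]) : lanes[lane];
    seen = true;
  }
  return acc;
}
```

On hardware the whole reduction is a single instruction rather than a loop, which is why it looks attractive as the final step of a warp-level reduction.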
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100124/new/
https://reviews.llvm.org/D100124