[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

Tue Apr 13 02:29:25 PDT 2021

steffenlarsen added a comment.

In D100124#2684303 <https://reviews.llvm.org/D100124#2684303>, @JonChesterfield wrote:

> Interesting. Reduction across lanes in warp? If so, this is probably a way to handle the last step reduction for openmp reductions

It is! I can imagine that it would be useful for OpenMP reductions, though it is limited to a few, albeit common, operators on 32-bit integers: add, min and max on signed or unsigned values, and bitwise and, or and xor.
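To illustrate the "last step" point, here is a small CPU model in C++. It is a sketch under stated assumptions, not the patch's code. It compares a single warp-wide reduction, the semantics a `redux.sync` provides, with the five-round butterfly exchange (the `shfl.sync.bfly` pattern) commonly used for that final step. It assumes a full-warp member mask (0xffffffff) and covers only the unsigned variants. The names `ReduxOp`, `redux_all` and `butterfly` are hypothetical and are not the Clang builtins added in this revision.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical CPU model; assumes all 32 lanes participate (mask 0xffffffff).
constexpr int kWarpSize = 32;
using Lanes = std::array<std::uint32_t, kWarpSize>;

// Unsigned subset of the operators; signed min/max omitted for brevity.
enum class ReduxOp { Add, UMin, UMax, And, Or, Xor };

// Combine two 32-bit lane values with the chosen operator.
std::uint32_t combine(ReduxOp op, std::uint32_t a, std::uint32_t b) {
  switch (op) {
    case ReduxOp::Add:  return a + b;  // wraps modulo 2^32
    case ReduxOp::UMin: return std::min(a, b);
    case ReduxOp::UMax: return std::max(a, b);
    case ReduxOp::And:  return a & b;
    case ReduxOp::Or:   return a | b;
    case ReduxOp::Xor:  return a ^ b;
  }
  return 0;
}

// Single-instruction view: every lane receives the reduction over the warp.
std::uint32_t redux_all(ReduxOp op, const Lanes& v) {
  std::uint32_t acc = v[0];
  for (int i = 1; i < kWarpSize; ++i) acc = combine(op, acc, v[i]);
  return acc;
}

// Butterfly view: log2(32) = 5 rounds of exchanging with lane ^ offset.
// For these associative, commutative operators every lane ends up with
// the same full-warp result.
Lanes butterfly(ReduxOp op, Lanes v) {
  for (int offset = kWarpSize / 2; offset > 0; offset /= 2) {
    Lanes next{};
    for (int lane = 0; lane < kWarpSize; ++lane)
      next[lane] = combine(op, v[lane], v[lane ^ offset]);  // emulated shuffle-xor
    v = next;
  }
  return v;
}
```

With lane values 0 through 31, both forms yield 496 for add and 31 for unsigned max. The butterfly needs five dependent exchange-and-combine rounds, while the single-instruction form collapses them into one, which is why it looks attractive for the final stage of a reduction on hardware that supports it.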


https://reviews.llvm.org/D100124