[all-commits] [llvm/llvm-project] e0c4a3: [Clang] Fix cross-lane scan when given divergent l...
Joseph Huber via All-commits
all-commits at lists.llvm.org
Fri Feb 21 14:10:55 PST 2025
Branch: refs/heads/release/20.x
Home: https://github.com/llvm/llvm-project
Commit: e0c4a3397fd2f80740d776de85360dc12cd0bcc7
https://github.com/llvm/llvm-project/commit/e0c4a3397fd2f80740d776de85360dc12cd0bcc7
Author: Joseph Huber <huberjn at outlook.com>
Date: 2025-02-21 (Fri, 21 Feb 2025)
Changed paths:
M clang/lib/Headers/gpuintrin.h
M clang/lib/Headers/nvptxintrin.h
M libc/test/integration/src/__support/GPU/scan_reduce.cpp
Log Message:
-----------
[Clang] Fix cross-lane scan when given divergent lanes (#127703)
Summary:
The scan operation implemented here only works if the execution mask
contains contiguous ones that can be used to propagate the result.
There are two ways to handle a divergent mask: enter 'whole-wave-mode'
and forcibly turn the inactive lanes back on, or perform the scan
serially. This implementation does the latter because it is more
portable, but first checks whether the parallel fast path is
applicable.
Needs to be backported for correct behavior and because it fixes a
failing libc test.
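The serial fallback and the fast-path check described in the summary can
be illustrated with a small host-side simulation. This is only a sketch
under stated assumptions, not the code in gpuintrin.h: the 32-lane width,
the Wave type, and the names is_contiguous, scan_serial, scan_parallel,
and lane_scan are all hypothetical, and a plain array stands in for the
per-lane registers that a real GPU would exchange with shuffles.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of one wavefront: one value per lane (assumed 32 lanes).
constexpr unsigned kLanes = 32;
using Wave = std::array<uint32_t, kLanes>;

// Index of the lowest set bit; the caller guarantees mask != 0.
unsigned lowest_lane(uint64_t mask) {
  unsigned lane = 0;
  while (((mask >> lane) & 1u) == 0)
    ++lane;
  return lane;
}

// True when the active lanes form a single contiguous run of set bits.
bool is_contiguous(uint64_t mask) {
  if (mask == 0)
    return true;
  uint64_t run = mask >> lowest_lane(mask); // drop trailing zeros
  return (run & (run + 1)) == 0;            // run has the form 0...01...1
}

// Serial fallback: visit active lanes from lowest to highest and carry a
// running sum. Works for any mask; inactive lanes keep their value.
Wave scan_serial(uint64_t mask, Wave x) {
  uint32_t accum = 0;
  for (uint64_t m = mask; m != 0; m &= m - 1) { // clear lowest bit each step
    unsigned lane = lowest_lane(m);
    accum += x[lane];
    x[lane] = accum;
  }
  return x;
}

// Parallel-style fast path (Hillis-Steele doubling). Only valid when the
// mask is contiguous, because each lane reads the lane 'step' below it and
// that neighbor must be active for the partial sums to propagate.
Wave scan_parallel(uint64_t mask, Wave x) {
  if (mask == 0)
    return x;
  unsigned first = lowest_lane(mask);
  for (unsigned step = 1; step < kLanes; step *= 2) {
    Wave prev = x; // every lane reads before any lane writes
    for (unsigned lane = 0; lane < kLanes; ++lane) {
      bool active = (mask >> lane) & 1u;
      if (active && lane >= first + step)
        x[lane] += prev[lane - step];
    }
  }
  return x;
}

// Inclusive prefix sum over the active lanes: use the fast path when the
// mask allows it, otherwise fall back to the serial loop.
Wave lane_scan(uint64_t mask, const Wave &x) {
  return is_contiguous(mask) ? scan_parallel(mask, x) : scan_serial(mask, x);
}
```

For example, with every lane holding 1 and a divergent mask of 0xB
(lanes 0, 1, and 3 active), lane_scan takes the serial path and leaves
lanes 0, 1, and 3 holding 1, 2, and 3, while the inactive lane 2 is
untouched. With a contiguous mask such as 0x0FF0 the doubling loop is
used and produces the same values the serial loop would.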
(cherry picked from commit 6cc7ca084a5bbb7ccf606cab12065604453dde59)