[clang] [Clang] Change ballot mask handling for GPU intrinsics (PR #176202)
Joseph Huber via cfe-commits
cfe-commits at lists.llvm.org
Tue Feb 10 08:31:22 PST 2026
jhuber6 wrote:
> I did some experiments with an RTX 2060, which supports ITS. It seems `__ballot_sync` does mask the result by the lane_mask argument, so the out-of-mask lanes won't affect the result even if they reach the function and vote. My concern is that if HIP does not mask the result, users may get different behavior and get confused.
That's interesting. I have an `sm_89` locally and I did not observe this behavior. In fact, I found that if you intentionally restricted the mask you'd get warp illegal instruction errors. Unfortunately I don't know enough details about how this actually works, but I didn't see any forced sub-masking.
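To make the two behaviors being compared concrete, here is a minimal host-side sketch in plain C++. It is only a model: `ballot_unmasked`, `ballot_masked`, and `check_example` are hypothetical names invented for illustration, not part of CUDA, HIP, or Clang's GPU intrinsics, and the code says nothing about what any particular hardware actually does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of a 32-lane warp ballot; not a real GPU API.
// Bit i of `votes` is set when lane i reached the ballot with a true predicate.

// Behavior A: the result is the raw vote of every lane that reached the call;
// the lane mask does not filter the returned bits.
constexpr std::uint32_t ballot_unmasked(std::uint32_t votes,
                                        std::uint32_t /*lane_mask*/) {
    return votes;
}

// Behavior B: the result is additionally restricted to the lanes named in
// lane_mask, so out-of-mask lanes cannot contribute bits.
constexpr std::uint32_t ballot_masked(std::uint32_t votes,
                                      std::uint32_t lane_mask) {
    return votes & lane_mask;
}

// Small worked case: lanes 0 and 16 vote true, but only lanes 0-15 are named
// in the mask.
inline void check_example() {
    constexpr std::uint32_t lane_mask = 0x0000FFFFu;  // lanes 0-15
    constexpr std::uint32_t votes = 0x00010001u;      // lanes 0 and 16
    assert(ballot_masked(votes, lane_mask) == 0x00000001u);
    assert(ballot_unmasked(votes, lane_mask) == 0x00010001u);
}
```

The two functions differ only when a lane outside the mask reaches the ballot and votes, which is exactly the case where users porting between the two behaviors could see different results.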
https://github.com/llvm/llvm-project/pull/176202