[llvm] [ScalarizeMaskedMemIntr] Don't use a scalar mask on GPUs (PR #104842)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 19 13:04:26 PDT 2024
================
@@ -500,9 +507,10 @@ static void scalarizeMaskedGather(const DataLayout &DL, CallInst *CI,
}
// If the mask is not v1i1, use scalar bit test operations. This generates
- // better results on X86 at least.
- Value *SclrMask;
- if (VectorWidth != 1) {
+ // better results on X86 at least. However, don't do this on GPUs or other
+ // machines with branch divergence, as there, each i1 takes up a register.
+ Value *SclrMask = nullptr;
+ if (!TTI.hasBranchDivergence() && VectorWidth != 1) {
----------------
arsenm wrote:
Ditto
https://github.com/llvm/llvm-project/pull/104842
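
For readers following the hunk above, here is a minimal sketch of the two mask-testing strategies the code comment distinguishes. It is a plain C++ model of a masked gather, not LLVM IR and not the pass's actual code. All names (kLanes, packMask, gatherScalarMask, gatherPerLaneMask) are hypothetical and chosen for illustration only. The first variant packs the mask into one integer and tests a bit per lane, which corresponds to the "scalar bit test" path. The second reads each lane's boolean directly, which corresponds to the path taken when TTI.hasBranchDivergence() is true.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical lane count for this illustration.
constexpr std::size_t kLanes = 4;
static_assert(kLanes <= 8, "packed mask must fit in one byte");

using Lanes = std::array<int, kLanes>;
using Indices = std::array<std::size_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Pack the per-lane booleans into one integer, lane i in bit i.
// This models converting a <N x i1> mask into a single scalar.
std::uint8_t packMask(const Mask &mask) {
  std::uint8_t bits = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      bits |= static_cast<std::uint8_t>(1u << i);
  return bits;
}

// Variant 1: build the scalar mask once, then test one bit per lane.
Lanes gatherScalarMask(const std::vector<int> &memory, const Indices &idx,
                       const Mask &mask, const Lanes &passthru) {
  Lanes result = passthru;
  const std::uint8_t bits = packMask(mask);
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (bits & (1u << i)) {
      assert(idx[i] < memory.size());
      result[i] = memory[idx[i]];
    }
  }
  return result;
}

// Variant 2: read each lane's mask element directly, no packed scalar.
Lanes gatherPerLaneMask(const std::vector<int> &memory, const Indices &idx,
                        const Mask &mask, const Lanes &passthru) {
  Lanes result = passthru;
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (mask[i]) {
      assert(idx[i] < memory.size());
      result[i] = memory[idx[i]];
    }
  }
  return result;
}
```

Both variants produce the same lanes for the same inputs. They differ only in how the mask is examined, which is the choice the patch makes target-dependent.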