[llvm] [ScalarizeMaskedMemIntr] Don't use a scalar mask on GPUs (PR #104842)

Matt Arsenault via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 19 13:04:26 PDT 2024


================
@@ -378,10 +381,10 @@ static void scalarizeMaskedStore(const DataLayout &DL, CallInst *CI,
   }
 
   // If the mask is not v1i1, use scalar bit test operations. This generates
-  // better results on X86 at least.
-
-  Value *SclrMask;
-  if (VectorWidth != 1) {
+  // better results on X86 at least. However, don't do this on GPUs or other
+  // machines with branch divergence, as there each i1 takes up a register.
+  Value *SclrMask = nullptr;
+  if (!TTI.hasBranchDivergence() && VectorWidth != 1) {
----------------
arsenm wrote:

Ditto 

https://github.com/llvm/llvm-project/pull/104842
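
For context, the choice this hunk makes can be modelled outside LLVM. The following is a minimal standalone sketch, not the pass's actual code: it uses plain C++ arrays in place of IR vectors, and kLanes, packMask, storeWithScalarMask, storeWithPerLaneMask and the hasBranchDivergence parameter (standing in for TTI.hasBranchDivergence()) are hypothetical names chosen for illustration. It shows the two ways a scalarized masked store can test its mask: packing the mask into one integer and bit-testing it, or reading each i1 lane directly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vector width for this sketch (VectorWidth in the hunk).
constexpr std::size_t kLanes = 4;
static_assert(kLanes <= 32, "mask must fit in a 32-bit integer");

using IntVec = std::array<int, kLanes>;
using MaskVec = std::array<bool, kLanes>;

// Pack the per-lane mask into a single integer, lane i -> bit i.
// This models treating the <N x i1> mask as one scalar value.
std::uint32_t packMask(const MaskVec &mask) {
  std::uint32_t bits = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      bits |= (std::uint32_t{1} << i);
  return bits;
}

// Strategy A: scalar mask plus one bit test per lane.
void storeWithScalarMask(const IntVec &src, const MaskVec &mask, IntVec &dst) {
  const std::uint32_t bits = packMask(mask);
  for (std::size_t i = 0; i < kLanes; ++i)
    if (bits & (std::uint32_t{1} << i))
      dst[i] = src[i];
}

// Strategy B: read each mask lane on its own, with no packed scalar.
void storeWithPerLaneMask(const IntVec &src, const MaskVec &mask, IntVec &dst) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      dst[i] = src[i];
}

// Mirrors the condition in the hunk: use the scalar mask only when the
// target has no branch divergence and the vector has more than one lane.
void scalarizedMaskedStore(bool hasBranchDivergence, const IntVec &src,
                           const MaskVec &mask, IntVec &dst) {
  if (!hasBranchDivergence && kLanes != 1)
    storeWithScalarMask(src, mask, dst);
  else
    storeWithPerLaneMask(src, mask, dst);
}

// Usage: both strategies must write exactly the same lanes.
void checkStrategiesAgree() {
  const IntVec src{10, 20, 30, 40};
  const MaskVec mask{true, false, true, false};
  IntVec a{0, 0, 0, 0}, b{0, 0, 0, 0};
  scalarizedMaskedStore(/*hasBranchDivergence=*/false, src, mask, a);
  scalarizedMaskedStore(/*hasBranchDivergence=*/true, src, mask, b);
  assert(a == b);
}
```

Both paths store the same lanes; they differ only in how the mask is consumed, which is the code-generation trade-off the updated comment in the hunk describes.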

