[llvm] [ScalarizeMaskedMemIntr] Don't use a scalar mask on GPUs (PR #104842)

Mon Aug 19 13:04:26 PDT 2024

================
@@ -500,9 +507,10 @@ static void scalarizeMaskedGather(const DataLayout &DL, CallInst *CI,
   }
 
   // If the mask is not v1i1, use scalar bit test operations. This generates
-  // better results on X86 at least.
-  Value *SclrMask;
-  if (VectorWidth != 1) {
+  // better results on X86 at least. However, don't do this on GPUs or other
+  // machines with branch divergence, as there, each i1 takes up a register.
+  Value *SclrMask = nullptr;
+  if (!TTI.hasBranchDivergence() && VectorWidth != 1) {
----------------
arsenm wrote:

Ditto 

https://github.com/llvm/llvm-project/pull/104842