[llvm] [LLVM][CodeGen][SVE] Improve lowering of fixed length masked mem ops. (PR #134402)

Ricardo Jesus via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 4 09:12:20 PDT 2025


================
@@ -28697,17 +28703,36 @@ static SDValue convertFixedMaskToScalableVector(SDValue Mask,
   SDLoc DL(Mask);
   EVT InVT = Mask.getValueType();
   EVT ContainerVT = getContainerForFixedLengthVector(DAG, InVT);
-
-  auto Pg = getPredicateForFixedLengthVector(DAG, DL, InVT);
+  SDValue Pg = getPredicateForFixedLengthVector(DAG, DL, InVT);
 
   if (ISD::isBuildVectorAllOnes(Mask.getNode()))
     return Pg;
 
-  auto Op1 = convertToScalableVector(DAG, ContainerVT, Mask);
-  auto Op2 = DAG.getConstant(0, DL, ContainerVT);
+  bool InvertCond = false;
+  if (isBitwiseNot(Mask)) {
+    InvertCond = true;
+    Mask = Mask.getOperand(0);
+  }
+
+  SDValue Op1, Op2;
+  ISD::CondCode CC;
+
+  // When Mask is the result of a SETCC, it's better to regenerate the compare.
+  if (Mask.getOpcode() == ISD::SETCC) {
----------------
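(For context: the hunk above is truncated inside the `if (Mask.getOpcode() == ISD::SETCC)` branch. Below is a hedged sketch of how the branch plausibly completes, modeled on the pre-existing code at the top of the hunk; the exact body is an assumption, not the PR's verbatim code.)

```cpp
// Sketch only: branch bodies are inferred, since the hunk is truncated.
// convertToScalableVector, getContainerForFixedLengthVector and
// AArch64ISD::SETCC_MERGE_ZERO come from the surrounding file.
if (Mask.getOpcode() == ISD::SETCC) {
  // Regenerate the compare on scalable types rather than comparing the
  // fixed-length compare result against zero.
  EVT OpVT = Mask.getOperand(0).getValueType();
  EVT OpContainerVT = getContainerForFixedLengthVector(DAG, OpVT);
  Op1 = convertToScalableVector(DAG, OpContainerVT, Mask.getOperand(0));
  Op2 = convertToScalableVector(DAG, OpContainerVT, Mask.getOperand(1));
  CC = cast<CondCodeSDNode>(Mask.getOperand(2))->get();
} else {
  Op1 = convertToScalableVector(DAG, ContainerVT, Mask);
  Op2 = DAG.getConstant(0, DL, ContainerVT);
  CC = ISD::SETNE;
}
if (InvertCond)
  CC = ISD::getSetCCInverse(CC, Op1.getValueType());
return DAG.getNode(AArch64ISD::SETCC_MERGE_ZERO, DL, Pg.getValueType(),
                   {Pg, Op1, Op2, DAG.getCondCode(CC)});
```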
rj-jesus wrote:

Nice! Could this be extended to peek through ISD::SIGN_EXTEND too? I'm thinking of cases such as:
```
t14: v16i8 = sign_extend t6
  t6: v16i1 = setcc t2, t4, seteq:ch
```
I've seen this come up when using `@llvm.experimental.cttz.elts` with fixed-length vectors (although presumably it's a general pattern), e.g.:
```llvm
define i64 @cmpeq_i8(<16 x i8> %a, <16 x i8> %b) {
  %cmp = icmp eq <16 x i8> %a, %b
  %ctz = tail call i64 @llvm.experimental.cttz.elts.i64.v16i1(<16 x i1> %cmp, i1 true)
  ret i64 %ctz
}

declare i64 @llvm.experimental.cttz.elts.i64.v16i1(<16 x i1>, i1)
```
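Something along these lines is roughly what I have in mind — just a sketch against the hunk above, placed immediately before the SETCC check, so both the placement and the guard are untested assumptions:

```cpp
// Sketch: look through a sign extension of a compare result, so extended
// masks such as v16i8 = sign_extend (v16i1 = setcc ...) also take the
// regenerated-compare path below. Untested assumption.
if (Mask.getOpcode() == ISD::SIGN_EXTEND &&
    Mask.getOperand(0).getOpcode() == ISD::SETCC)
  Mask = Mask.getOperand(0);
```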

Otherwise, I'll look into it once this PR lands. :)

https://github.com/llvm/llvm-project/pull/134402

