[llvm] [AArch64] Improve code generation of bool vector reduce operations (PR #115713)

Mon Nov 11 23:50:54 PST 2024

================
@@ -15820,11 +15820,26 @@ static SDValue getVectorBitwiseReduce(unsigned Opcode, SDValue Vec, EVT VT,
       return getVectorBitwiseReduce(Opcode, HalfVec, VT, DL, DAG);
     }
 
-    // Vectors that are less than 64 bits get widened to neatly fit a 64 bit
-    // register, so e.g. <4 x i1> gets lowered to <4 x i16>. Sign extending to
+    // Results of setcc operations get widened to 128 bits for xor reduce if
+    // their input operands are 128 bits wide, otherwise vectors that are less
+    // than 64 bits get widened to neatly fit a 64 bit register, so e.g.
+    // <4 x i1> gets lowered to either <4 x i16> or <4 x i32>. Sign extending to
     // this element size leads to the best codegen, since e.g. setcc results
     // might need to be truncated otherwise.
-    EVT ExtendedVT = MVT::getIntegerVT(std::max(64u / NumElems, 8u));
+    unsigned ExtendedWidth = 64;
+    if (ScalarOpcode == ISD::XOR && Vec.getOpcode() == ISD::SETCC &&
+        Vec.getOperand(0).getValueSizeInBits() >= 128) {
+      ExtendedWidth = 128;
+    }
+    EVT ExtendedVT = MVT::getIntegerVT(std::max(ExtendedWidth / NumElems, 8u));
+
+    // Negate the reduced vector value for reduce and operations that use
+    // fcmp.
+    if (ScalarOpcode == ISD::AND && NumElems < 16) {
+      Vec = DAG.getNode(
+          ISD::XOR, DL, VecVT, Vec,
+          DAG.getSplatVector(VecVT, DL, DAG.getConstant(-1, DL, MVT::i32)));
----------------
davemgreen wrote:

This could be `DAG.getConstant(-1, DL, VT)`.
Although maybe use `DAG.getNOT` instead.
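
For reference, the identity behind the suggestion is that XOR with an all-ones constant of the lane width is a bitwise NOT, which is what `DAG.getNOT` builds. In-tree this would presumably read something like `Vec = DAG.getNOT(DL, Vec, VecVT);`, though that is untested here. A minimal standalone sketch of the identity, where the helper name `notViaXor` and the 16-bit lane width are illustrative assumptions and not part of the patch:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not LLVM API): invert every lane by XOR-ing it with
// an all-ones constant of the same width. The 16-bit lane width is an
// assumption, chosen to mirror a <4 x i1> widened to <4 x i16>.
std::vector<uint16_t> notViaXor(const std::vector<uint16_t> &lanes) {
  const uint16_t allOnes = static_cast<uint16_t>(-1); // 0xFFFF
  std::vector<uint16_t> out;
  out.reserve(lanes.size());
  for (uint16_t lane : lanes)
    out.push_back(static_cast<uint16_t>(lane ^ allOnes)); // same as ~lane
  return out;
}
```

The all-ones constant has to match the lane type for the XOR to flip every bit, which is why a hard-coded `MVT::i32` stands out in the hunk above.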

https://github.com/llvm/llvm-project/pull/115713