[llvm] [AArch64] Improve code generation of bool vector reduce operations (PR #115713)
Csanád Hajdú via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 12 00:43:19 PST 2024
================
@@ -15820,11 +15820,26 @@ static SDValue getVectorBitwiseReduce(unsigned Opcode, SDValue Vec, EVT VT,
return getVectorBitwiseReduce(Opcode, HalfVec, VT, DL, DAG);
}
- // Vectors that are less than 64 bits get widened to neatly fit a 64 bit
- // register, so e.g. <4 x i1> gets lowered to <4 x i16>. Sign extending to
+ // Results of setcc operations get widened to 128 bits for xor reduce if
+ // their input operands are 128 bits wide, otherwise vectors that are less
+ // than 64 bits get widened to neatly fit a 64 bit register, so e.g.
+ // <4 x i1> gets lowered to either <4 x i16> or <4 x i32>. Sign extending to
// this element size leads to the best codegen, since e.g. setcc results
// might need to be truncated otherwise.
- EVT ExtendedVT = MVT::getIntegerVT(std::max(64u / NumElems, 8u));
+ unsigned ExtendedWidth = 64;
+ if (ScalarOpcode == ISD::XOR && Vec.getOpcode() == ISD::SETCC &&
+ Vec.getOperand(0).getValueSizeInBits() >= 128) {
+ ExtendedWidth = 128;
+ }
+ EVT ExtendedVT = MVT::getIntegerVT(std::max(ExtendedWidth / NumElems, 8u));
+
+ // Negate the reduced vector value for reduce and operations that use
+ // fcmp.
+ if (ScalarOpcode == ISD::AND && NumElems < 16) {
+ Vec = DAG.getNode(
+ ISD::XOR, DL, VecVT, Vec,
+ DAG.getSplatVector(VecVT, DL, DAG.getConstant(-1, DL, MVT::i32)));
----------------
Il-Capitano wrote:
> `DAG.getConstant(-1, DL, VT)`.
This is what I tried initially, but it leads to worse code generation in some cases. For example, in test/CodeGen/reduce-and.ll, this is part of the diff compared to the current version:
```patch
@@ -42,7 +42,8 @@ define i1 @test_redand_v2i1(<2 x i1> %a) {
define i1 @test_redand_v4i1(<4 x i1> %a) {
; CHECK-LABEL: test_redand_v4i1:
; CHECK: // %bb.0:
-; CHECK-NEXT: mvn v0.8b, v0.8b
+; CHECK-NEXT: movi v1.4h, #1
+; CHECK-NEXT: eor v0.8b, v0.8b, v1.8b
; CHECK-NEXT: shl v0.4h, v0.4h, #15
; CHECK-NEXT: cmlt v0.4h, v0.4h, #0
; CHECK-NEXT: fcmp d0, #0.0
```
I quickly checked, and the same thing happens when using `DAG.getNOT`.
I looked into this previously and found two separate changes, either of which fixes codegen when using `DAG.getConstant(-1, DL, VT)`:
1. The legalization logic for `SPLAT_VECTOR` differs from that for `BUILD_VECTOR` in how boolean operands are promoted. In `SPLAT_VECTOR` a `Constant::i1<-1>` gets zero-extended to `Constant::i32<1>`, while in `BUILD_VECTOR` it gets sign-extended to `Constant::i32<-1>`. Copying the logic used in `BUILD_VECTOR` legalization over to `SPLAT_VECTOR` legalization fixes the issue.
2. If we pass `AllowTruncation=true` to the `isConstOrConstSplat` call in [CodeGen/SelectionDAG/TargetLowering.cpp:1617](https://github.com/llvm/llvm-project/blob/469ac118418fff2fc07e5705ff527405060ac586/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L1617) and handle the constant value appropriately, it also eliminates the issue.
When using `DAG.getNOT`, only the second option would fix it, because `DAG.getNOT` uses `Constant::i32<1>` values in the generated node.
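To illustrate the difference between the two promotions, here is a minimal standalone sketch. It does not use any LLVM APIs; the helper names (`signExtendI1`, `zeroExtendI1`, `xorLane`, `notLane`) are hypothetical and exist only for this example. It models a 16-bit lane, as in the `v4i1` test above, and shows that XOR with the sign-extended constant is a full bitwise NOT, while XOR with the zero-extended constant only flips bit 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; these are not LLVM APIs.

// Promote a 1-bit constant to 32 bits by sign extension:
// i1<-1> (bit pattern 1) becomes i32<-1> (all ones).
constexpr uint32_t signExtendI1(bool Bit) { return Bit ? 0xFFFFFFFFu : 0u; }

// Promote a 1-bit constant to 32 bits by zero extension:
// i1<-1> (bit pattern 1) becomes i32<1>.
constexpr uint32_t zeroExtendI1(bool Bit) { return Bit ? 1u : 0u; }

// XOR a 16-bit lane with the promoted constant, truncated to the lane width.
constexpr uint16_t xorLane(uint16_t Lane, uint32_t Promoted) {
  return static_cast<uint16_t>(Lane ^ static_cast<uint16_t>(Promoted));
}

// Bitwise NOT of a 16-bit lane.
constexpr uint16_t notLane(uint16_t Lane) {
  return static_cast<uint16_t>(~Lane);
}

// With the sign-extended constant, the XOR is exactly a NOT of the lane.
static_assert(xorLane(0x0001, signExtendI1(true)) == notLane(0x0001),
              "xor with all-ones equals not");

// With the zero-extended constant, only bit 0 is flipped, so the XOR is
// no longer a NOT of the whole lane.
static_assert(xorLane(0x0001, zeroExtendI1(true)) != notLane(0x0001),
              "xor with 1 is not a full not");

// Bit 0 still agrees in both cases.
static_assert((xorLane(0x0001, zeroExtendI1(true)) & 1) ==
                  (notLane(0x0001) & 1),
              "bit 0 is flipped either way");
```

Both variants agree on bit 0, which is the only bit the following `shl v0.4h, v0.4h, #15` keeps, so the zero-extended form in the diff above is still correct. It is just not recognized as a NOT, which would be consistent with getting `movi` + `eor` instead of a single `mvn`.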
I'm not sure which solution would be preferred. The two changes above are unrelated to this PR, so I could open separate PRs for them and make this one depend on those. However, I couldn't come up with a test case for either that showed their effects.
https://github.com/llvm/llvm-project/pull/115713