[PATCH] D148185: Add more efficient bitwise vector reductions on AArch64

Tue Apr 18 04:46:57 PDT 2023

Sp00ph added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13252
+    SDValue Extended =
+        DAG.getAnyExtOrTrunc(Vec, DL, VecVT.changeVectorElementType(MVT::i8));
+    switch (ScalarOpcode) {
----------------
dmgreen wrote:
> Sp00ph wrote:
> > efriedma wrote:
> > > Sp00ph wrote:
> > > > Using either zext or sext here adds a few extra instructions in the generated code. Is it guaranteed that any-extending an i1 vector results in a vector whose elements are all either 0 or -1? It seems reasonable because afaik mask vector elements on AArch64 are always either 0 or -1, but it could also introduce some subtle incorrectness if there is some case where any-extending an i1 vector does not result in such a mask vector.
> > > No, no guarantee here.  I mean, there are restrictions related to boolean operands certain specific operations (like the condition of a VSELECT), but there isn't any restriction that applies to arithmetic operations.  An easy way to get a vector with arbitrary data in the high bits is truncating from nxi8 to nxi1.
> > > 
> > > You could generate a different sequence if the operand is known to be sign-extended (ComputeNumSignBits).
> > `ComputeNumSignBits` doesn't seem to work properly on `<N x i1>` function arguments. So e.g. an `<8 x i1>` gets lowered to an `<8 x i8>` during function argument lowering, and calling `ComputeNumSignBits` on that returns a 1 (even though `<N x i1>` in function arguments seems to always be all zeros or all ones; either that or the current codegen is already incorrect). If I instead sign extend the vector in the `i1` branch it adds 2 redundant instructions to all the codegen tests that take a `<N x i1>` as a function argument. Tests that e.g. reduce a `<N x i1>` obtained from a setcc don't get those extra instructions because there's a `setcc + sext` combine I believe. I guess this could be fixed by somehow convincing `ComputeNumSignBits` that a `<N x i1>` function argument that got lowered to a `<N x iM>` does in fact have M sign bits?
> I believe there is no requirement that arguments are all-ones. For example https://godbolt.org/z/MYdEh1fET. There is a signext attribute that can be applied to scalars, but not vectors.
In that case it looks like the codegen for the boolean vector reductions is already wrong without this patch. For example this: https://llvm.godbolt.org/z/YjE8n7q7s causes calls to `bad` to return a 0, when I believe they should return a 1 instead, because it does umax then truncate instead of truncate then umax, and those don't commute in the general case.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148185/new/

https://reviews.llvm.org/D148185