[PATCH] D148185: Add more efficient bitwise vector reductions on AArch64

Tue Apr 18 03:17:03 PDT 2023

dmgreen added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:13252
+    SDValue Extended =
+        DAG.getAnyExtOrTrunc(Vec, DL, VecVT.changeVectorElementType(MVT::i8));
+    switch (ScalarOpcode) {
----------------
Sp00ph wrote:
> efriedma wrote:
> > Sp00ph wrote:
> > > Using either zext or sext here adds a few extra instructions in the generated code. Is it guaranteed that any-extending an i1 vector results in a vector whose elements are all either 0 or -1? It seems reasonable because afaik mask vector elements on AArch64 are always either 0 or -1, but it could also introduce some subtle incorrectness if there is some case where any-extending an i1 vector does not result in such a mask vector.
> > No, no guarantee here.  I mean, there are restrictions related to boolean operands of certain specific operations (like the condition of a VSELECT), but there isn't any restriction that applies to arithmetic operations.  An easy way to get a vector with arbitrary data in the high bits is truncating from nxi8 to nxi1.
> > 
> > You could generate a different sequence if the operand is known to be sign-extended (ComputeNumSignBits).
> `ComputeNumSignBits` doesn't seem to work properly on `<N x i1>` function arguments. For example, an `<8 x i1>` gets lowered to an `<8 x i8>` during function argument lowering, and calling `ComputeNumSignBits` on that returns 1 (even though `<N x i1>` in function arguments seems to always be all zeros or all ones; either that or the current codegen is already incorrect). If I instead sign extend the vector in the `i1` branch, it adds 2 redundant instructions to all the codegen tests that take an `<N x i1>` as a function argument. Tests that e.g. reduce an `<N x i1>` obtained from a setcc don't get those extra instructions because there's a `setcc + sext` combine, I believe. I guess this could be fixed by somehow convincing `ComputeNumSignBits` that an `<N x i1>` function argument that got lowered to an `<N x iM>` does in fact have M sign bits?
I believe there is no requirement that vector arguments are all-zeros or all-ones in each lane. For example, see https://godbolt.org/z/MYdEh1fET. There is a signext attribute that can be applied to scalars, but not vectors.
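
To make the concern concrete, here is a small standalone sketch in plain C++ (not LLVM code). It models each lane of an any-extended `<N x i8>` as a `uint8_t` whose low bit is the `i1` value and whose upper bits are unspecified. The names `Lanes`, `referenceAndReduce`, `uminAndReduce` and `signExtendInReg` are made up for illustration, and the "unsigned min, then truncate" lowering is only an assumption about the kind of sequence that relies on lanes being 0 or -1; it is not taken from the patch.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model: one uint8_t per lane. Only the low bit carries the i1
// value; after an any-extend the upper seven bits may hold anything.
template <std::size_t N>
using Lanes = std::array<std::uint8_t, N>;

// Reference semantics: AND-reduce the i1 values (the low bit of each lane).
template <std::size_t N>
bool referenceAndReduce(const Lanes<N> &V) {
  bool Result = true;
  for (std::uint8_t Lane : V)
    Result = Result && ((Lane & 1) != 0);
  return Result;
}

// Assumed lowering that is only valid for 0x00 / 0xFF lanes: take the
// unsigned minimum across lanes, then truncate the result back to i1.
template <std::size_t N>
bool uminAndReduce(const Lanes<N> &V) {
  std::uint8_t Min = *std::min_element(V.begin(), V.end());
  return (Min & 1) != 0;
}

// Models a sign-extend-in-register from i1: replicate the low bit into
// every bit of the lane, so each lane becomes 0x00 or 0xFF.
template <std::size_t N>
Lanes<N> signExtendInReg(Lanes<N> V) {
  for (std::uint8_t &Lane : V)
    Lane = (Lane & 1) ? 0xFF : 0x00;
  return V;
}
```

With lanes `{0x01, 0x02}` (i1 values 1 and 0), the reference reduction is false, but the unsigned-min shortcut sees a minimum of `0x01` and reports true. After `signExtendInReg` the lanes are `{0xFF, 0x00}` and both agree again, which is why the extra sign extension is needed unless the input is already known to be sign-extended.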

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148185/new/

https://reviews.llvm.org/D148185