[PATCH] D104042: [AArch64] Improve SAD pattern

Fri Jun 11 04:24:27 PDT 2021

jaykang10 added a comment.

In D104042#2811212 <https://reviews.llvm.org/D104042#2811212>, @dmgreen wrote:

> Nice patch. It's a big combine, but we certainly have bigger elsewhere. Sometimes it's possible to do things like this as smaller pieces, that add up to the same thing.

Thanks for your kind comments @dmgreen
I agree with you. I tried to do this combine with smaller pieces but the `v16i32` type caused more nodes. Type legalizer split the `v16i32` type operations into multiple `v4i32` and `v4i16` operations because the AArch64 target does not support the native register class for the `v16i32` type. In order to avoid the type legalization, I combined `vecreduce_add`.

> Speaking of which, would it be possible to combine (in a separate patch, maybe as a tablegen pattern) addv (uaddlp x) -> uaddlv x ?

It is good point! I have tried below pattern following your suggestion. It seems to work. If you are ok, let me add below pattern in this patch.

  let AddedComplexity = 10 in { 
  def : Pat<(i32 (extractelt
                   (v4i32 (AArch64uaddv (v4i32 (AArch64uaddlp (v8i16 V128:$op))))),
                   (i64 0))),
            (UADDLVv8i16v V128:$op)>;
  }

> Would it make sense to add signed handling too? If I'm not mistaken, if it has sext's as opposed to zext's, it can turn into addv(uaddlp(sabd,sabd2). The sadb's produce an unsigned value from signed inputs, so the remaining extensions are zext's not sexts and it doesn't need a SADDLP node. That may not be correct though, so, um, make sure you check it.

Yep, I agree with you. Let me update code to support  signed one.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:12004
+//         v8i8 UABD high8:v16i8 a, high8:v16i8 b
+static SDValue performVecReduceAddCombineWithUADALP(SDNode *N,
+                                                    SelectionDAG &DAG) {
----------------
dmgreen wrote:
> This doesn't appear to use UADALP. Is that meant as a shorthand for UADA + UADDLP?
Sorry for typo. Let me update the name.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:12011
+  SDValue VecReduceOp0 = N->getOperand(0);
+  unsigned Opcode = VecReduceOp0.getOpcode();
+  // Assumed v16i32 abs
----------------
dmgreen wrote:
> This is mostly a style comment, so feel free to ignore if you like. They Opcode's are only used once in the checks, so could be done inline without the variable. Having the "ABSOp0" and "SUB" nodes could just be one variable called "SUB" too, as we are checking the opcode on them straight away.
Yep, let me update it.

================
Comment at: llvm/test/CodeGen/AArch64/neon-sad.ll:7
+
+define i32 @test_sad_v16i8(i8* nocapture readonly %a, i8* nocapture readonly %b) {
+; CHECK-LABEL: test_sad_v16i8:
----------------
dmgreen wrote:
> It can be good to pre-commit the tests, to just show the differences in the review. It makes it easier to see what the patch does.
Yep, let me pre-commit this test.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104042/new/

https://reviews.llvm.org/D104042