[PATCH] D88577: [AArch64] Generate dot for v16i8 sum reduction to i32

Thu Oct 1 09:55:28 PDT 2020

dmgreen added a comment.

> EDIT: SLP vectorizer generates the pattern in this case.

Ah I see.

https://llvm.org/docs/Phabricator.html#phabricator-request-review-web has some details on creating diffs with extra context.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10970
+  MachineSDNode *ABD =
+      DAG.getMachineNode(AArch64::UDOTv16i8, SDLoc(Op0), Zeros.getValueType(),
+                         Zeros, Ones, Op0.getOperand(0));
----------------
mivnay wrote:
> dmgreen wrote:
> > We ideally shouldn't be just producing a machine node here. Can you add a AArch64ISD::UDOT node?
> > 
> > We should be doing the same for SDOT too.
> I have added the support for SDOT.
> 
> >  Can you add a AArch64ISD::UDOT node?
> 
> Can this be done in a later patch?
It's best not create machine nodes directly in the DAG...

Maybe create a int_aarch64_neon_sdot intrinsic node instead? That will avoid the need for a new node, and go through the existing tablegen patterns.

It would be nice to be able to fold add(x, UDOT(0, y)) -> UDOT(x, y), which is where having a node will really become useful. That needn't be done here though.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10949-10951
+  if (!ST->hasDotProd() || N->getValueType(0) != MVT::i32) {
+    return SDValue();
+  }
----------------
Any single statement blocks do not need { } brackets.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10964-10965
+
+  if (Op0VT.isScalableVector() || Op0VT.getVectorElementType() != MVT::i8 ||
+      Op0VT.getVectorNumElements() != 16) {
+    return SDValue();
----------------
This second isScalable check isn't needed, and maybe move the num elements == 16 check up to the VT check, by checking for v16i32 directly?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10972
+  SDValue Zeros = DAG.getConstant(
+      0, SDLoc(Op0), EVT::getVectorVT(*DAG.getContext(), MVT::i32, 4));
+
----------------
EVT::getVectorVT(*DAG.getContext(), MVT::i32, 4) -> MVT::v4i32

I wasn't aware that you could use getConstant directly on vector types like this. Neat.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:10981
+                                 N->getValueType(0), SDValue(ABD, 0));
+  DAG.ReplaceAllUsesOfValueWith(SDValue(N, 0), FinalABD);
+  return FinalABD;
----------------
Why is this is calling ReplaceAllUsesOfValueWith? Is returning the value not enough?

================
Comment at: llvm/test/CodeGen/AArch64/neon-dot-product.ll:261
+; CHECK-LABEL: test_udot_v16i8_2:
+; CHECK:  udot {{v[0-9]+}}.4s, {{v[0-9]+}}.16b, {{v[0-9]+}}.16b
+  %0 = bitcast i8* %a1 to <16 x i8>*
----------------
Can you update the tests to show all the instructions? The 1's and 0's and addv's are important here too.

I would just use the update_llc_test_checks script.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88577/new/

https://reviews.llvm.org/D88577