[PATCH] D141693: [AArch64] turn extended vecreduce bigger than v16i8 into udot/sdot

Fri Jan 27 04:52:32 PST 2023

dmgreen added a comment.

Thanks. The results are looking better now. If we can clean up the code a little, then this looks good to me.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15257
+  // Generate Dot instructions that are multiple of 16.
+  unsigned VecReduce16Num = floor(Op0VT.getVectorNumElements() / 16);
+  SmallVector<SDValue, 4> SDotVec16;
----------------
I don't think this needs floor. `getVectorNumElements()` returns an unsigned integer, so the division by 16 already truncates.
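
As a minimal standalone sketch of why the call is redundant (the helper names here are hypothetical and not part of the patch): unsigned integer division already discards the fractional part, so wrapping it in `floor` only round-trips the value through a double.

```cpp
#include <cmath>

// Hypothetical helper: number of full 16-lane chunks in a vector with
// NumElts elements. Unsigned division truncates toward zero.
unsigned numFull16Chunks(unsigned NumElts) {
  return NumElts / 16;
}

// Same computation written the way the patch does it. The division has
// already been performed on integers before floor ever sees the value.
unsigned numFull16ChunksWithFloor(unsigned NumElts) {
  return static_cast<unsigned>(std::floor(NumElts / 16));
}
```

Both functions should return the same value for every input, which is the reason the `floor` can be dropped.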

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15274
+  // Generate the remainder of the Dot operations that are multiple of 8.
+  for (unsigned I = 0; I < VecReduce8Num; I += 1) {
+    SDValue Zeros = DAG.getConstant(0, DL, MVT::v2i32);
----------------
This can only ever be 0 or 1, so it probably doesn't need the loop. Hopefully this can simplify things a little, as we won't need to concat v8 vectors.
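
A rough sketch of the arithmetic behind this, assuming the count is derived as "whatever is left after the 16-lane chunks, divided by 8" (I haven't reproduced the exact expression from the patch, and the helper name is hypothetical):

```cpp
// Hypothetical helper: number of 8-lane chunks left over once all full
// 16-lane chunks have been taken. NumElts % 16 lies in [0, 15], so the
// result of dividing it by 8 can only be 0 or 1.
unsigned numTrailing8Chunks(unsigned NumElts) {
  return (NumElts % 16) / 8;
}
```

With at most one trailing chunk, a plain `if` in place of the loop would be enough under that assumption.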

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15297
+      SDotVec8);
+  // Append Undef vector to v2i32 Dot vectors in order to concatenate them with
+  // v4i32 vectors.
----------------
They would need to be zeros, I think, as undef lanes would add arbitrary values to the reduction. Would it be better and simpler to just return `vecreduce.add(v16s) + vecreduce.add(v8)`?
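
To illustrate with a scalar model (plain `std::vector`s standing in for the v4i32 and v2i32 dot results; none of these names come from the patch): summing the two partial reductions gives the same answer as concatenating with zero padding, whereas any non-zero padding changes the result.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Scalar stand-in for vecreduce.add on one vector.
int32_t reduceAdd(const std::vector<int32_t> &V) {
  return std::accumulate(V.begin(), V.end(), int32_t{0});
}

// The suggested form: reduce each part separately and add the scalars.
int32_t reduceSplit(const std::vector<int32_t> &Lo,
                    const std::vector<int32_t> &Hi) {
  return reduceAdd(Lo) + reduceAdd(Hi);
}

// The concat form: widen Hi to Lo's lane count with Pad, concatenate,
// then reduce once. Assumes Hi has no more lanes than Lo.
int32_t reduceConcatPadded(const std::vector<int32_t> &Lo,
                           std::vector<int32_t> Hi, int32_t Pad) {
  Hi.resize(Lo.size(), Pad);
  std::vector<int32_t> All(Lo);
  All.insert(All.end(), Hi.begin(), Hi.end());
  return reduceAdd(All);
}
```

In this model `reduceConcatPadded(Lo, Hi, 0)` matches `reduceSplit(Lo, Hi)`, while a non-zero `Pad` (standing in for whatever an undef lane happens to hold) does not.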

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141693/new/

https://reviews.llvm.org/D141693