[PATCH] D141693: [AArch64] turn extended vecreduce bigger than v16i8 into udot/sdot

Mon Jan 16 00:33:55 PST 2023

dmgreen added a comment.

Sounds like a nice improvement.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15226
+  bool IsValidSize = Op0VT.getScalarSizeInBits() == 8;
+  if (Op0VT != MVT::v8i8 && !IsValidElementCount && !IsValidSize)
     return SDValue();
----------------
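I think this should be something like `!(IsValidElementCount && IsValidSize)`. As written, the early return only fires when all three checks fail, so a type that fails just one of them (v4i8, for example, which still has 8-bit elements) gets through.
It is worth adding a v4i8 test if one doesn't exist already: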
```
define i32 @src(ptr %p, i32 %b) {
entry:
  %a64 = load <4 x i8>, ptr %p
  %a65 = sext <4 x i8> %a64 to <4 x i32>
  %a66 = mul nsw <4 x i32> %a65, %a65
  %a67 = tail call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %a66)
  %a = add i32 %a67, %b
  ret i32 %a
}

declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)
```
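
To illustrate how the two conditions differ, here is a minimal standalone sketch. `VecTy` and the helper functions are hypothetical stand-ins for the `EVT` queries in the patch, and the definition of `isValidElementCount` (a multiple of 8) is an assumption, since that line is not quoted above.

```cpp
// Hypothetical model of a vector type: element count and element width.
// The real code queries llvm::EVT instead.
struct VecTy {
  unsigned NumElts;
  unsigned ScalarBits;
};

constexpr bool isV8i8(VecTy T) { return T.NumElts == 8 && T.ScalarBits == 8; }

// Assumption: the patch accepts element counts that are a multiple of 8.
constexpr bool isValidElementCount(VecTy T) { return T.NumElts % 8 == 0; }

// Matches the quoted line: the scalar size must be 8 bits.
constexpr bool isValidSize(VecTy T) { return T.ScalarBits == 8; }

// Early-return condition as written in the patch:
// only true when every one of the three checks fails.
constexpr bool bailsOutAsWritten(VecTy T) {
  return !isV8i8(T) && !isValidElementCount(T) && !isValidSize(T);
}

// Early-return condition suggested in the review:
// true as soon as either the element count or the size is unsuitable.
constexpr bool bailsOutAsSuggested(VecTy T) {
  return !(isValidElementCount(T) && isValidSize(T));
}
```

Under these assumptions, v4i8 passes the condition as written (its scalar size is valid, so the conjunction is false) but is rejected by the suggested form, which is why a v4i8 test is useful.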

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15256
+    auto DotOpcode =
+        (ExtOpcode == ISD::ZERO_EXTEND) ? AArch64ISD::UDOT : AArch64ISD::SDOT;
+    SDValue Dot =
----------------
DotOpcode can be hoisted out of the loop and shared with the version above, since it only depends on ExtOpcode. Zeroes can be moved up too.
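
As a rough illustration of the hoisting, here is a standalone sketch rather than the actual lowering code. `ExtKind`, `DotKind`, `selectDotOpcode`, and `lowerChunks` are hypothetical names standing in for `ISD::ZERO_EXTEND`/`ISD::SIGN_EXTEND`, `AArch64ISD::UDOT`/`SDOT`, and the per-chunk loop in the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the ISD extension opcodes and the AArch64 dot nodes.
enum class ExtKind { ZeroExtend, SignExtend };
enum class DotKind { UDOT, SDOT };

// The choice depends only on the extension kind, so it is loop-invariant.
constexpr DotKind selectDotOpcode(ExtKind Ext) {
  return Ext == ExtKind::ZeroExtend ? DotKind::UDOT : DotKind::SDOT;
}

// Model of the per-chunk loop: one dot node is recorded per 16-byte chunk.
std::vector<DotKind> lowerChunks(ExtKind Ext, std::size_t NumChunks) {
  // Computed once, before the loop, and reusable by any earlier code path
  // that needs the same opcode.
  const DotKind DotOpcode = selectDotOpcode(Ext);

  std::vector<DotKind> Nodes;
  Nodes.reserve(NumChunks);
  for (std::size_t I = 0; I < NumChunks; ++I)
    Nodes.push_back(DotOpcode); // the loop body no longer recomputes the opcode
  return Nodes;
}
```

The same reasoning would apply to any other value that does not change between iterations.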

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141693/new/

https://reviews.llvm.org/D141693