[PATCH] D141693: [AArch64] turn extended vecreduce bigger than v16i8 into udot/sdot

Mon Jan 16 07:33:37 PST 2023

zjaffal added a comment.

In D141693#4056504 <https://reviews.llvm.org/D141693#4056504>, @dmgreen wrote:

> Do you have any tests for cases that are a multiple of 8 but not of 16, like `<24 x ...`? And can you make sure we have the `load <4 x i8>` test case?

I will add tests for <24 x ..> and <4 x i8>

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15226
+  bool IsValidSize = Op0VT.getScalarSizeInBits() == 8;
+  if (Op0VT != MVT::v8i8 && !IsValidElementCount && !IsValidSize)
     return SDValue();
----------------
dmgreen wrote:
> zjaffal wrote:
> > dmgreen wrote:
> > > I think this should be something like !(IsValidElementCount && IsValidSize).
> > > It is worth adding a v4i8 test if one doesn't exist already:
> > > ```
> > > define i32 @src(ptr %p, i32 %b) {
> > > entry:
> > >   %a64 = load <4 x i8>, ptr %p
> > >   %a65 = sext <4 x i8> %a64 to <4 x i32>
> > >   %a66 = mul nsw <4 x i32> %a65, %a65
> > >   %a67 = tail call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %a66)
> > >   %a = add i32 %a67, %b
> > >   ret i32 %a
> > > }
> > > ```
> > >I think this should be something like !(IsValidElementCount && IsValidSize).
> > but then we won't cover the case for v8i8 or shall we change is validElementCount to be
> > `Op0VT.getVectorNumElements() % 16 == 0; || Op0VT.getVectorNumElements() % 8 == 0;`
> Sorry - I meant with the `Op0VT != MVT::v8i8` too. The condition as written here will bail out if both !IsValidElementCount and !IsValidSize, but it seems like it should be bailing if one of them is false. So:
> ```
> if (Op0VT != MVT::v8i8 && (!IsValidElementCount || !IsValidSize))
> ```
> It could also do `bool IsValidElementCount = Op0VT == MVT::v8i8 || Op0VT.getVectorNumElements() % 16 == 0;` and then check that `if (!IsValidElementCount || !IsValidSize)`, if you think that is cleaner.
I will add the v4i8 test on a separate patch. We don't support it at the moment

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15256
+    auto DotOpcode =
+        (ExtOpcode == ISD::ZERO_EXTEND) ? AArch64ISD::UDOT : AArch64ISD::SDOT;
+    SDValue Dot =
----------------
dmgreen wrote:
> DotOpcode can be moved out of the loop, and commoned with the version above. Zeroes can be moved up too.
Shall we add test cases for vectors that are multiples of 8? 
for example v24i8

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141693/new/

https://reviews.llvm.org/D141693