[PATCH] D141693: [AArch64] turn extended vecreduce bigger than v16i8 into udot/sdot
Zain Jaffal via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 16 07:33:37 PST 2023
zjaffal added a comment.
In D141693#4056504 <https://reviews.llvm.org/D141693#4056504>, @dmgreen wrote:
> Do you have any tests for cases that are a multiple of 8 but not of 16, like `<24 x ...`? And can you make sure we have the `load <4 x i8>` test case?
I will add tests for <24 x ..> and <4 x i8>
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15226
+ bool IsValidSize = Op0VT.getScalarSizeInBits() == 8;
+ if (Op0VT != MVT::v8i8 && !IsValidElementCount && !IsValidSize)
return SDValue();
----------------
dmgreen wrote:
> zjaffal wrote:
> > dmgreen wrote:
> > > I think this should be something like !(IsValidElementCount && IsValidSize).
> > > It is worth adding a v4i8 test if one doesn't exist already:
> > > ```
> > > define i32 @src(ptr %p, i32 %b) {
> > > entry:
> > > %a64 = load <4 x i8>, ptr %p
> > > %a65 = sext <4 x i8> %a64 to <4 x i32>
> > > %a66 = mul nsw <4 x i32> %a65, %a65
> > > %a67 = tail call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %a66)
> > > %a = add i32 %a67, %b
> > > ret i32 %a
> > > }
> > > ```
> > >I think this should be something like !(IsValidElementCount && IsValidSize).
> > but then we won't cover the case for v8i8 or shall we change is validElementCount to be
> > `Op0VT.getVectorNumElements() % 16 == 0; || Op0VT.getVectorNumElements() % 8 == 0;`
> Sorry - I meant with the `Op0VT != MVT::v8i8` too. The condition as written here will bail out if both !IsValidElementCount and !IsValidSize, but it seems like it should be bailing if one of them is false. So:
> ```
> if (Op0VT != MVT::v8i8 && (!IsValidElementCount || !IsValidSize))
> ```
> It could also do `bool IsValidElementCount = Op0VT == MVT::v8i8 || Op0VT.getVectorNumElements() % 16 == 0;` and then check that `if (!IsValidElementCount || !IsValidSize)`, if you think that is cleaner.
I will add the v4i8 test on a separate patch. We don't support it at the moment
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:15256
+ auto DotOpcode =
+ (ExtOpcode == ISD::ZERO_EXTEND) ? AArch64ISD::UDOT : AArch64ISD::SDOT;
+ SDValue Dot =
----------------
dmgreen wrote:
> DotOpcode can be moved out of the loop, and commoned with the version above. Zeroes can be moved up too.
Shall we add test cases for vectors that are multiples of 8?
for example v24i8
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D141693/new/
https://reviews.llvm.org/D141693
More information about the llvm-commits
mailing list