[PATCH] D103629: [AArch64] Cost-model i8 vector loads/stores

Thu Jul 1 11:05:55 PDT 2021

SjoerdMeijer added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1407

-  if (useNeonVector(Ty) &&
+  bool IsTruncOrExt = VT != LT.second;
+  if (IsTruncOrExt && useNeonVector(Ty) &&
----------------
dmgreen wrote:
> I think you are adding too many types. It's not a truncating store every time that VT != LT.second. It just means that it is type legalized to a different type.
> A v32i8 is just 2 v16i8 loads, so should give LT={2, MVT::v16i8} and should get a cost of 2 (which is LT.first). Any larger type follows the same pattern of splitting until it is a legal type. So v64i8 would be LT={4, MVT::v16i8}, so cost=4.
> v16i8 and v8i8 are legal, so get {1, MVT::v16i8} or {1, MVT::v8i8} and cost 1 (which is LT.first again).
> For v4i8 the LT.second will be v4i16 and LT.first will be 1 still (I think).
> 
> I was expecting it to just be <4 x i8> costs, which are now 2 (or 3 if we want to worry about big endian costs, which we don't usually do AFAIU). One for the f32 load and one for the sshll (or xtn+f32 store). The cost of any other truncation would be measured from the trunc instruction.
Okay, that's fair enough.

I don't like this `ProfitableNumElements` business here, and would like to see that go.
It's wrong for the loads, so will have to adjust that. And when I do that, I don't want to add a sort of special case for the loads and keep the `ProfitableNumElements` for the stores and that logic for the `getNumElements() < ProfitableNumElements` business. So that explains that a little bit.

I think we have 2 cases:
- A type is legal, or a type is legal and is split up. In both cases, like you explained the cost is just `LT.first`.
- A type is truncated or extended. 

For an extend of `<4 x i8>` the cost is 2 if it's extended to i16 (1 load, 1 sshll), and the cost is 3 if it is extended to i32 (1 load, 2 sshll). So the cost is not 2 in all cases. And while we are at I added costs for some smaller and larger vectors.

The check to see if it's a trunk/extend is indeed wrong. I will need to look but I am guessing that checking if `getScalarSizeInBits()` is the same for `VT` or `LT.second` or something along those line will cover that. This allows us to do a lookup for a trunc/extend (case 2), or return `LT.second`(case 1).

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103629/new/

https://reviews.llvm.org/D103629