[PATCH] D103629: [AArch64] Cost-model i8 vector loads/stores

Thu Jul 1 07:53:42 PDT 2021

dmgreen added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1407

-  if (useNeonVector(Ty) &&
+  bool IsTruncOrExt = VT != LT.second;
+  if (IsTruncOrExt && useNeonVector(Ty) &&
----------------
I think you are adding too many types. It's not a truncating store every time that VT != LT.second. It just means that it is type legalized to a different type.
A v32i8 is just 2 v16i8 loads, so should give LT={2, MVT::v16i8} and should get a cost of 2 (which is LT.first). Any larger type follows the same pattern of splitting until it is a legal type. So v64i8 would be LT={4, MVT::v16i8}, so cost=4.
v16i8 and v8i8 are legal, so get {1, MVT::v16i8} or {1, MVT::v8i8} and cost 1 (which is LT.first again).
For v4i8 the LT.second will be v4i16 and LT.first will be 1 still (I think).

I was expecting it to just be <4 x i8> costs, which are now 2 (or 3 if we want to worry about big endian costs, which we don't usually do AFAIU). One for the f32 load and one for the sshll (or xtn+f32 store). The cost of any other truncation would be measured from the trunc instruction.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103629/new/

https://reviews.llvm.org/D103629