[PATCH] D103629: [AArch64] Cost-model i8 vector loads/stores

Fri Jun 4 00:15:36 PDT 2021

SjoerdMeijer added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1256
-      // We generate 2 instructions per vector element.
-      return NumVectorizableInstsToAmortize * NumVecElts * 2;
-    }
----------------
dmgreen wrote:
> SjoerdMeijer wrote:
> > I was also wondering if this was just a bug, because what we are doing here is `NumVecElts * 2 * NumVecElts * 2`. For a `<4 x i8>` that results in a cost of 64. If this was intentional, then I don't think I follow it.
> My rough understanding was that you really don't want the vectorizer to produce
>   <4 x i8> load
>   <4 x i16> zext
> You want to make sure it's at least 8x:
>   <8 x i8> load
>   <8 x i16> zext
> That way you don't serialize the load/extend, using d and q reg instructions as expected.
> 
> So the costs are deliberately high - high enough to prevent the scalarization and cross register bank moves. It may be higher than the cost of the individual instructions, but that is what you want to steer the vectorizer profitably.
There are probably a lot of different cases. When all types are the same width, yes, you want to go for a wider vector.
But with mixed types, e.g. where a smaller type is accumulated into a bigger one, vectorisation is still (or can be) profitable, and we might want to pay the overhead of constructing a vector of the smaller type.
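
To make the numbers concrete, here is a minimal standalone sketch of the arithmetic in the removed lines quoted at the top of this comment. The function name `oldNarrowI8VectorMemCost` is hypothetical and not part of LLVM, and the sketch assumes the amortization factor is itself `NumVecElts * 2`, as described above:

```cpp
// Standalone sketch only: the function name is hypothetical and this is not
// the code in AArch64TargetTransformInfo.cpp.
constexpr unsigned oldNarrowI8VectorMemCost(unsigned NumVecElts) {
  // First factor: the amortization term, assumed to be NumVecElts * 2.
  // Second factor: 2 instructions generated per vector element.
  return (NumVecElts * 2) * (NumVecElts * 2);
}

// <4 x i8>: (4 * 2) * (4 * 2) = 64, so the cost grows quadratically.
static_assert(oldNarrowI8VectorMemCost(4) == 64, "<4 x i8> costs 64");
// <2 x i8>: (2 * 2) * (2 * 2) = 16.
static_assert(oldNarrowI8VectorMemCost(2) == 16, "<2 x i8> costs 16");
```

The cost grows quadratically with the element count. That suits a deliberately prohibitive cost, but it would also penalise the mixed-type case where building the narrow vector might still pay off.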

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103629/new/

https://reviews.llvm.org/D103629