[PATCH] D100099: [X86][CostModel] Try to fix cost computation load/stores of non-power-of-two vectors

Thu Apr 15 10:41:38 PDT 2021

lebedev.ri marked an inline comment as done.
lebedev.ri added a comment.

In D100099#2692024 <https://reviews.llvm.org/D100099#2692024>, @RKSimon wrote:

> In D100099#2691945 <https://reviews.llvm.org/D100099#2691945>, @lebedev.ri wrote:
>
>> This is probably still impresice for small remainder sub-vectors.
>> E.g. load cost for `<3 x float>` w/ 8 byte alignment should be 1: https://godbolt.org/z/r3ncvMvaf
>
> That assumes 16-byte alignment though - for v3f32 the more likely (element) 4-byte alignment will be worse - not sure whether we have much difference alignment coverage (or whether its worth it).

Sorry, i meant 16, yes.

> The costs are a lot better than what they were, but there are a few cases that seem off - if I had to guess I'd say we're not completely getting the split types matching the legalized types?

================
Comment at: llvm/test/Analysis/CostModel/X86/load_store.ll:26
 ; SSE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: store <3 x i64> undef, <3 x i64>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 13 for instruction: store <5 x i32> undef, <5 x i32>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: store <5 x i64> undef, <5 x i64>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: store <5 x i16> undef, <5 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: store <6 x i16> undef, <6 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: store <7 x i16> undef, <7 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: store <11 x i16> undef, <11 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: store <12 x i16> undef, <12 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 26 for instruction: store <13 x i16> undef, <13 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 46 for instruction: store <23 x i16> undef, <23 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 48 for instruction: store <24 x i16> undef, <24 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 50 for instruction: store <25 x i16> undef, <25 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 94 for instruction: store <47 x i16> undef, <47 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 96 for instruction: store <48 x i16> undef, <48 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 98 for instruction: store <49 x i16> undef, <49 x i16>* undef, align 4
+; SSE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: store <5 x i32> undef, <5 x i32>* undef, align 4
+; SSE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: store <5 x i64> undef, <5 x i64>* undef, align 4
----------------
RKSimon wrote:
> cost = 2 ? SSE max size is 128-bit vector - so no subvector extraction cost - <3 x double> seems to get it right?
This happens because for vectors we assume that extracting 0'th element of a FP vector is free:
https://github.com/llvm/llvm-project/blob/4f42d873c20291077f5a1ed37b102330d505f00d/llvm/lib/Target/X86/X86TargetTransformInfo.cpp#L3055-L3065

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100099/new/

https://reviews.llvm.org/D100099