[PATCH] D100099: [X86][CostModel] Try to fix cost computation load/stores of non-power-of-two vectors

Fri Apr 16 02:47:17 PDT 2021

RKSimon added a comment.

I've added some 64-byte aligned tests which we can probably add special case load costs - please can you rebase?

For the remaining edge cases - would you prefer to fix them here or iterate in follow up patches? I'd be happy to accept this as is (after rebase) - its a massive improvement and makes it much easier to tweak as we go on.

================
Comment at: llvm/test/Analysis/CostModel/X86/load_store.ll:26
 ; SSE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: store <3 x i64> undef, <3 x i64>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 13 for instruction: store <5 x i32> undef, <5 x i32>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: store <5 x i64> undef, <5 x i64>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: store <5 x i16> undef, <5 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: store <6 x i16> undef, <6 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: store <7 x i16> undef, <7 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 22 for instruction: store <11 x i16> undef, <11 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: store <12 x i16> undef, <12 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 26 for instruction: store <13 x i16> undef, <13 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 46 for instruction: store <23 x i16> undef, <23 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 48 for instruction: store <24 x i16> undef, <24 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 50 for instruction: store <25 x i16> undef, <25 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 94 for instruction: store <47 x i16> undef, <47 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 96 for instruction: store <48 x i16> undef, <48 x i16>* undef, align 4
-; SSE-NEXT:  Cost Model: Found an estimated cost of 98 for instruction: store <49 x i16> undef, <49 x i16>* undef, align 4
+; SSE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: store <5 x i32> undef, <5 x i32>* undef, align 4
+; SSE-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: store <5 x i64> undef, <5 x i64>* undef, align 4
----------------
lebedev.ri wrote:
> RKSimon wrote:
> > cost = 2 ? SSE max size is 128-bit vector - so no subvector extraction cost - <3 x double> seems to get it right?
> This happens because for vectors we assume that extracting 0'th element of a FP vector is free:
> https://github.com/llvm/llvm-project/blob/4f42d873c20291077f5a1ed37b102330d505f00d/llvm/lib/Target/X86/X86TargetTransformInfo.cpp#L3055-L3065
> 
Ah - of course - we're hitting the issue that the extract + store costs are treated separately, but for f32/f64 extract_0 is free.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100099/new/

https://reviews.llvm.org/D100099