[llvm] [BasicTTI] Use getTypeLegalizationCost to generalize vector cast cost. (PR #107303)

Tue Sep 17 04:41:56 PDT 2024

================
@@ -559,22 +559,22 @@ define i32 @casts() {
 ; CHECK-MVE-RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %r87hf = fpext <4 x half> undef to <4 x float>
 ; CHECK-MVE-RECIP-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %r88hf = fpext <8 x half> undef to <8 x float>
 ; CHECK-MVE-RECIP-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %r89hf = fpext <16 x half> undef to <16 x float>
-; CHECK-MVE-RECIP-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %r90 = fptoui <2 x float> undef to <2 x i1>
-; CHECK-MVE-RECIP-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %r91 = fptosi <2 x float> undef to <2 x i1>
-; CHECK-MVE-RECIP-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %r92 = fptoui <2 x float> undef to <2 x i8>
-; CHECK-MVE-RECIP-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %r93 = fptosi <2 x float> undef to <2 x i8>
-; CHECK-MVE-RECIP-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %r94 = fptoui <2 x float> undef to <2 x i16>
-; CHECK-MVE-RECIP-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %r95 = fptosi <2 x float> undef to <2 x i16>
-; CHECK-MVE-RECIP-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %r96 = fptoui <2 x float> undef to <2 x i32>
-; CHECK-MVE-RECIP-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %r97 = fptosi <2 x float> undef to <2 x i32>
+; CHECK-MVE-RECIP-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %r90 = fptoui <2 x float> undef to <2 x i1>
----------------
davemgreen wrote:

Looking at some examples in godbolt, I'm not sure these are more correct for MVE. MVE is fairly constrained in what it can do with lane shuffling, so it tends to scalarize anything that is `2 x`. The general strategy is to keep the costs high so that the mid-end picks a higher vector factor or doesn't vectorize at all.

It looks like it is assuming that the type will "widen", as opposed to "promote", probably because it looks at the fp type, not the integer dst. I believe the defaults are that integer types promote and fp types widen (and X86 does more widening nowadays), and that non-power-2 types widen up to the next power-2.
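
To make the widen/promote distinction concrete, here is a toy model of the defaults as I described them. It is only a sketch: `VecTy`, `defaultAction`, `step`, and the 128-bit register width are hypothetical names and assumptions for illustration, not LLVM's actual legalizer or cost-model API.

```cpp
#include <cassert>

// Toy model only: these types and functions are hypothetical, not LLVM APIs.
enum class Action { Legal, Promote, Widen };

struct VecTy {
  bool isFloat;      // floating-point element type?
  unsigned numElts;  // number of lanes
  unsigned eltBits;  // bits per lane
};

static bool isPow2(unsigned n) { return n != 0 && (n & (n - 1)) == 0; }

static unsigned nextPow2(unsigned n) {
  assert(n > 0 && "vector must have at least one lane");
  unsigned p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Assumed defaults: non-power-2 lane counts widen, sub-register integer
// vectors promote their elements, sub-register fp vectors widen their lanes.
Action defaultAction(VecTy t, unsigned regBits = 128) {
  if (!isPow2(t.numElts))
    return Action::Widen;
  if (t.numElts * t.eltBits >= regBits)
    return Action::Legal;
  return t.isFloat ? Action::Widen : Action::Promote;
}

// Apply one legalization step and return the resulting type.
VecTy step(VecTy t, unsigned regBits = 128) {
  switch (defaultAction(t, regBits)) {
  case Action::Promote:
    t.eltBits = regBits / t.numElts;  // same lanes, wider elements
    break;
  case Action::Widen:
    if (!isPow2(t.numElts))
      t.numElts = nextPow2(t.numElts);  // round lanes up to a power of 2
    else
      t.numElts = regBits / t.eltBits;  // more lanes, same elements
    break;
  case Action::Legal:
    break;
  }
  return t;
}
```

In this model `<2 x float>` widens to four lanes while `<2 x i32>` promotes to two 64-bit lanes, so a cost keyed on the fp type and one keyed on the integer dst see different legalization actions.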

https://github.com/llvm/llvm-project/pull/107303