[llvm] [CostModel][X86] Improve cost estimation of insert_subvector shuffle patterns of legalized types (PR #119363)

Tue Dec 10 06:17:44 PST 2024

================
@@ -18,24 +18,24 @@
 
 define void @test_vXf64(<2 x double> %a128, <4 x double> %a256, <8 x double> %a512, <2 x double> %b128, <4 x double> %b256, <8 x double> %b512) {
 ; SSE-LABEL: 'test_vXf64'
-; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V256_128 = shufflevector <2 x double> %a128, <2 x double> %b128, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V256_128 = shufflevector <2 x double> %a128, <2 x double> %b128, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ; SSE-NEXT:  Cost Model: Found an estimated cost of 28 for instruction: %V512_128 = shufflevector <2 x double> %a128, <2 x double> %b128, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
----------------
RKSimon wrote:

Yes, eventually processShuffleMasks will be able to handle it. If you look at the shuffle mask, it concatenates both v2f64 inputs TWICE into a v8f64, so for now it gets treated as a general SK_PermuteTwoSrc. Once I've updated the processShuffleMasks support, it will split the v8f64 into smaller legal types and see that the sub-shuffles are free on SSE (and cheap on AVX).
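Below is a minimal Python sketch of the splitting idea described above. It assumes that each 128-bit SSE register holds two f64 elements and that mask indices address the concatenated inputs, with %a128 in register 0 and %b128 in register 1. The names `split_shuffle_mask`, `SubShuffleKind` and `toy_cost` are hypothetical illustrations, not LLVM's processShuffleMasks API. The unit costs are also made up and do not come from the X86 cost tables.

```python
from enum import Enum


class SubShuffleKind(Enum):
    UNDEF = "undef"           # every lane is undef (-1)
    COPY = "copy"             # reads one source register, lanes already in place
    PERMUTE_ONE_SRC = "one"   # reads one source register, lanes reordered
    PERMUTE_TWO_SRC = "two"   # reads exactly two source registers
    MANY_SRC = "many"         # reads three or more source registers


def split_shuffle_mask(mask, reg_elts):
    """Hypothetical helper: split a shuffle mask into register-sized sub-shuffles.

    Each non-negative index refers to an element of the concatenated source
    operands, assumed to be laid out in registers of reg_elts elements each;
    -1 marks an undef lane. Returns (kind, sorted source registers) per
    destination register.
    """
    if len(mask) % reg_elts != 0:
        raise ValueError("mask length must be a multiple of reg_elts")
    result = []
    for begin in range(0, len(mask), reg_elts):
        chunk = mask[begin:begin + reg_elts]
        regs = sorted({m // reg_elts for m in chunk if m >= 0})
        # A lane is "in place" if it reads the same lane index of its source register.
        in_place = all(m < 0 or m % reg_elts == lane for lane, m in enumerate(chunk))
        if not regs:
            kind = SubShuffleKind.UNDEF
        elif len(regs) == 1:
            kind = SubShuffleKind.COPY if in_place else SubShuffleKind.PERMUTE_ONE_SRC
        elif len(regs) == 2:
            kind = SubShuffleKind.PERMUTE_TWO_SRC
        else:
            kind = SubShuffleKind.MANY_SRC
        result.append((kind, regs))
    return result


# Hypothetical unit costs for illustration only, NOT the X86 cost tables.
TOY_COST = {
    SubShuffleKind.UNDEF: 0,
    SubShuffleKind.COPY: 0,
    SubShuffleKind.PERMUTE_ONE_SRC: 1,
    SubShuffleKind.PERMUTE_TWO_SRC: 1,
    SubShuffleKind.MANY_SRC: 2,
}


def toy_cost(mask, reg_elts):
    """Sum the toy cost of every legal-width sub-shuffle."""
    return sum(TOY_COST[kind] for kind, _ in split_shuffle_mask(mask, reg_elts))
```

Applied to the %V512_128 mask `<0, 1, 2, 3, 0, 1, 2, 3>` with `reg_elts=2`, every destination register is an in-place copy of %a128 or %b128 (registers 0, 1, 0, 1), so the toy cost is 0. The two halves of %V256_128 classify the same way.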

https://github.com/llvm/llvm-project/pull/119363