[llvm] [SLP]Fix graph traversal in getSpillCost (PR #124984)
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 30 10:18:27 PST 2025
================
@@ -149,37 +149,27 @@ define <4 x float> @exp_4x(ptr %a) {
; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR1]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr [[A]], align 16
-; CHECK-NEXT: [[VECEXT:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
-; CHECK-NEXT: [[TMP1:%.*]] = tail call fast float @expf(float [[VECEXT]])
-; CHECK-NEXT: [[VECINS:%.*]] = insertelement <4 x float> undef, float [[TMP1]], i32 0
-; CHECK-NEXT: [[VECEXT_1:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
-; CHECK-NEXT: [[TMP2:%.*]] = tail call fast float @expf(float [[VECEXT_1]])
-; CHECK-NEXT: [[VECINS_1:%.*]] = insertelement <4 x float> [[VECINS]], float [[TMP2]], i32 1
-; CHECK-NEXT: [[VECEXT_2:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
-; CHECK-NEXT: [[TMP3:%.*]] = tail call fast float @expf(float [[VECEXT_2]])
-; CHECK-NEXT: [[VECINS_2:%.*]] = insertelement <4 x float> [[VECINS_1]], float [[TMP3]], i32 2
-; CHECK-NEXT: [[VECEXT_3:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
-; CHECK-NEXT: [[TMP4:%.*]] = tail call fast float @expf(float [[VECEXT_3]])
-; CHECK-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP4]], i32 3
-; CHECK-NEXT: ret <4 x float> [[VECINS_3]]
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[TMP0]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
----------------
preames wrote:
I think I got confused here by the fact that the extracts were interleaved with the scalar calls in the input. Generally, I see that pattern in the output of the vectorizer, where it results in an unprofitable overall result. But in this case, it is also the input.
If I reorganize this test to have all the extracts, then all the calls, then all the inserts, I get the result I was expecting - no change from input - both before and after this change. This means the delta in this patch is specific to the particular IR order here.
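For reference, the reorganized variant would look roughly like the sketch below. The function name `@exp_4x_grouped` and the value names are made up for illustration; the operations mirror the scalar input visible in the removed CHECK lines above, only regrouped so that no extract or insert sits between two calls.

```llvm
declare float @expf(float)

; Hypothetical regrouped form of the exp_4x test: extracts, then calls, then inserts.
define <4 x float> @exp_4x_grouped(ptr %a) {
entry:
  %vec = load <4 x float>, ptr %a, align 16
  ; All extracts first.
  %ext.0 = extractelement <4 x float> %vec, i32 0
  %ext.1 = extractelement <4 x float> %vec, i32 1
  %ext.2 = extractelement <4 x float> %vec, i32 2
  %ext.3 = extractelement <4 x float> %vec, i32 3
  ; Then all scalar calls.
  %call.0 = tail call fast float @expf(float %ext.0)
  %call.1 = tail call fast float @expf(float %ext.1)
  %call.2 = tail call fast float @expf(float %ext.2)
  %call.3 = tail call fast float @expf(float %ext.3)
  ; Then all inserts.
  %ins.0 = insertelement <4 x float> undef, float %call.0, i32 0
  %ins.1 = insertelement <4 x float> %ins.0, float %call.1, i32 1
  %ins.2 = insertelement <4 x float> %ins.1, float %call.2, i32 2
  %ins.3 = insertelement <4 x float> %ins.2, float %call.3, i32 3
  ret <4 x float> %ins.3
}
```

In this form the source vector is dead before the first call and the result vector is not started until after the last one, so no vector value is live across any call in the scalar input.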
I was initially thinking this had to do with the scalar vs vector typing issue in NoCallIntrinsic, but unfortunately, a quick and dirty patch shows that while that does benefit a few other cases (only in combination with this patch), it doesn't impact this example at all.
I did some digging into the cost for this routine, and noticed something interesting. At VF=4, the spill cost is computed as 8. But at VF=2, the spill cost is 0. I don't understand why that is true. Might be something worth digging into here?
One side observation worth noting - as this example shows, sometimes vectorization can *remove* spill cost. It might be worth enhancing this logic to account for that at some point.
https://github.com/llvm/llvm-project/pull/124984