[llvm] [SelectionDAG] Deal with POISON for INSERT_VECTOR_ELT/INSERT_SUBVECTOR (PR #143102)
Björn Pettersson via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 17 10:15:34 PDT 2025
================
@@ -3572,45 +3572,53 @@ define void @SpinningCube() {
; SSE2-LABEL: SpinningCube:
; SSE2: # %bb.0: # %entry
; SSE2-NEXT: movl $1065353216, (%rax) # imm = 0x3F800000
-; SSE2-NEXT: movaps {{.*#+}} xmm0 = [u,u,u,1.0E+0]
-; SSE2-NEXT: movss {{.*#+}} xmm1 = [NaN,0.0E+0,0.0E+0,0.0E+0]
-; SSE2-NEXT: movapd {{.*#+}} xmm2 = [u,u,-2.0E+0,u]
-; SSE2-NEXT: movsd {{.*#+}} xmm2 = xmm1[0],xmm2[1]
-; SSE2-NEXT: xorps %xmm3, %xmm3
-; SSE2-NEXT: shufps {{.*#+}} xmm3 = xmm3[0,1],xmm2[2,0]
-; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0],xmm0[2,3]
-; SSE2-NEXT: addps %xmm3, %xmm1
-; SSE2-NEXT: movaps %xmm1, (%rax)
-; SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0,0,0]
-; SSE2-NEXT: mulps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; SSE2-NEXT: addps %xmm0, %xmm1
-; SSE2-NEXT: movaps %xmm1, (%rax)
+; SSE2-NEXT: xorps %xmm0, %xmm0
+; SSE2-NEXT: movss {{.*#+}} xmm1 = [1.0E+0,0.0E+0,0.0E+0,0.0E+0]
+; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0],xmm0[0,1]
+; SSE2-NEXT: xorps %xmm2, %xmm2
+; SSE2-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,1],xmm1[2,0]
+; SSE2-NEXT: movss {{.*#+}} xmm3 = [NaN,0.0E+0,0.0E+0,0.0E+0]
+; SSE2-NEXT: movapd {{.*#+}} xmm4 = [u,u,-2.0E+0,u]
+; SSE2-NEXT: movsd {{.*#+}} xmm4 = xmm3[0],xmm4[1]
+; SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm4[2,0]
+; SSE2-NEXT: movq {{.*#+}} xmm3 = xmm3[0],zero
+; SSE2-NEXT: shufps {{.*#+}} xmm3 = xmm3[2,0],xmm1[2,0]
+; SSE2-NEXT: addps %xmm0, %xmm3
+; SSE2-NEXT: movaps %xmm3, (%rax)
+; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,0,0,0]
+; SSE2-NEXT: mulps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; SSE2-NEXT: addps %xmm2, %xmm0
+; SSE2-NEXT: movaps %xmm0, (%rax)
----------------
bjope wrote:
To clarify a bit, InstCombine reduces this test case to:
```
define void @SpinningCube() {
entry:
store float 1.000000e+00, ptr undef, align 4
ret void
}
```
This suggests that the backend was already pretty bad at simplifying this, even before this patch.
The main difference before legalization is that with this patch we get
```
t55: f32 = freeze undef:f32
t56: v4f32 = BUILD_VECTOR t55, t55, t55, ConstantFP:f32<1.000000e+00>
```
instead of
```
t49: v4f32 = BUILD_VECTOR undef:f32, undef:f32, undef:f32, ConstantFP:f32<1.000000e+00>
```
The latter would be legalized as a load from the constant pool, but I don't think that happens when the BUILD_VECTOR has frozen undef operands. Instead, the legalizer introduces another vector_shuffle(?).
So we get
```
Legalizing: t56: v4f32 = BUILD_VECTOR t55, t55, t55, ConstantFP:f32<1.000000e+00>
Trying custom legalization
Successfully custom legalized node
... replacing: t56: v4f32 = BUILD_VECTOR t55, t55, t55, ConstantFP:f32<1.000000e+00>
with: t74: v4f32 = vector_shuffle<4,5,6,3> t72, t73
```
instead of
```
Legalizing: t49: v4f32 = BUILD_VECTOR undef:f32, undef:f32, undef:f32, ConstantFP:f32<1.000000e+00>
Trying custom legalization
Could not custom legalize node
Trying to expand node
Successfully expanded node
... replacing: t49: v4f32 = BUILD_VECTOR undef:f32, undef:f32, undef:f32, ConstantFP:f32<1.000000e+00>
with: t67: v4f32,ch = load<(load (s128) from constant-pool)> t0, ConstantPool:i64<<4 x float> <float undef, float undef, float undef, float 1.000000e+00>> 0, undef:i64
```
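As a rough illustration of why the two BUILD_VECTOR forms are not interchangeable, here is a toy Python model (not LLVM code). The names `UNDEF`, `FrozenUndef`, `is_constant_build_vector` and `materialize` are made up for this sketch. It only models the semantic point that every use of a single `freeze undef` node must see the same value, while plain undef lanes may each resolve differently. The idea that a vector of only constants and undef can be emitted as one constant-pool entry is an assumption taken from the legalizer log above.

```python
import random

UNDEF = object()  # hypothetical marker for a plain undef lane


class FrozenUndef:
    """Toy stand-in for a node such as `t55 = freeze undef`."""


def is_constant_build_vector(lanes):
    """True when every lane is a literal constant or a plain undef.

    Assumption for this sketch: only such vectors can be emitted as a
    single constant-pool entry.
    """
    return all(lane is UNDEF or isinstance(lane, float) for lane in lanes)


def materialize(lanes, rng):
    """Pick concrete lane values.

    Plain undef lanes are resolved independently. Each FrozenUndef node is
    resolved once, and every lane using that node gets the same value.
    """
    frozen_values = {}
    result = []
    for lane in lanes:
        if lane is UNDEF:
            result.append(rng.random())
        elif isinstance(lane, FrozenUndef):
            if lane not in frozen_values:
                frozen_values[lane] = rng.random()
            result.append(frozen_values[lane])
        else:
            result.append(lane)
    return result


# Mirrors t49: BUILD_VECTOR undef, undef, undef, 1.0
undef_vec = [UNDEF, UNDEF, UNDEF, 1.0]

# Mirrors t56: BUILD_VECTOR t55, t55, t55, 1.0 with t55 = freeze undef
t55 = FrozenUndef()
frozen_vec = [t55, t55, t55, 1.0]
```

In this model `is_constant_build_vector(undef_vec)` holds but `is_constant_build_vector(frozen_vec)` does not, and `materialize(frozen_vec, ...)` always returns equal values in the first three lanes. That extra constraint is one plausible reason the frozen form ends up on a different legalization path.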
https://github.com/llvm/llvm-project/pull/143102