[PATCH] D127115: [RFC][DAGCombine] Make sure combined nodes are added back to the worklist in topological order.
Amaury SECHET via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Jun 11 16:41:40 PDT 2022
deadalnix added inline comments.
================
Comment at: llvm/test/CodeGen/X86/load-partial.ll:117
+; SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,2]
+; SSE2-NEXT: retq
+;
----------------
deadalnix wrote:
> Before:
> ```
> SelectionDAG has 9 nodes:
> t0: ch = EntryToken
> t2: i64,ch = CopyFromReg t0, Register:i64 %0
> t33: v4f32,ch = load<(dereferenceable load (s128) from %ir.2, align 4)> t0, t2, undef:i64
> t22: ch,glue = CopyToReg t0, Register:v4f32 $xmm0, t33
> t23: ch = X86ISD::RET_FLAG t22, TargetConstant:i32<0>, Register:v4f32 $xmm0, t22:1
> ```
>
> After:
> ```
> SelectionDAG has 19 nodes:
> t0: ch = EntryToken
> t2: i64,ch = CopyFromReg t0, Register:i64 %0
> t31: f64,ch = load<(dereferenceable load (s64) from %ir.2, align 4)> t0, t2, undef:i64
> t32: v2f64 = scalar_to_vector t31
> t33: v4f32 = bitcast t32
> t15: i64 = add nuw t2, Constant:i64<8>
> t16: f32,ch = load<(dereferenceable load (s32) from %ir.8)> t0, t15, undef:i64
> t35: v4f32 = scalar_to_vector t16
> t38: v4f32 = X86ISD::SHUFP t35, t33, TargetConstant:i8<48>
> t40: v4f32 = X86ISD::SHUFP t33, t38, TargetConstant:i8<-124>
> t22: ch,glue = CopyToReg t0, Register:v4f32 $xmm0, t40
> t23: ch = X86ISD::RET_FLAG t22, TargetConstant:i32<0>, Register:v4f32 $xmm0, t22:1
> ```
>
> This is definitely serious.
This turns out to be pretty interesting when you take a step back. Before the diff we had:
```
SelectionDAG has 20 nodes:
t0: ch = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t31: f32 = extract_vector_elt t30, Constant:i64<0>
t32: f32 = extract_vector_elt t30, Constant:i64<1>
t15: i64 = add nuw t2, Constant:i64<8>
t16: f32,ch = load<(dereferenceable load (s32) from %ir.8)> t0, t15, undef:i64
t27: v4f32 = BUILD_VECTOR t31, t32, t16, undef:f32
t22: ch,glue = CopyToReg t0, Register:v4f32 $xmm0, t27
t28: f64,ch = load<(dereferenceable load (s64) from %ir.2, align 4)> t0, t2, undef:i64
t29: v2f64 = scalar_to_vector t28
t30: v4f32 = bitcast t29
t23: ch = X86ISD::RET_FLAG t22, TargetConstant:i32<0>, Register:v4f32 $xmm0, t22:1
```
And after:
```
SelectionDAG has 16 nodes:
t0: ch = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t31: f64,ch = load<(dereferenceable load (s64) from %ir.2, align 4)> t0, t2, undef:i64
t32: v2f64 = scalar_to_vector t31
t33: v4f32 = bitcast t32
t15: i64 = add nuw t2, Constant:i64<8>
t16: f32,ch = load<(dereferenceable load (s32) from %ir.8)> t0, t15, undef:i64
t19: v4f32 = insert_vector_elt t33, t16, Constant:i64<2>
t22: ch,glue = CopyToReg t0, Register:v4f32 $xmm0, t19
t23: ch = X86ISD::RET_FLAG t22, TargetConstant:i32<0>, Register:v4f32 $xmm0, t22:1
```
The former gets crunched down to almost nothing, while the latter gets turned into SHUFPs. The magic seems to happen when legalizing the BUILD_VECTOR, which gets turned into a single wide load. Meanwhile, `v4f32 = insert_vector_elt t33, t16, Constant:i64<2>` gets expanded into `v4f32 = vector_shuffle<0,1,4,3> t33, t35`, which in turn is legalized into a series of SHUFPs.
It seems a bit strange to me that legalization does most of the optimization here, and this turns out to be somewhat fragile.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D127115/new/
https://reviews.llvm.org/D127115