[PATCH] D124743: [DAGCombine] Add node in the worklist in topological order in CombineTo
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 6 12:33:33 PDT 2022
efriedma added inline comments.
================
Comment at: llvm/test/CodeGen/ARM/swifterror.ll:687
+; CHECK-APPLE-NEXT: add r0, r0, #4
+; CHECK-APPLE-NEXT: ldr r2, [r7, #8]
+; CHECK-APPLE-NEXT: ldr r1, [r0], #4
----------------
deadalnix wrote:
> deadalnix wrote:
> > RKSimon wrote:
> > > This looks like another lost multiple load/store ?
> > Yes, this looks similar to what is happening to arm64-abi-varargs.ll .
> Investigating this a bit more, it looks like we get some crazy combine happening when CombineTo work in topological order that are missed otherwise.
>
> The dag looks like this with topological CombineTo:
> ```
> SelectionDAG has 58 nodes:
> t0: ch = EntryToken
> t5: i32,ch = CopyFromReg t0, Register:i32 %1
> t9: i32 = add FrameIndex:i32<-1>, Constant:i32<4>
> t11: i32,ch = CopyFromReg t0, Register:i32 %2
> t15: i32,ch = CopyFromReg t0, Register:i32 %3
> t19: i32,ch = CopyFromReg t0, Register:i32 %4
> t73: i32 = add FrameIndex:i32<-1>, Constant:i32<12>
> t68: ch = store<(store (s32) into unknown-address + 12)> t0, t19, t73, undef:i32
> t77: i32 = add FrameIndex:i32<-1>, Constant:i32<8>
> t74: ch = store<(store (s32) into unknown-address + 8)> t0, t15, t77, undef:i32
> t78: ch = store<(store (s32) into %fixed-stack.0 + 4)> t0, t11, t9, undef:i32
> t81: ch = store<(store (s32) into %fixed-stack.0)> t0, t5, FrameIndex:i32<-1>, undef:i32
> t83: ch = TokenFactor t19:1, t68, t15:1, t74, t11:1, t78, t5:1, t81
> t29: ch,glue = callseq_start t83, TargetConstant:i32<0>, TargetConstant:i32<0>
> t33: ch,glue = CopyToReg t29, Register:i32 $r0, Constant:i32<16>
> t35: ch,glue = CopyToReg t33, Register:i32 $r1, Constant:i32<0>, t33:1
> t38: ch,glue = ARMISD::CALL t35, TargetGlobalAddress:i32<i8* (i64)* @malloc> 0, Register:i32 $r0, Register:i32 $r1, RegisterMask:Untyped, t35:1
> t40: ch,glue = callseq_end t38, TargetConstant:i32<0>, TargetConstant:i32<-1>, t38:1
> t41: i32,ch,glue = CopyFromReg t40, Register:i32 $r0, t40:1
> t43: ch = CopyToReg t41:1, Register:i32 %5, t41
> t45: i32 = add nuw t41, Constant:i32<8>
> t85: ch = store<(store (s8) into %ir.tmp), trunc to i8> t43, Constant:i32<1>, t45, undef:i32
> t137: ch = store<(store (s32) into %ir.a12)> t43, t135, FrameIndex:i32<3>, undef:i32
> t128: ch = store<(store (s32) into %ir.args, align 8)> t43, t135:1, FrameIndex:i32<0>, undef:i32
> t140: ch = store<(store (s32) into %ir.a11)> t43, t122, FrameIndex:i32<2>, undef:i32
> t144: ch = store<(store (s32) into %ir.a10)> t43, t109, FrameIndex:i32<1>, undef:i32
> t147: ch = TokenFactor t137, t135:2, t128, t140, t122:2, t144, t109:1
> t64: ch,glue = CopyToReg t147, Register:i32 $r0, Constant:i32<1065353216>
> t66: ch,glue = CopyToReg t64, Register:i32 $r8, Register:i32 %5, t64:1
> t109: i32,ch = load<(load (s32))> t85, FrameIndex:i32<-1>, undef:i32
> t122: i32,i32,ch = load<(load (s32) from %fixed-stack.0 + 8), <post-inc>> t85, t9, Constant:i32<4>
> t135: i32,i32,ch = load<(load (s32)), <post-inc>> t85, t122:1, Constant:i32<4>
> t67: ch = ARMISD::RET_FLAG t66, Register:i32 $r0, Register:i32 $r8, t66:1
> ```
>
> And without:
> ```
> SelectionDAG has 58 nodes:
> t0: ch = EntryToken
> t5: i32,ch = CopyFromReg t0, Register:i32 %1
> t9: i32 = add FrameIndex:i32<-1>, Constant:i32<4>
> t11: i32,ch = CopyFromReg t0, Register:i32 %2
> t15: i32,ch = CopyFromReg t0, Register:i32 %3
> t19: i32,ch = CopyFromReg t0, Register:i32 %4
> t77: i32 = add FrameIndex:i32<-1>, Constant:i32<8>
> t73: i32 = add FrameIndex:i32<-1>, Constant:i32<12>
> t68: ch = store<(store (s32) into unknown-address + 12)> t0, t19, t73, undef:i32
> t74: ch = store<(store (s32) into unknown-address + 8)> t0, t15, t77, undef:i32
> t78: ch = store<(store (s32) into %fixed-stack.0 + 4)> t0, t11, t9, undef:i32
> t81: ch = store<(store (s32) into %fixed-stack.0)> t0, t5, FrameIndex:i32<-1>, undef:i32
> t83: ch = TokenFactor t19:1, t68, t15:1, t74, t11:1, t78, t5:1, t81
> t29: ch,glue = callseq_start t83, TargetConstant:i32<0>, TargetConstant:i32<0>
> t33: ch,glue = CopyToReg t29, Register:i32 $r0, Constant:i32<16>
> t35: ch,glue = CopyToReg t33, Register:i32 $r1, Constant:i32<0>, t33:1
> t38: ch,glue = ARMISD::CALL t35, TargetGlobalAddress:i32<i8* (i64)* @malloc> 0, Register:i32 $r0, Register:i32 $r1, RegisterMask:Untyped, t35:1
> t40: ch,glue = callseq_end t38, TargetConstant:i32<0>, TargetConstant:i32<-1>, t38:1
> t41: i32,ch,glue = CopyFromReg t40, Register:i32 $r0, t40:1
> t43: ch = CopyToReg t41:1, Register:i32 %5, t41
> t45: i32 = add nuw t41, Constant:i32<8>
> t85: ch = store<(store (s8) into %ir.tmp), trunc to i8> t43, Constant:i32<1>, t45, undef:i32
> t135: ch = store<(store (s32) into %ir.a12)> t43, t132, FrameIndex:i32<3>, undef:i32
> t127: ch = store<(store (s32) into %ir.args, align 8)> t43, t73, FrameIndex:i32<0>, undef:i32
> t138: ch = store<(store (s32) into %ir.a11)> t43, t120, FrameIndex:i32<2>, undef:i32
> t142: ch = store<(store (s32) into %ir.a10)> t43, t109, FrameIndex:i32<1>, undef:i32
> t145: ch = TokenFactor t135, t132:1, t127, t138, t120:1, t142, t109:1
> t64: ch,glue = CopyToReg t145, Register:i32 $r0, Constant:i32<1065353216>
> t66: ch,glue = CopyToReg t64, Register:i32 $r8, Register:i32 %5, t64:1
> t109: i32,ch = load<(load (s32))> t85, FrameIndex:i32<-1>, undef:i32
> t120: i32,ch = load<(load (s32))> t85, t9, undef:i32
> t132: i32,ch = load<(load (s32))> t85, t77, undef:i32
> t67: ch = ARMISD::RET_FLAG t66, Register:i32 $r0, Register:i32 $r8, t66:1
> ```
>
> The meaningful change seems to be these crazy loads:
> ```
> t122: i32,i32,ch = load<(load (s32) from %fixed-stack.0 + 8), <post-inc>> t85, t9, Constant:i32<4>
> t135: i32,i32,ch = load<(load (s32)), <post-inc>> t85, t122:1, Constant:i32<4>
> ```
>
> It'd be really helpful to have some ARM expert here to help us knowing what the right path forward is.
We fail to form a post-inc load, which changes the register allocation, which blocks use of "ldm".
Post-inc load formation failing to trigger is a bit concerning; makes me think we somehow aren't running the relevant combine late enough. DAGCombiner::CombineToPreIndexedLoadStore is a target-independent combine with a bit of target-specific code; see getPreIndexedAddressParts().
"ldm" formation is unreliable on ARM because it runs after register allocation; I wouldn't worry about that part.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D124743/new/
https://reviews.llvm.org/D124743
More information about the llvm-commits
mailing list