[PATCH] D124743: [DAGCombine] Add node in the worklist in topological order in CombineTo

Eli Friedman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri May 6 12:33:33 PDT 2022


efriedma added inline comments.


================
Comment at: llvm/test/CodeGen/ARM/swifterror.ll:687
+; CHECK-APPLE-NEXT:    add r0, r0, #4
+; CHECK-APPLE-NEXT:    ldr r2, [r7, #8]
+; CHECK-APPLE-NEXT:    ldr r1, [r0], #4
----------------
deadalnix wrote:
> deadalnix wrote:
> > RKSimon wrote:
> > > This looks like another lost multiple load/store ?
> > Yes, this looks similar to what is happening to arm64-abi-varargs.ll .
> Investigating this a bit more, it looks like we get some crazy combine happening when CombineTo work in topological order that are missed otherwise.
> 
> The dag looks like this with topological CombineTo:
> ```
> SelectionDAG has 58 nodes:
>   t0: ch = EntryToken
>   t5: i32,ch = CopyFromReg t0, Register:i32 %1
>   t9: i32 = add FrameIndex:i32<-1>, Constant:i32<4>
>   t11: i32,ch = CopyFromReg t0, Register:i32 %2
>   t15: i32,ch = CopyFromReg t0, Register:i32 %3
>   t19: i32,ch = CopyFromReg t0, Register:i32 %4
>           t73: i32 = add FrameIndex:i32<-1>, Constant:i32<12>
>         t68: ch = store<(store (s32) into unknown-address + 12)> t0, t19, t73, undef:i32
>           t77: i32 = add FrameIndex:i32<-1>, Constant:i32<8>
>         t74: ch = store<(store (s32) into unknown-address + 8)> t0, t15, t77, undef:i32
>         t78: ch = store<(store (s32) into %fixed-stack.0 + 4)> t0, t11, t9, undef:i32
>         t81: ch = store<(store (s32) into %fixed-stack.0)> t0, t5, FrameIndex:i32<-1>, undef:i32
>       t83: ch = TokenFactor t19:1, t68, t15:1, t74, t11:1, t78, t5:1, t81
>     t29: ch,glue = callseq_start t83, TargetConstant:i32<0>, TargetConstant:i32<0>
>   t33: ch,glue = CopyToReg t29, Register:i32 $r0, Constant:i32<16>
>   t35: ch,glue = CopyToReg t33, Register:i32 $r1, Constant:i32<0>, t33:1
>   t38: ch,glue = ARMISD::CALL t35, TargetGlobalAddress:i32<i8* (i64)* @malloc> 0, Register:i32 $r0, Register:i32 $r1, RegisterMask:Untyped, t35:1
>   t40: ch,glue = callseq_end t38, TargetConstant:i32<0>, TargetConstant:i32<-1>, t38:1
>   t41: i32,ch,glue = CopyFromReg t40, Register:i32 $r0, t40:1
>   t43: ch = CopyToReg t41:1, Register:i32 %5, t41
>     t45: i32 = add nuw t41, Constant:i32<8>
>   t85: ch = store<(store (s8) into %ir.tmp), trunc to i8> t43, Constant:i32<1>, t45, undef:i32
>       t137: ch = store<(store (s32) into %ir.a12)> t43, t135, FrameIndex:i32<3>, undef:i32
>       t128: ch = store<(store (s32) into %ir.args, align 8)> t43, t135:1, FrameIndex:i32<0>, undef:i32
>       t140: ch = store<(store (s32) into %ir.a11)> t43, t122, FrameIndex:i32<2>, undef:i32
>       t144: ch = store<(store (s32) into %ir.a10)> t43, t109, FrameIndex:i32<1>, undef:i32
>     t147: ch = TokenFactor t137, t135:2, t128, t140, t122:2, t144, t109:1
>   t64: ch,glue = CopyToReg t147, Register:i32 $r0, Constant:i32<1065353216>
>   t66: ch,glue = CopyToReg t64, Register:i32 $r8, Register:i32 %5, t64:1
>   t109: i32,ch = load<(load (s32))> t85, FrameIndex:i32<-1>, undef:i32
>   t122: i32,i32,ch = load<(load (s32) from %fixed-stack.0 + 8), <post-inc>> t85, t9, Constant:i32<4>
>   t135: i32,i32,ch = load<(load (s32)), <post-inc>> t85, t122:1, Constant:i32<4>
>   t67: ch = ARMISD::RET_FLAG t66, Register:i32 $r0, Register:i32 $r8, t66:1
> ```
> 
> And without:
> ```
> SelectionDAG has 58 nodes:
>   t0: ch = EntryToken
>   t5: i32,ch = CopyFromReg t0, Register:i32 %1
>   t9: i32 = add FrameIndex:i32<-1>, Constant:i32<4>
>   t11: i32,ch = CopyFromReg t0, Register:i32 %2
>   t15: i32,ch = CopyFromReg t0, Register:i32 %3
>   t19: i32,ch = CopyFromReg t0, Register:i32 %4
>   t77: i32 = add FrameIndex:i32<-1>, Constant:i32<8>
>   t73: i32 = add FrameIndex:i32<-1>, Constant:i32<12>
>         t68: ch = store<(store (s32) into unknown-address + 12)> t0, t19, t73, undef:i32
>         t74: ch = store<(store (s32) into unknown-address + 8)> t0, t15, t77, undef:i32
>         t78: ch = store<(store (s32) into %fixed-stack.0 + 4)> t0, t11, t9, undef:i32
>         t81: ch = store<(store (s32) into %fixed-stack.0)> t0, t5, FrameIndex:i32<-1>, undef:i32
>       t83: ch = TokenFactor t19:1, t68, t15:1, t74, t11:1, t78, t5:1, t81
>     t29: ch,glue = callseq_start t83, TargetConstant:i32<0>, TargetConstant:i32<0>
>   t33: ch,glue = CopyToReg t29, Register:i32 $r0, Constant:i32<16>
>   t35: ch,glue = CopyToReg t33, Register:i32 $r1, Constant:i32<0>, t33:1
>   t38: ch,glue = ARMISD::CALL t35, TargetGlobalAddress:i32<i8* (i64)* @malloc> 0, Register:i32 $r0, Register:i32 $r1, RegisterMask:Untyped, t35:1
>   t40: ch,glue = callseq_end t38, TargetConstant:i32<0>, TargetConstant:i32<-1>, t38:1
>   t41: i32,ch,glue = CopyFromReg t40, Register:i32 $r0, t40:1
>   t43: ch = CopyToReg t41:1, Register:i32 %5, t41
>     t45: i32 = add nuw t41, Constant:i32<8>
>   t85: ch = store<(store (s8) into %ir.tmp), trunc to i8> t43, Constant:i32<1>, t45, undef:i32
>       t135: ch = store<(store (s32) into %ir.a12)> t43, t132, FrameIndex:i32<3>, undef:i32
>       t127: ch = store<(store (s32) into %ir.args, align 8)> t43, t73, FrameIndex:i32<0>, undef:i32
>       t138: ch = store<(store (s32) into %ir.a11)> t43, t120, FrameIndex:i32<2>, undef:i32
>       t142: ch = store<(store (s32) into %ir.a10)> t43, t109, FrameIndex:i32<1>, undef:i32
>     t145: ch = TokenFactor t135, t132:1, t127, t138, t120:1, t142, t109:1
>   t64: ch,glue = CopyToReg t145, Register:i32 $r0, Constant:i32<1065353216>
>   t66: ch,glue = CopyToReg t64, Register:i32 $r8, Register:i32 %5, t64:1
>   t109: i32,ch = load<(load (s32))> t85, FrameIndex:i32<-1>, undef:i32
>   t120: i32,ch = load<(load (s32))> t85, t9, undef:i32
>   t132: i32,ch = load<(load (s32))> t85, t77, undef:i32
>   t67: ch = ARMISD::RET_FLAG t66, Register:i32 $r0, Register:i32 $r8, t66:1
> ```
> 
> The meaningful change seems to be these crazy loads:
> ```
>   t122: i32,i32,ch = load<(load (s32) from %fixed-stack.0 + 8), <post-inc>> t85, t9, Constant:i32<4>
>   t135: i32,i32,ch = load<(load (s32)), <post-inc>> t85, t122:1, Constant:i32<4>
> ```
> 
> It'd be really helpful to have some ARM expert here to help us knowing what the right path forward is.
We fail to form a post-inc load, which changes the register allocation, which blocks use of "ldm".

Post-inc load formation failing to trigger is a bit concerning; makes me think we somehow aren't running the relevant combine late enough.  DAGCombiner::CombineToPreIndexedLoadStore is a target-independent combine with a bit of target-specific code; see getPreIndexedAddressParts().

"ldm" formation is unreliable on ARM because it runs after register allocation; I wouldn't worry about that part.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124743/new/

https://reviews.llvm.org/D124743



More information about the llvm-commits mailing list