[PATCH] D127115: [RFC][DAGCombine] Make sure combined nodes are added back to the worklist in topological order.

Fri Jan 13 00:35:18 PST 2023

deadalnix added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/widen-smrd-loads.ll:237
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    s_add_i32 s1, s0, 12
+; VI-NEXT:    s_and_b32 s1, s0, 0xffff
 ; VI-NEXT:    v_mov_b32_e32 v2, s0
----------------
foad wrote:
> deadalnix wrote:
> > foad wrote:
> > > Given this DAG:
> > > ```
> > > SelectionDAG has 34 nodes:
> > >   t0: ch,glue = EntryToken
> > >                 t94: i64,ch = CopyFromReg t0, Register:i64 %1
> > >               t10: i64 = AssertAlign<16> t94
> > >             t11: i64 = add nuw t10, Constant:i64<36>
> > >           t91: v2i32,ch = load<(dereferenceable invariant load (s64) from %ir.arg.kernarg.offset, align 4, addrspace 4)> t0, t11, undef:i64
> > >         t92: i64 = bitcast t91
> > >       t99: i32,ch = load<(invariant load (s32) from %ir.arg.load, addrspace 4)> t0, t92, undef:i64
> > >     t101: i32 = and t99, Constant:i32<65535>
> > >   t102: i16 = truncate t101
> > >                 t49: i32 = any_extend t102
> > >               t79: i32 = add t49, Constant:i32<12>
> > >             t83: i32 = or t79, Constant:i32<4>
> > >           t98: i16 = truncate t83
> > >         t71: i16 = and t98, Constant:i16<255>
> > >                   t38: i16 = srl t102, Constant:i16<8>
> > >                 t46: i32 = any_extend t38
> > >               t81: i32 = add t46, Constant:i32<44>
> > >             t84: i32 = or t81, Constant:i32<3>
> > >           t95: i16 = truncate t84
> > >         t61: i16 = shl t95, Constant:i16<8>
> > >       t62: i16 = or t71, t61
> > >     t34: ch = store<(store (s16) into `ptr addrspace(1) null`, addrspace 1)> t0, t62, Constant:i64<0>, undef:i64
> > >   t28: ch = ENDPGM t34
> > > ```
> > > If the first thing we do is call simplifyDemandedBits on t101, then it will be removed (replaced with t99) since the upper 16 bits are not demanded.
> > > 
> > > But if the first thing we do is combine t49 then it will be replaced with t101 (since any_extend of a truncate is a no-op), losing the fact that we didn't care about the upper 16 bits, and then t101 can no longer be removed.
> > Funny enough, this is exactly why we want to move the DAG to be fully processed in topological order. But as we enforce topological processing on some part of the DAG, it is always possible that in other parts where it is implementation defined, it stops being done in that order. Eventually, all of it will be done in this order.
> > 
> > On that one specifically, t49 eventually sink to `t98: i16 = truncate t83` so simplifyDemandedBits still has the information it needs in the DAG. Have you looked at why it isn't finding it? Is it because it's simply too deep, and going that deep would be too costly?
> > 
> > On a side note, because order is implementation defined, there are likely instance of similar patterns in the wild where this doesn't get optimized.
> If I understand correctly:
> # we visit t49 and combine it into t101
> # we visit t79 and do nothing
> # because that did nothing, we do not revisit t83 and t98
> # if we //did// revisit t98 I think demanded bits would allow us to remove the AND in t101
I think so, unless it is too deep.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127115/new/

https://reviews.llvm.org/D127115