[PATCH] D127115: [RFC][DAGCombine] Make sure combined nodes are added back to the worklist in topological order.

Amaury SECHET via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 13 14:02:15 PST 2023


deadalnix added inline comments.


================
Comment at: llvm/test/CodeGen/AMDGPU/widen-smrd-loads.ll:237
 ; VI-NEXT:    s_waitcnt lgkmcnt(0)
-; VI-NEXT:    s_add_i32 s1, s0, 12
+; VI-NEXT:    s_and_b32 s1, s0, 0xffff
 ; VI-NEXT:    v_mov_b32_e32 v2, s0
----------------
foad wrote:
> deadalnix wrote:
> > foad wrote:
> > > deadalnix wrote:
> > > > foad wrote:
> > > > > deadalnix wrote:
> > > > > > foad wrote:
> > > > > > > Given this DAG:
> > > > > > > ```
> > > > > > > SelectionDAG has 34 nodes:
> > > > > > >   t0: ch,glue = EntryToken
> > > > > > >                 t94: i64,ch = CopyFromReg t0, Register:i64 %1
> > > > > > >               t10: i64 = AssertAlign<16> t94
> > > > > > >             t11: i64 = add nuw t10, Constant:i64<36>
> > > > > > >           t91: v2i32,ch = load<(dereferenceable invariant load (s64) from %ir.arg.kernarg.offset, align 4, addrspace 4)> t0, t11, undef:i64
> > > > > > >         t92: i64 = bitcast t91
> > > > > > >       t99: i32,ch = load<(invariant load (s32) from %ir.arg.load, addrspace 4)> t0, t92, undef:i64
> > > > > > >     t101: i32 = and t99, Constant:i32<65535>
> > > > > > >   t102: i16 = truncate t101
> > > > > > >                 t49: i32 = any_extend t102
> > > > > > >               t79: i32 = add t49, Constant:i32<12>
> > > > > > >             t83: i32 = or t79, Constant:i32<4>
> > > > > > >           t98: i16 = truncate t83
> > > > > > >         t71: i16 = and t98, Constant:i16<255>
> > > > > > >                   t38: i16 = srl t102, Constant:i16<8>
> > > > > > >                 t46: i32 = any_extend t38
> > > > > > >               t81: i32 = add t46, Constant:i32<44>
> > > > > > >             t84: i32 = or t81, Constant:i32<3>
> > > > > > >           t95: i16 = truncate t84
> > > > > > >         t61: i16 = shl t95, Constant:i16<8>
> > > > > > >       t62: i16 = or t71, t61
> > > > > > >     t34: ch = store<(store (s16) into `ptr addrspace(1) null`, addrspace 1)> t0, t62, Constant:i64<0>, undef:i64
> > > > > > >   t28: ch = ENDPGM t34
> > > > > > > ```
> > > > > > > If the first thing we do is call simplifyDemandedBits on t101, then it will be removed (replaced with t99) since the upper 16 bits are not demanded.
> > > > > > > 
> > > > > > > But if the first thing we do is combine t49 then it will be replaced with t101 (since any_extend of a truncate is a no-op), losing the fact that we didn't care about the upper 16 bits, and then t101 can no longer be removed.
> > > > > > Funny enough, this is exactly why we want to move the DAG to be fully processed in topological order. But as we enforce topological processing on some part of the DAG, it is always possible that in other parts where it is implementation defined, it stops being done in that order. Eventually, all of it will be done in this order.
> > > > > > 
> > > > > > On that one specifically, t49 eventually sink to `t98: i16 = truncate t83` so simplifyDemandedBits still has the information it needs in the DAG. Have you looked at why it isn't finding it? Is it because it's simply too deep, and going that deep would be too costly?
> > > > > > 
> > > > > > On a side note, because order is implementation defined, there are likely instance of similar patterns in the wild where this doesn't get optimized.
> > > > > If I understand correctly:
> > > > > # we visit t49 and combine it into t101
> > > > > # we visit t79 and do nothing
> > > > > # because that did nothing, we do not revisit t83 and t98
> > > > > # if we //did// revisit t98 I think demanded bits would allow us to remove the AND in t101
> > > > I think so, unless it is too deep.
> > > Is this a known deficiency in the DAGCombiner algorithm? I guess when we modify a node we add its immediate users to the worklist, but not all of its users-of-users and so on. Are there ways to work around this?
> > Right now, the order in which nodes are processed is implementation defined. This is a limitation of the current DAGCombiner. This patch is part of an effort to make the order topological.
> I understand that. Maybe I'm not phrasing my question very well.
> 
> Here's my understanding of the current behaviour //even with your patch applied//. We start off with:
> ```
>                 t49: i32 = any_extend t102
>               t79: i32 = add t49, Constant:i32<12>
>             t83: i32 = or t79, Constant:i32<4>
>           t98: i16 = truncate t83
> ```
> We process t49, and we are able to simplify it, so we add its user t79 to the worklist. Then we process t79 but are not able to simplify it, so we do not add t83 (or t98) to the worklist.
> 
> But there is a problem here: the fact that we simplify t49 means that we //could// now make some progress if we tried to combine t98, because that would call SimplifyDemandedBits which can peek several levels "down" into the DAG, and simplify t102 or t101.
> 
> So there's a missed opportunity here. The fact that we simplified t49 unlocks a potential further simplification three levels "up" in the DAG, but we only ever add immediate users (one level "up") to the worklist when we simplify something.
I'm not sure where we are talking past each other so let me step back and explain how DAGCombine works and what we want to change about it.

First, the DAGCombiner adds all the nodes in the DAG in its worklist, in an implementation defined order. The actual order depends on the specific operation performed on the DAG before it reaches the combiner. In your example, that would include both t49 and t98.

Then it goes over all nodes and try to combine them. When a node is combined, it readds the result of the combine to the worklist as well as its users. In this case, the combiner visits t98 at some point, then t49, which it combines into t101, but then, it never revisit t98.

Visiting the DAG in topological order, you'd combine t49 before going over t98. This diff does part of the job, but clearly isn't the end of the story. However, it's one of the most disruptive step.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127115/new/

https://reviews.llvm.org/D127115



More information about the llvm-commits mailing list