[PATCH] D23824: [ADCE] Add handling of PHI nodes when removing control flow

Tue Sep 6 12:30:59 PDT 2016

david2050 added inline comments.

================
Comment at: lib/Transforms/Scalar/ADCE.cpp:463
@@ +462,3 @@
+      }
+      if (CommonReachngDefinition != Value) {
+        // When there are two definitions from "dead" predecessor paths we
----------------
dberlin wrote:
> david2050 wrote:
> > dberlin wrote:
> > > david2050 wrote:
> > > > dberlin wrote:
> > > > > dberlin wrote:
> > > > > > david2050 wrote:
> > > > > > > This step is necessary otherwise we delete a branch which determines a live value. It is an artifact of the semantics of PHI nodes. This is about preserving existing structure not translating to a new form. 
> > > > > > > 
> > > > > > > And Yes, outside of loops, the baseline code catches almost all cases. The primary motivation was dead, may-be-infinute loops.
> > > > > > I understand the semantics of phi nodes :)
> > > > > > 
> > > > > > I'm asking why you don't just mark them *all* live.
> > > > > > Traditionally, in DCE, what would happen is that if the phi is live, you mark all values that are incoming to the phi as live (and in a control dependence DCE, any control dependences of the necessary edges)
> > > > > > 
> > > > > > Here, it seems like you are trying to figure out if the phi is useless by seeing if all branches have the same incoming value, and if so, you are arbitrarily choosing one so you can delete at least one branch.
> > > > > > That seems .. pointless. 
> > > > > > 
> > > > > > It should only also matter in the case of either predecessors that only exist for control flow (IE empty basic blocks), or critical edges.
> > > > > > 
> > > > > > 
> > > > > > The only interesting loop case i'm aware of where this matters is:
> > > > > > if (b)
> > > > > > {
> > > > > > int i;
> > > > > > for (i = 0; i<1000; ++i);
> > > > > >  j = 0;
> > > > > > }
> > > > > > return j;
> > > > > > }
> > > > > > 
> > > > > > But your current code won't allow deleting the loop in this case, AFAICT.
> > > > > > (among other things, you would need to determine what the control dependence would look like if the critical edge was split without splitting it. This is what GCC does)
> > > > > >  
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > (and to be super clear, when i say "I'm asking why you don't just mark them *all* live.", i mean "all the arguments and edges of a given phi node", not "all phis").
> > > > > 
> > > > It is not all branches, it is all branches from dead predecessors since in the case that we should remove all such predecessors to be replaced with a single edge, we verify that there is a unique value for that edge. If not we select one predecessor to keep live. Consider 
> > > >   
> > > >   a = 0
> > > >   if (x) { 
> > > >         if (y) a =1
> > > >         else  a = 2
> > > >    }
> > > >    else a  =3
> > > >    phi(...)
> > > >    
> > > > The phi node following this block has tree edges labeled with values 1,2,3 respectively.
> > > > The guarded basic blocks are empty because the assignments are folded into the phi node.
> > > > The branch decision at which tests 'y' is dead at this point because there are no live operations
> > > > in any block it controls.  
> > > > Here we determine that we need to keep at least one of the empty blocks around to distinguish between 1 and 2, which models the behavior if the assignments to 'a' were instead explicitly in the original blocks.
> > > > 
> > > > Given a more complex example we might have
> > > > 
> > > >   a = 0
> > > >   if (x) { 
> > > >         if (z)  
> > > >            a=1
> > > >         else { 
> > > >            if (y)  { }  
> > > >            else { } 
> > > >         }
> > > >    }
> > > >    else a  =3
> > > >    phi(...)
> > > > 
> > > > Here there are 4 predecessors to the join, three are 2 but only two distinct values and we only
> > > > need to have the branch on 'z' live. The branch on 'y' need not be. This is why the code does not mark 
> > > > all incoming edges live but does not do the extra work to try and pick and optimal choice (the path with a=1 in this case). The scenario is rare enough that it is not clear extra work is worth the effort.
> > > > 
> > > > In your example, I believe the current code will allow the remove the loop since the assignment to J is not control dependent on the loop but only on the branch testing 'b'. The subset of the CFG corresponding to the loop would be all marked dead and removed and the edge leaving the block holding the test on 'b' redirected to the block holding 'j=0'.  I don't understand your reference to splitting a critical edge.
> > > (Sorry if this is short, i lost this comment once already). 
> > > Okay, we are talking past each other.
> > > I understand what you are doing, and why you are trying to mark only certain values of a phi live.
> > > I'm saying "this is not worth it, and is complex" *compared to just marking all of them*. You produced an example showing it removes a single block. I agree it can remove that block.
> > > Instead of arguing by example, i produced data:
> > > 
> > > I ran your patch on ~5000 C++ packages as is, and then ran it on ~5000 C++ packages with it changed to just mark all phi node inputs as live when the phi node is live.
> > > 
> > > The number of cases i find where it makes a difference is ~0.01%.
> > > The average size of the binaries in which it does something are ~16 bytes smaller (which, for those binaries, amounts to an average size change of 0.00005%)
> > > 
> > > That seems "not worth it" to me :)
> > > 
> > > Now, i'm just trying to provide data not actually to say "we shouldn't do it this way", but more "if we are going to add complexity, let's do it where we have data that shows it's worth it". We both can clearly come up with contrived examples where it matters.
> > > 
> > > So if you have data showing trying to only mark single branches of unique values live is worth it, great, add the data and let's stop arguing, and let's do it.
> > > 
> > > "In your example, I believe the current code will allow the remove the loop since the assignment to J is not control dependent on the loop but only on the branch testing 'b'. The subset of the CFG corresponding to the loop would be all marked dead and removed and the edge leaving the block holding the test on 'b' redirected to the block holding 'j=0'. I don't understand your reference to splitting a critical edge."
> > > 
> > > You should try it before you knock it :)
> > > 
> > > You may have to change it to j=b or add a new parameter somewhere, depending on what passes you run before it (otherwise the phi ends up with the constant propagated into it).
> > > 
> > > You will end up with a phi node in the end block that merges the else-condition value (IE from outside the condition entirely) of jwith the inside-condition value.  Control-dependence will force you to keep the empty loop live due to the critical edge (again, whether the edge stays critical or not depends on where you end up in the pass pipeline)
> > > 
> > > This case is more common, it affects at least ~1% of the *packages*.
> > > I did not fix it, so i can't say how much it saves or not saves.
> > > It is common enough that gcc special cases it (as i said) and has regression tests to make sure the loop is eliminated by DCE.
> > > 
> > 
> > Thanks for great due diligence for the patch.
> > 
> > The experiment you ran seems doomed because the patches submitted so far are incomplete. If you look around line 305  there is loop which forces all branches live. This is temporary waiting for the next patch which includes the code to delete branches. Right now the patch reduces to prior behavior of not actually deleting in branches although it takes a long way to get there. Thus any changes you see based on your edit are curious since it should not have any impact on the generated code at all.
> > 
> > I certainly agree that even the relatively simple pick-one-value here may have no value. Perhaps we can accept this change as is, with a TODO to evaluate this point after the next (and final) patch is up? (Assuming there is nothing else to change along the way)
> > 
> > 
> I feel like you are implying something here.  
> 
> In this case, I simply hacked in the right parts of the rest of the old patch (and a little work) before  it was broken up before running the experiment. (You could have simply asked for the code if you are concerned).
> 
> You are talking about something that is < 50 lines of code to make work (control dependence DCE is not that hard, nor is this even the first implementation of it in LLVM!), so i'm not sure why you would think i didn't run the experiment properly?
> (I could also point out i don't actually care about correctness of the generated code, only the maximum optimization value)
> 
> If you would like to actually run *your own experiment* show that this is worth it, you are, as i said, welcome.  You may want to focus on that instead of trying to pick mine apart ;)
> 
> Right now, like I said, i see *nothing* that makes it seem like this is worth it, and your response here doesn't help that.  If you want to produce data that says it's worth it, great!. Otherwise, I'm loathe to start by adding complexity and then try to take it away later. That never works.
> 
No implications, and apologies for suggesting you were incomplete in your analysis. 

To be clear then, in your variant, you marked every basic block which is a predecessor of a live-phi as live?

https://reviews.llvm.org/D23824