[PATCH] D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)

Wed Feb 22 11:03:34 PST 2017

twoh added inline comments.

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:4700-4703
+        return DAG.getSetCC(SDLoc(N0), VT, LHS, RHS, NotCC);
       case ISD::SELECT_CC:
-        return DAG.getSelectCC(SDLoc(N), LHS, RHS, N0.getOperand(2),
+        return DAG.getSelectCC(SDLoc(N0), LHS, RHS, N0.getOperand(2),
                                N0.getOperand(3), NotCC);
----------------
andreadb wrote:
> aprantl wrote:
> > twoh wrote:
> > > andreadb wrote:
> > > > twoh wrote:
> > > > > andreadb wrote:
> > > > > > With this change, we always propagate the debug location of N0.
> > > > > > 
> > > > > > However, this may be suboptimal if we end up in a situation where N0 has a null DebugLoc, and the DebugLoc of N is not null. In that case, shouldn't we propagate SDLoc(N) instead?
> > > > > > 
> > > > > > When constructing a new SetCC/SelectCC node, what if we give higher priority to SDLoc(N) over SDLoc(N0) based on whether `N0->getDebugLoc()` returns a null location?
> > > > > > 
> > > > > > In order to guarantee a deterministic schedule, we could always call `SelectionDAG::UpdadeSDLocOnMergedSDNode()` on the newly created dag node, with the goal to set the new IROrder equal to the minimum node order between N0 and N.
> > > > > > 
> > > > > > Not sure if that makes sense.
> > > > > > I hope this helps.
> > > > > @andreadb Thanks for the comments! In this particular case I think it is not correct to take SDLoc from N, and it is better to let the new SDLoc have null DebugLoc if N0 has a null DebugLoc. If we can't have a precise DebugLoc, I think it is better with no DebugLoc than imprecise DebugLoc. 
> > > > > 
> > > > > Regarding scheduling order, It seems more natural to me that the new SDValue takes IROrder from N0 if it takes SDLoc from N0. A deterministic schedule is a desirable property, but not sure if it needs to be forced here. 
> > > > I have to admit, it is not particularly obvious to me why a null DebugLoc is better than a potentially imprecise DebugLoc from N. That said, I am not worried about that particular case.
> > > > I think that overall the patch is good - but please wait for Adrian.
> > > > 
> > > > 
> > > For debugger there might be not much difference. However, for sample-profile based optimization, while null DebugLoc is simply ignored by the optimizer, imprecise DebugLoc may end up with "wrong" optimization. You can find more detailed discussion on it from comments under D25742.
> > Ok, if I understand this correctly:
> > `
> > t1: i1 = setcc t2, t3, cc, LOC0
> > t4: i1 = xor t1, Constant:i1<-1>, LOC1
> > ->
> > t5: i1 = setcc t2, t3 !cc, LOC0
> > `
> > 
> > There are two reasonable ways to approach this:
> > - "We are folding the xor into the setcc", so the location from the setcc wins / remains untouched.
> >   This is what this patch implements.
> > - "We are merging two instructions", so we should merge the two locations.
> >   Then we should use the getMergedDebugLoc API.
> > 
> > I'm fine with both of the above variants. I don't think we should get into the business of "prefer Loc1, but take Loc2 if Loc1 isn't available". If we are concerned about loosing fidelity, we should investigate why the setcc doesn't have a location instead and we will benefit in more places from that.
> > 
> > (I'm aware that this is more a personal opinion than a sound technical argument :-)
> D25742 is a completely different case where common code from two basic blocks is merged into a common tail. In that case, we don't want that common code still refers to its original debugloc.
> 
> In your test case, you are combining instructions which are from the same basic block. I don't see how this can affect the sample profile; in particular, it cannot negatively impact the computation of block frequency. Essentially, the problem is only when you are counting sample hits against the wrong basic block.
@andreadb Oh, I just wanted to bring up the example of null debugloc is better than imprecise debugloc for sample profile. Sorry for the confusion.

@aprantl In general I think using getMergedDebugLoc API is the right way to go, but for this specific case it is pretty clear that xor is a part of what setcc supposed to do (i.e. setting predicate), so it makes more sense to take SDLoc from setcc. 

In particular, I believe that SelectionDAGBuilder::visitSwitchCase (https://reviews.llvm.org/diffusion/L/browse/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp;295856$1872) is one major source generating the pattern of setcc followed by xor with constant -1, and when xor is created at that point it should have take SDLoc from setcc not from following branch instruction. Unfortunately, xor instruction sets its SDLoc to the value returned from getCurSDLoc() call, but CurInst is already a branch instruction. Fixing SDLoc for xor there would involve non-trivial amount of work because it will require SelectionDAGBuilder to maintain not only current instruction but also previous instruction, so I prefer to address it here. 

https://reviews.llvm.org/D29813