[PATCH] D11393: [X86] Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
David Kreitzer via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 28 13:32:17 PST 2016
DavidKreitzer added a comment.
Thanks for the fixes and for cleaning up the unnecessary conditions. I still think the FBB==nullptr case is broken in the COND_E_AND_NP detection code, but otherwise, this looks good.
Did you get reproducers for the failures that people were seeing with this patch? Do these fixes solve them?
Comment at: llvm/trunk/lib/Target/X86/X86InstrInfo.cpp:4071
@@ +4070,3 @@
+ // See comments above for X86::COND_P_AND_NE.
+ BranchCode = X86::COND_E_AND_NP;
+ } else
> DavidKreitzer wrote:
> > I think you need to verify here that NewTBB == FBB before making this transformation. If FBB is nullptr, you'll need to compute the fallthrough block and check that it matches NewTBB.
> When FBB is nullptr, we could not safely let the fallthrough block be FBB. This is because there is a use case of AnalyzeBranch in block-placement where MBBs are reordered before this function is called, in which case the fallthrough MBB may have nothing to do with the branch.
> But we can do this check if FBB is not null.
To be clear, I wasn't suggesting that you actually compute & set FBB here in the case when it was initially nullptr. Rather, I am saying that you cannot legally use COND_E_AND_NP here without proving that the target of the first branch is the same as the fall-through target.
What is to prevent this code from analyzing the sequence:
... fallthrough to some mystery block B3 (FBB == nullptr) ...
and returning this?
BranchCond = COND_E_AND_NP
TBB = B2
FBB = nullptr
The branch to B1 is lost.
Comment at: llvm/trunk/lib/Target/X86/X86InstrInfo.h:57
@@ +56,3 @@
> DavidKreitzer wrote:
> > The COND_NP_OR_E condition is rather pointless. Based on the comment, it is pretty clear that the intent was for this to be an artificial condition for FCMP_OEQ, but that should be COND_NP_AND_E. (AND not OR)
> > In other words, I'd recommend deleting the existing COND_NP_OR_E condition and the COND_P_AND_NE one that you added. COND_NE_OR_P and its inverse COND_E_AND_NP are sufficient.
> > [FWIW, COND_NP_OR_E would always evaluate to true assuming the CC's originated from an FP compare instruction like COMISS.]
> This makes sense. Done.
I would recommend combining these two comments. As written, they are a little misleading, because they suggest that COND_NE_OR_P is used to implement both FCMP_OEQ & FCMP_UNE while COND_E_AND_NP is used just to negate COND_NE_OR_P. In fact, COND_NE_OR_P is the natural implementation of FCMP_UNE while COND_E_AND_NP is the natural implementation of FCMP_OEQ. How about something like this?
// Artificial condition codes. These are used by AnalyzeBranch
// to indicate a block terminated with two conditional branches that together
// form a compound condition. They occur in code using FCMP_OEQ or FCMP_UNE,
// which can't be represented on x86 with a single condition. These
// are never used in MachineInstrs and are inverses of one another.
Comment at: llvm/trunk/test/CodeGen/X86/fp-une-cmp.ll:51
@@ +50,3 @@
+; CHECK: %bb1
+; CHECK: jmp
> davidxl wrote:
> > congh wrote:
> > > DavidKreitzer wrote:
> > > > If we invert the compound branch at the end of the entry block and place bb1 before bb2, we can eliminate the jmp at the end of bb1. Do you know why that isn't happening?
> > > >
> > > This is because bb2 is hotter than bb1 (note that there is a branch_weights profile data), and the BlockPlacement pass will place the hotter one as a fall-through.
> > The test case is explicitly added to test the ability for MachineBlockPlacement to break the topological order and reorder BB2 ahead of BB1 (BB1 is ahead of BB2 in source order) -- look at the profile annotation that BB1 is a really cold block -- this reordering is not possible without this patch.
> > See also discussions in http://reviews.llvm.org/D11393?vs=on&id=45764&whitespace=ignore-most#toc
> Thanks for the explanation! I missed the profile annotation - Please ignore my comment on this one.
Understood about the branch weights, thanks, and thanks for adding the comment to make that clearer.
It's worth noting that bb2 is placed ahead of bb1 even under minsize. That indicates to me that something may need to be tweaked in block placement, but I think that's beyond the scope of this change set.
More information about the llvm-commits