[llvm] r183174 - Fix a defect in code-layout pass, improving Benchmarks/Olden/em3d/em3d by about 30%

Chandler Carruth chandlerc at google.com
Tue Jun 4 00:46:40 PDT 2013


On Mon, Jun 3, 2013 at 6:00 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:

> Author: shuxin_yang
> Date: Mon Jun  3 20:00:57 2013
> New Revision: 183174
>
> URL: http://llvm.org/viewvc/llvm-project?rev=183174&view=rev
> Log:
> Fix a defect in code-layout pass, improving Benchmarks/Olden/em3d/em3d by
> about 30%
> (4.58s vs 3.2s on an oldish Mac Tower).
>
>   The corresponding src is excerpted bellow. The lopp accounts for about
> 90% of execution time.
>   --------------------
>     cat -n test-suite/MultiSource/Benchmarks/Olden/em3d/make_graph.c
>      90
>      91         for (k=0; k<j; k++)
>      92           if (other_node == cur_node->to_nodes[k]) break;
>
>   The defective layout is sketched bellow, where the two branches need to
> swap.
>   ------------------------------------------------------------------------
>       L:
>          ...
>       if (cond) goto out-of-loop
>       goto L
>
>   While this code sequence is defective, I don't understand why it incurs
> 1/3 of
> execution time. CPU-event-profiling indicates the poor laoyout dose not
> increase
> in br-misprediction; it dosen't increase stall cycle at all, and it dosen't
> prevent the CPU detect the loop (i.e. Loop-Stream-Detector seems to be
> working fine
> as well)...
>

Welcome to the fun world of block placement. =] I spent about 4 weeks on
the same loop because of this. The final conclusion we came to is that we
simply don't have all the data on the hardware, and something else about
this loop breaks it. =/ The weird thing is, I fixed this same bug at least
once before...


>
>    The root cause of the problem is that the layout pass calls
> AnalyzeBranch()
> with basic-block which is not updated to reflect its current layout.
>

Then why isn't the fix to update it? See my comments below. I think you're
on the right track here, but I'm pretty convinced that the solution you
have here can't be the right solution.

Also, you *must* provide test cases for this kind of change. It is far too
subtle otherwise, and will just break in the future.


>
> rdar://13966341
>
> Modified:
>     llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
>
> Modified: llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp?rev=183174&r1=183173&r2=183174&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp (original)
> +++ llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp Mon Jun  3 20:00:57
> 2013
> @@ -991,6 +991,28 @@ void MachineBlockPlacement::buildCFGChai
>      Cond.clear();
>      MachineBasicBlock *TBB = 0, *FBB = 0; // For AnalyzeBranch.
>      if (!TII->AnalyzeBranch(*PrevBB, TBB, FBB, Cond)) {
> +      // The "PrevBB" is not yet updated to reflect current code layout,
> so,
> +      //   o. it may fall-through to a block without explict "goto"
> instruction
> +      //      before layout, and no longer fall-through it after layout;
> or
> +      //   o. just opposite.
>

Yes, and my memory of AnalyzeBranch from when I wrote this code is that it
explicitly had logic (at least for x86) to handle these cases. We need it
to handle them because updateTerminator (the routine which fixes the
terminator to not have the broken properties you observe in PrevBB) can
only be called when the branch is analyzable. See the FIXME comment just a
few lines up.


> +      //
> +      // AnalyzeBranch() may return erroneous value for FBB when these two
> +      // situations take place. For the first scenario FBB is mistakenly
> set
> +      // NULL; for the 2nd scenario, the FBB, which is expected to be
> NULL,
> +      // is mistakenly pointing to "*BI".
>

This is either a bug in AnalyzeBranch, or a bug in us calling
AnalyzeBranch. I suspect it is a bug in one target's implementation of
AnalyzeBranch, but the API and behavior for that routine is so *incredibly*
convoluted and poorly specified that I could believe either one. However,
the fix should not be here! Instead we should fix updateTerminator and
AnalyzeBranch to work in a more sane way.

Let me see if I understand the problem correctly:


> +      //
> +      bool needUpdateBr = true;
> +      if (!Cond.empty() && (!FBB || FBB == *BI)) {
> +        PrevBB->updateTerminator();
>

OK, so the goal is that:

1) If the branch is analyzable, but
2) We get garbage back from the analysis, then
3) We update the terminators, and
4) Re-analyze the branch in order to actually make the terminator reversal
decision correctly, and then
5) Re-update the terminator?

If I've understood the problem correctly, then I think there are two
possible correct solutions, but both look somewhat different from this:

A) We could solve in essentially the same way you have here, but more
cleanly. This would be done by teaching updateTerminator to work even in
the cases where AnalyzeBranch actually fails, so that rather than
asserting, it simply bails out. With this change, we could write much
simpler code:

PrevBB->updateTerminator();

if (TII->AnalyzeBranch(...)) {
  if (/* existing code to check for a profitable reversal case */) {
    // reverse the branch and condition
    PrevBB->updateTerminator(); // because we monkeyed with it
  }
}

Does that make sense? It seems much more clear than the current code
sequence.


B) An alternative solution, which I think I would prefer over (A), is to
fix the bug in AnalyzeBranch that causes it to mis-analyze the branch in
this case. There doesn't seem to be any reason it can't do the essential
work you're describing here or the work that updateTerminators does to
compute the correct TBB and FBB information.


C) My favorite solution, which isn't strictly necessary to solve this
problem, is to do (B), and then to sink all of this logic for reversing
terminators into updateTerminators() and make it do this automatically,
factoring it completely out of MachineBlockPlacement.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130604/621cda0c/attachment.html>


More information about the llvm-commits mailing list