[LLVMdev] [RFC] BlockFrequency is the wrong metric; we need a new one

Duncan P. N. Exon Smith dexonsmith at apple.com
Tue Mar 25 17:22:15 PDT 2014


Hi Andy,

Thanks for your review.  I have some responses and follow-up questions.

On Mar 19, 2014, at 12:34 AM, Andrew Trick <atrick at apple.com> wrote:

> Originally, we did not want to make BlockFrequency depend on LoopInfo, but I no longer see a good reason to avoid it. On the other hand, LoopInfo is not a helpful representation for you. The only drawback to rolling your own is the little LoopStack utility. It's current position in the file is confusing to me because it appears to be part of the main algorithm. If you comment that it is a helper for initializeLoops, and place it adjacent in the code, then it's probably simpler than trying to reuse LoopInfo--unless you have an issue with irreducible CFG.
> 
> LoopInfo effectively ignores backedges that don't target a natural loop header. Whereas it looks like LoopStack will attempt to create a loop for every backedge. You will end up visiting blocks for different "loops" before finishing either loop. Have you reasoned through this and determined how it will play out? Does the comment "frequencies can be distorted past the end of the loop” still completely characterize the problem?

I did reason through this, and some slightly weird things happen.  I
decided that the weird things are *less* weird than artifacts from
ignoring such back edges entirely, but this was somewhat arbitrary.

There are some other problems with my loop detection.  I was running
through some examples to confirm my answer to your question below, I
noticed a significant regression that must have crept in while I was
reorganizing code for review.

Rolling my own solution here was dangerous because it’s really hard to
test (and, thus, to prevent regressions).  I think the right solution is
to use LoopInfo; as unhelpful as it is, the complex part (loop detection)
is already well-tested.

If the overhead is too high, that’s motivation to fix LoopInfo.

I.e., I plan to migrate to LoopInfo before turning this on.  Does that
sound fine to you?

> ---
> 
> How does loop mass distribution work when an inner loop exits to an outer loop more than one level up? Is the exit recorded in multiple loop packages? I fail to see how the mass will be propagated in the outermost loop, and how the scale will be computed in the middle loop.

It’s an exit from the inner loop, so it’ll be recorded in the inner loop
package.  The inner loop package (or pseudo-node) will have an exit out
of the middle loop, too, so it’ll get recorded again in the middle loop
package.

The middle loop will have two exits that contribute to its exit
probability:  the inner loop pseudo-node, and the middle loop condition.
The scale is just the inverse of its exit probability.

Consider this example:

    while (outer()) {
        while (middle()) {
            while (inner()) {
                if (breaktwice()) {
                    goto outer; // if.then
                }
            }
        }
    outer: {}
    }

Note:  there’s something strange here.  The if.then block (which contains
the goto) is *not* in the inner loop; it’s outside of the middle loop.
Optimizations should merge this block with its successor (outer); for
simplicity, I’m going to assume that’s already happened.

Let’s say all three loop conditions are false with 1/5 probability, and
breaktwice() is true with probability 1/4.  The inner loop will have exit
probability 2/5 (1/5 + 4/5*1/4), and will get a loop scale of 5/2.  When
it’s packaged, it will have two outgoing edges of equal weight.

The middle loop will have two exits: from the condition with probability
1/5, and from the inner loop with probability 1/2.  The exit probability
will be 3/5 (1/5 + 4/5*1/2), so the scale will be 5/3.  The middle loop
will have one outgoing edge to outer.  (If we don’t assume that if.then
has been merged with outer, then the middle loop will actually have two
exits: one to outer, and one to if.then (whose sole successor is outer).)

Finally, the outer loop has just one exit, with probability 1/5.  The
pseudo-node for the middle loop will distribute all of its mass to outer. 
The outer loop scale is 5/1.

Assuming the outer loop’s sole predecessor has frequency of 1:

  - the outer condition will have frequency of 5 (5*1),
  - the middle condition will have frequency of 20/3 (5*5/3*4/5),
  - the inner condition will have frequency of 40/3 (20/3*5/2*4/5),
  - the if condition will have frequency 32/3 (40/3*4/5), and
  - outer will have frequency 4 (5*4/5).

> ---
> 
> How expensive is calculateBias? Should it be done lazily?

It’s a floating point divide, so it’s relatively expensive.  Lazy might
be better.

> ---
> 
> I know you implemented some form of verification, but I don't see it. Anyone else touching the code needs to at least know how to do this, if it's not already part of the IR/MI verifier.

The only verification I implemented is the lit tests.

What else do you have in mind here?



More information about the llvm-dev mailing list