[PATCH] blockfreq: Rewrite block frequency analysis

Tue Mar 25 17:57:31 PDT 2014

On Mar 25, 2014, at 2:06 PM, Chandler Carruth <chandlerc at google.com> wrote:

> 
> On Wed, Mar 19, 2014 at 1:45 PM, Chandler Carruth <chandlerc at google.com> wrote:
> If so, does it make sense for me to post the current, massive, combined patch
> of 0003+0004+0005+0006(+0007?) now, so you can look at the algorithm as a whole?
> 
> I'm actually interested in evaluating this totally independent of the bias part. I'm in complete agreement that the flow-solve aspect is orthogonal to what precise metric is useful for triggering different optimizations.
> 
> I've had some time to both read the patch and think about the specific algorithm. A few high-level comments.
> 
> 1) I'm pretty sure you can use LoopInfo. It won't be ideal and we could probably make LoopInfo much better for your use case, but in particular the problem of distinguishing between immediate loop blocks and nested loop blocks seems reasonably easy to manage when working bottom-up -- you can keep a set of the nested blocks built at each stage of the upward walk.
> 
> 2) Outside of the loop structure detection and management, how important are the RPO-traversal bits? Was it just convenient or really important to propagate weights in this way? This could be the real limitation of using LoopInfo -- it doesn't really preserve the RPO structures the way your code does.

The RPO traversal is important for distributing mass: you need all of a node's
incoming mass to arrive before you know how much to distribute to each
successor, and RPO visits every predecessor (ignoring backedges) before the
node itself.
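A minimal sketch of that ordering argument, assuming an acyclic CFG (backedges
already removed, e.g. handled per loop) and using `double` probabilities for
readability rather than the fixed-point mass the patch uses.  The names `CFG`,
`postOrder`, and `distributeMass` are hypothetical, not the patch's API:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical acyclic CFG: G[B] lists (successor, branch probability) pairs.
// Backedges are assumed to have been removed already.
using Edge = std::pair<std::size_t, double>;
using CFG = std::vector<std::vector<Edge>>;

// Depth-first post-order from Node.
static void postOrder(const CFG &G, std::size_t Node, std::vector<bool> &Seen,
                      std::vector<std::size_t> &Order) {
  Seen[Node] = true;
  for (const Edge &E : G[Node])
    if (!Seen[E.first])
      postOrder(G, E.first, Seen, Order);
  Order.push_back(Node);
}

// Start with mass 1.0 at Entry and push it along edges in reverse post-order.
std::vector<double> distributeMass(const CFG &G, std::size_t Entry) {
  assert(Entry < G.size() && "entry block out of range");
  std::vector<bool> Seen(G.size(), false);
  std::vector<std::size_t> Order;
  postOrder(G, Entry, Seen, Order);

  std::vector<double> Mass(G.size(), 0.0);
  Mass[Entry] = 1.0;
  // In an acyclic graph, RPO visits every predecessor before the node, so a
  // node's mass is complete by the time it is split among its successors.
  for (auto I = Order.rbegin(); I != Order.rend(); ++I)
    for (const Edge &E : G[*I])
      Mass[E.first] += Mass[*I] * E.second;
  return Mass;
}
```

On a diamond whose entry branches 25/75, the join block ends up with the full
mass of 1.0; that only works because both arms are processed before the join.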

Nevertheless, I’m also starting to feel skeptical about rolling my own loop
detection, particularly because it’s complicated code that’s somewhat
orthogonal to what I’m doing, so it really needs its own set of tests.

I’d like to commit it as is, and then try to reuse LoopInfo before turning the
new block frequency on.
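Chandler's bottom-up suggestion (point 1 above) could look roughly like this.
It uses plain vectors of block numbers instead of LoopInfo, and assumes, as
LoopInfo does, that each loop's block list includes the blocks of its nested
loops.  `immediateBlocks` is a hypothetical helper:

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper. LoopBlocks[I] lists every block in loop I, including
// blocks of loops nested inside it. Loops must be ordered bottom-up: every
// nested loop appears before the loop that contains it.
// Returns, for each loop, the blocks that are not inside any nested loop.
std::vector<std::vector<int>>
immediateBlocks(const std::vector<std::vector<int>> &LoopBlocks) {
  // Blocks seen so far in the upward walk.
  std::set<int> Claimed;
  std::vector<std::vector<int>> Immediate(LoopBlocks.size());
  for (std::size_t I = 0; I < LoopBlocks.size(); ++I) {
    // Loops are properly nested or disjoint, so a claimed block that shows up
    // in loop I must belong to a loop nested inside it.
    for (int B : LoopBlocks[I])
      if (!Claimed.count(B))
        Immediate[I].push_back(B);
    Claimed.insert(LoopBlocks[I].begin(), LoopBlocks[I].end());
  }
  return Immediate;
}
```

A single set suffices because the walk is bottom-up: by the time a loop is
visited, all of its nested loops have already claimed their blocks.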

> 3) I'm increasingly in favor of just using power-of-two loop scales. I actually can't come up with any use cases where distinguishing between 3 and 4 as the "likely" loop trip count would matter. The only case I can see would be that rounding 3 down to 2 could have a bad effect in 3-iteration-heavy code such as graphics code, but it seems simpler to fix that directly by rounding 3 explicitly up to 4…

Andy was worried about scales up to 8.  It should be straightforward to
hardcode the boundaries for the scales 3, 5, 6, and 7, and then use
power-of-two scales for the rest.
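A sketch of that rounding rule, assuming that scales above 8 are rounded down
to a power of two (the rounding direction isn't settled in this thread).
`roundLoopScale` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: map an estimated trip count to a loop scale.
// Scales 1..8 are kept exact (so 3, 5, 6, and 7 are the hardcoded
// non-power-of-two cases); larger counts are rounded down to a power of two,
// which is an assumption of this sketch.
uint64_t roundLoopScale(uint64_t TripCount) {
  assert(TripCount > 0 && "trip count must be positive");
  if (TripCount <= 8)
    return TripCount;
  // Largest power of two <= TripCount; testing P <= TripCount / 2 keeps
  // P * 2 from overflowing.
  uint64_t P = 1;
  while (P <= TripCount / 2)
    P *= 2;
  return P;
}
```

Above 8, a power-of-two scale can be applied as a shift, so only the handful
of small hardcoded scales need a real multiply.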

The code will get even harder to read, though.  To prevent the precision
issues I was hitting originally, I’ll have to implement some
floating-point-like arithmetic.  I’m fine either way, but it will be even
harder for someone else to come along and figure out what’s going on.
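To give a rough idea of what "floating-point-like" could mean here: keep a
32-bit mantissa and a separate binary exponent, so that products of very large
or very small masses neither overflow nor lose all their bits in a plain
integer.  `ScaledU32` is a hypothetical type, not the patch's code, and it
truncates instead of rounding:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical soft-float: the represented value is Digits * 2^Scale.
struct ScaledU32 {
  uint32_t Digits;
  int Scale;

  // Build from a 64-bit integer, dropping low bits that do not fit in 32.
  static ScaledU32 fromU64(uint64_t N) {
    ScaledU32 R{0, 0};
    while (N > UINT32_MAX) {
      N >>= 1;
      ++R.Scale;
    }
    R.Digits = static_cast<uint32_t>(N);
    return R;
  }

  double toDouble() const {
    return std::ldexp(static_cast<double>(Digits), Scale);
  }
};

// Multiply mantissas (a 32x32 -> 64-bit product cannot overflow), renormalize
// to 32 bits by truncation, and add the exponents.
ScaledU32 operator*(ScaledU32 A, ScaledU32 B) {
  uint64_t P = static_cast<uint64_t>(A.Digits) * B.Digits;
  ScaledU32 R = ScaledU32::fromU64(P);
  R.Scale += A.Scale + B.Scale;
  return R;
}
```

Squaring 2^63 this way gives 2^126 without any 64-bit overflow, at the cost of
carrying an exponent alongside every value.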

Andy, what do you think?  You seemed convinced in your review that explicitly
using a float type here would be better.  Are you still?

> 4) I'm having trouble with the mixture of terminology between mass, weight, and frequency. Do you have a mental model for the terminology you can add to the documentation? (Or did I miss it?)

I do have a mental model; some of it was in the docs, but not all.  I’ll fix
that.

> I'm also still concerned about exposing both mass and frequency in the public API. What is the plan there?

BlockMass was only exposed so that I could test BlockMass*BranchProbability.  I
think the right solution is to move that into BranchProbability*uint64_t and test
it there.

I.e., I don’t think it should be exposed.
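For the BranchProbability*uint64_t idea, the core operation is computing
Num * N / D without overflowing 64 bits.  A sketch, assuming a probability is
a 32-bit numerator N and denominator D with N <= D; `scaleByProbability` is a
hypothetical free function, not BranchProbability's actual interface:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns floor(Num * N / D) for a probability N/D,
// computing the 96-bit intermediate product in 32-bit pieces.
uint64_t scaleByProbability(uint64_t Num, uint32_t N, uint32_t D) {
  assert(D != 0 && N <= D && "expected a probability N/D in [0, 1]");
  const uint64_t Mask = 0xffffffffu;

  // Num * N == Hi * 2^32 + Lo, and each partial product fits in 64 bits.
  uint64_t Lo = (Num & Mask) * N;
  uint64_t Hi = (Num >> 32) * N;

  // Split the product into three 32-bit words W2:W1:W0.
  uint64_t Mid = (Hi & Mask) + (Lo >> 32);
  uint64_t W0 = Lo & Mask;
  uint64_t W1 = Mid & Mask;
  uint64_t W2 = (Hi >> 32) + (Mid >> 32);

  // Schoolbook long division by D, one 32-bit word at a time. The remainder
  // stays below D < 2^32, so (R << 32) | W fits in 64 bits. W2 / D is 0
  // because N <= D keeps the quotient <= Num.
  uint64_t R = W2 % D;
  uint64_t Q1 = ((R << 32) | W1) / D;
  R = ((R << 32) | W1) % D;
  uint64_t Q0 = ((R << 32) | W0) / D;
  return (Q1 << 32) | Q0;
}
```

Testing this on plain integers (e.g. UINT64_MAX * 3/3 == UINT64_MAX) covers the
same overflow cases as BlockMass*BranchProbability without exposing BlockMass.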

> As a somewhat separate note, I'm curious if you looked into directly mapping this problem into minimum-cost flow network solutions along the lines of this thesis? http://www.cs.technion.ac.il/~royl/MscThesis_Final_Version_Submission.pdf

I thought about network flows briefly, but didn’t get as far as looking at
recent research.  -block-freq-v3?