[PATCH] blockfreq: Rewrite block frequency analysis

Fri Apr 4 22:20:46 PDT 2014

On Apr 4, 2014, at 9:26 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:

> 
> On 2014 Apr 3, at 14:11, Andrew Trick <atrick at apple.com> wrote:
> 
>> On Mar 26, 2014, at 5:43 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> 
>>> On Mar 25, 2014, at 7:23 PM, Chandler Carruth <chandlerc at google.com> wrote:
>>> 
>>>> Ok. I think having the terminology clarified will also help me understand the code.
>> 
>> One more thing. Chandler requested definitions of the terminology, which would be good to do before your first commit.
> 
> I've hit a roadblock with the commit (see my review of r204690), but
> here's some terminology nevertheless.  If there's somewhere appropriate
> to commit this, let me know, but I figure it would just rot.

This is very nice. It’s appropriate in the LLVM subsystem docs. You can cross-reference LLVM Branch Weight Metadata.

-Andy

> Terminology
> ===========
> 
> Here's an overview of the terminology currently in use for
> BlockFrequencyInfo (BFI).
> 
>  - The three terms currently exposed in the API: *branch probability*,
>    *edge weight*, and *block frequency*.
> 
>  - Brief context about the implementation.
> 
>  - The two additional terms I use in my upcoming commit: *block mass*
>    and *loop scale*.
> 
>  - A high level description of how block mass and loop scale combine to
>    produce a block frequency.
> 
>  - The final term proposed in the RFC, which will also be exposed in
>    the API: *block bias*.
> 
> Branch Probability
> ------------------
> 
> Blocks with multiple successors have probabilities associated with each
> outgoing edge.  These are called branch probabilities.  For a given
> block, the sum of its outgoing branch probabilities should be 1.0.
> 
> Edge Weight
> -----------
> 
> Rather than storing fractions on each edge, we store an integer edge
> weight.  Edge weights are relative to the other edges of a given
> predecessor block.  The branch probability associated with a given edge
> is its weight divided by the sum of the predecessor's outgoing edge
> weights.
> 
> For example, consider this IR:
> 
>    define void @foo(i1 %cond) {
>    A:
>      br i1 %cond, label %B, label %C, !prof !0
>    B:
>      ret void
>    C:
>      ret void
>    }
>    !0 = metadata !{metadata !"branch_weights", i32 7, i32 8}
> 
> and this simple graph representation:
> 
>    A -> B  (edge-weight: 7)
>    A -> C  (edge-weight: 8)
> 
> The probability of branching from block A to block B is 7/15, and the
> probability of branching from block A to block C is 8/15.
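A minimal C++ sketch of the weight-to-probability conversion just described, for illustration only. `branchProbabilities` is a hypothetical helper, not LLVM's API (LLVM has its own `BranchProbability` class), and it uses `double` for readability.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Converts the integer weights on one block's outgoing edges into branch
// probabilities: each weight divided by the sum of all the block's weights.
// The result is in the same order as the input weights.
std::vector<double> branchProbabilities(const std::vector<uint32_t> &weights) {
  // Sum in 64 bits so adding several large 32-bit weights does not overflow.
  uint64_t total = std::accumulate(weights.begin(), weights.end(), uint64_t{0});
  assert(total != 0 && "a block needs at least one nonzero edge weight");
  std::vector<double> probs;
  probs.reserve(weights.size());
  for (uint32_t w : weights)
    probs.push_back(static_cast<double>(w) / static_cast<double>(total));
  return probs;
}
```

For the IR above, `branchProbabilities({7, 8})` yields 7/15 for A -> B and 8/15 for A -> C.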
> 
> Block Frequency
> ---------------
> 
> Block frequency is a relative metric that represents the number of times
> a block executes.  The ratio of a block frequency to the entry block
> frequency is the expected number of times the block will execute per
> entry to the function.
> 
> Block frequency is the main output of the BFI pass.
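Because frequencies are relative, only their ratio to the entry frequency carries meaning. A hypothetical C++ helper, `executionsPerEntry`, makes that interpretation concrete: a block with frequency 80 in a function whose entry frequency is 8 is expected to run 10 times per call, and doubling every frequency changes nothing.

```cpp
#include <cassert>
#include <cstdint>

// Expected number of executions of a block per entry to the function:
// the block's frequency divided by the entry block's frequency.
double executionsPerEntry(uint64_t blockFreq, uint64_t entryFreq) {
  assert(entryFreq != 0 && "entry frequency must be nonzero");
  return static_cast<double>(blockFreq) / static_cast<double>(entryFreq);
}
```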
> 
> Implementation: a series of DAGs
> --------------------------------
> 
> The implementation of the block frequency calculation analyses each
> loop, bottom-up, ignoring backedges; i.e., as a DAG.  After each loop is
> processed, it's packaged up to act as a pseudo-node in its parent loop's
> (or the function's) DAG analysis.
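The bottom-up order amounts to a post-order walk of the loop nest. The C++ sketch below assumes a hypothetical `LoopNode` tree with the function as its root, and only records the order in which the DAGs would be analysed; it does not perform the analysis itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical loop-nest node: a loop (or, at the root, the function itself)
// together with the loops nested directly inside it.
struct LoopNode {
  std::string name;
  std::vector<LoopNode> children;
};

// Visits loops bottom-up (post-order) and appends each name to `order`.
// Every inner loop is recorded before the loop or function containing it,
// so by the time a DAG is analysed its inner loops can act as pseudo-nodes.
void analysisOrder(const LoopNode &loop, std::vector<std::string> &order) {
  for (const LoopNode &inner : loop.children)
    analysisOrder(inner, order);
  order.push_back(loop.name);
}
```

For a function containing loop L1 (with inner loop L1.1) and loop L2, the order is L1.1, L1, L2, then the function.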
> 
> Block Mass
> ----------
> 
> For each DAG, the entry node is assigned a mass of `UINT64_MAX`
> (representing 1.0 in a kind of fixed-point arithmetic) and mass is
> distributed to successors according to branch probabilities.
> 
> After mass is fully distributed, in any cut of the DAG that separates
> the exit nodes from the entry node (with every cut edge pointing away
> from the entry side), the sum of the masses carried along the cut
> edges should equal `UINT64_MAX`.  In other words, mass is conserved as
> it "falls" through the DAG.
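A C++ sketch of mass distribution under stated assumptions: nodes are numbered in topological order with node 0 as the entry, edge weights are nonzero and each block's weights sum to less than 2^32, and the names (`distributeMass`, `scaleMass`, `Edge`, `kFullMass`) are hypothetical rather than LLVM's. Each share is computed from the mass and weight still remaining, so the last successor absorbs the rounding error and mass is conserved exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-point representation of a mass of 1.0.
constexpr uint64_t kFullMass = UINT64_MAX;

// One outgoing edge: the successor's index and the edge weight.
struct Edge {
  std::size_t succ;
  uint32_t weight;
};

// Computes floor(mass * num / den) without 128-bit arithmetic.
// With num <= den < 2^32 and r < den, r * num cannot overflow 64 bits.
uint64_t scaleMass(uint64_t mass, uint32_t num, uint32_t den) {
  assert(den != 0 && num <= den);
  uint64_t q = mass / den, r = mass % den;
  return q * num + r * num / den;
}

// Distributes mass through a DAG; succs[i] lists node i's outgoing edges.
// Each node splits its mass among its successors in proportion to weight.
std::vector<uint64_t> distributeMass(const std::vector<std::vector<Edge>> &succs) {
  assert(!succs.empty());
  std::vector<uint64_t> mass(succs.size(), 0);
  mass[0] = kFullMass;
  for (std::size_t i = 0; i < succs.size(); ++i) {
    uint64_t remWeight = 0;
    for (const Edge &e : succs[i]) {
      assert(e.succ > i && "nodes must be in topological order");
      assert(e.weight != 0 && "sketch assumes nonzero weights");
      remWeight += e.weight;
    }
    assert(remWeight <= UINT32_MAX && "sketch assumes weights sum below 2^32");
    uint64_t remMass = mass[i];
    for (const Edge &e : succs[i]) {
      // The last edge has e.weight == remWeight, so it receives all of remMass.
      uint64_t share = scaleMass(remMass, e.weight, static_cast<uint32_t>(remWeight));
      mass[e.succ] += share;
      remMass -= share;
      remWeight -= e.weight;
    }
  }
  return mass;
}
```

On the A/B/C example extended with an edge B -> C, block C (the only exit) ends up with exactly `UINT64_MAX`.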
> 
> If a function's basic block graph is a DAG, then block masses can (in
> theory) be used directly as block frequencies.  (In practice, this
> doesn't currently work, since downstream users aren't careful about
> overflow.)
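To illustrate the overflow concern with a hypothetical helper (`addFrequencies` is not an LLVM function): if raw DAG masses were used as frequencies, the entry block alone would hold `UINT64_MAX`, so even adding 1 to it wraps around to 0. The sketch detects the wrap with the usual unsigned-comparison check.

```cpp
#include <cassert>
#include <cstdint>

// Adds two block frequencies into `sum` and reports whether the true sum
// fit in 64 bits. Unsigned addition wraps modulo 2^64, and the wrapped
// result is smaller than `a` exactly when a wrap happened.
bool addFrequencies(uint64_t a, uint64_t b, uint64_t &sum) {
  sum = a + b;
  return sum >= a;
}
```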
> 
> Loop Scale
> ----------
> 
> Loop scale is a metric that indicates how many times a loop iterates per
> entry.  As mass is distributed through the loop's DAG, the (otherwise
> ignored) backedge mass is collected.  This backedge mass is used to
> compute the exit frequency, and thus the loop scale.
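The last step can be made explicit. Assuming the backedge mass b is measured as a fraction of the mass entering the loop header, a fraction 1 - b leaves the loop on each iteration, so the number of iterations per entry is geometric with mean 1 / (1 - b). A C++ sketch using `double` and a hypothetical `loopScale` function; returning infinity for a loop that never exits is a placeholder for the sketch, not necessarily how the patch handles that case.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Fixed-point representation of a mass of 1.0.
constexpr uint64_t kFullMass = UINT64_MAX;

// Expected iterations per loop entry, given the mass collected on the
// loop's backedges (as a fixed-point fraction of the header's mass).
double loopScale(uint64_t backedgeMass) {
  uint64_t exitMass = kFullMass - backedgeMass;  // 1 - b in fixed point
  if (exitMass == 0)  // no mass ever leaves: an infinite loop
    return std::numeric_limits<double>::infinity();
  return static_cast<double>(kFullMass) / static_cast<double>(exitMass);
}
```

A loop whose backedges collect half of the header's mass gets a scale of 2; with nine tenths, the scale is about 10.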
> 
> Getting from mass and scale to frequency
> ----------------------------------------
> 
> After analysing the complete series of DAGs, each block has a mass
> (local to its containing loop, if any), and each loop pseudo-node has a
> loop scale and its own mass (from its parent's DAG).
> 
> We can get an initial frequency assignment (with entry frequency of 1.0)
> by multiplying these masses and loop scales together.  A given block's
> frequency is the product of its mass, the masses of its containing
> loops' pseudo-nodes, and the containing loops' loop scales.
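A C++ sketch of this product, using `double` and a hypothetical `EnclosingLoop` record holding one containing loop's scale and its pseudo-node's mass in the parent DAG. Masses are first converted from the `UINT64_MAX` fixed-point form to fractions; since the result is a plain product, the order in which enclosing loops are listed does not matter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fixed-point representation of a mass of 1.0.
constexpr uint64_t kFullMass = UINT64_MAX;

// Hypothetical record for one loop that contains the block.
struct EnclosingLoop {
  double scale;             // iterations per entry to this loop
  uint64_t pseudoNodeMass;  // this loop's mass in its parent's DAG
};

double massToFraction(uint64_t mass) {
  return static_cast<double>(mass) / static_cast<double>(kFullMass);
}

// Initial frequency relative to an entry frequency of 1.0: the block's own
// mass times each enclosing loop's scale and pseudo-node mass.
double initialFrequency(uint64_t blockMass, const std::vector<EnclosingLoop> &loops) {
  double freq = massToFraction(blockMass);
  for (const EnclosingLoop &loop : loops)
    freq *= loop.scale * massToFraction(loop.pseudoNodeMass);
  return freq;
}
```

For example, a block with full local mass in a loop of scale 4, whose pseudo-node receives half of the function's mass, gets an initial frequency of 2.0.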
> 
> Since downstream users need integers (not floating point), this initial
> frequency assignment is shifted as necessary into the range of
> `uint64_t`.
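The text doesn't specify how the shift is chosen. One possibility, sketched in C++ under that assumption: scale every frequency by the same power of two so that the largest lands in [2^63, 2^64). A single common factor preserves the ratios between blocks up to rounding, which is all a relative metric needs. `toIntegerFrequencies` is a hypothetical name, and a real implementation might prefer to leave headroom given the overflow caveat about downstream users.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Converts relative frequencies (entry = 1.0) to integers by multiplying
// them all by one power of two, chosen so the largest lands in [2^63, 2^64).
std::vector<uint64_t> toIntegerFrequencies(const std::vector<double> &freqs) {
  double maxFreq = 0.0;
  for (double f : freqs) {
    assert(std::isfinite(f) && f >= 0.0);
    if (f > maxFreq)
      maxFreq = f;
  }
  if (maxFreq == 0.0)
    return std::vector<uint64_t>(freqs.size(), 0);

  int exponent = 0;
  std::frexp(maxFreq, &exponent);  // maxFreq == m * 2^exponent, 0.5 <= m < 1
  int shift = 64 - exponent;       // so maxFreq * 2^shift == m * 2^64 < 2^64

  std::vector<uint64_t> out;
  out.reserve(freqs.size());
  for (double f : freqs)
    out.push_back(static_cast<uint64_t>(std::ldexp(f, shift)));  // truncates
  return out;
}
```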
> 
> Block Bias
> ----------
> 
> Block bias is a proposed *absolute* metric to indicate a bias toward or
> away from a given block during a function's execution.  The idea is that
> bias can be used in isolation to indicate whether a block is relatively
> hot or cold, or to compare two blocks to indicate whether one is hotter
> or colder than the other.
> 
> The proposed calculation involves computing a *reference* block
> frequency, where:
> 
>  - every branch weight is assumed to be 1 (i.e., every branch
>    probability distribution is uniform) and
> 
>  - loop scales are ignored.
> 
> This reference frequency represents what the block frequency would be in
> an unbiased graph.
> 
> The bias is the ratio of the block frequency to this reference block
> frequency.
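A C++ sketch of the proposed reference frequency and bias for the simplest case, a loop-free function (so there are no loop scales to ignore). It assumes nodes in topological order with node 0 as the entry; `referenceFrequencies` and `blockBias` are hypothetical names, and since block bias is only proposed, this is one reading of the proposal rather than a settled definition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference frequencies for a loop-free CFG: every branch is split evenly
// among its successors, starting from 1.0 at the entry.
std::vector<double> referenceFrequencies(const std::vector<std::vector<std::size_t>> &succs) {
  assert(!succs.empty());
  std::vector<double> ref(succs.size(), 0.0);
  ref[0] = 1.0;
  for (std::size_t i = 0; i < succs.size(); ++i)
    for (std::size_t s : succs[i]) {
      assert(s > i && "nodes must be in topological order");
      ref[s] += ref[i] / static_cast<double>(succs[i].size());
    }
  return ref;
}

// Bias: actual frequency over reference frequency. Above 1.0, the profile
// favours the block relative to an unbiased graph; below 1.0, it avoids it.
double blockBias(double freq, double referenceFreq) {
  assert(referenceFreq > 0.0);
  return freq / referenceFreq;
}
```

For a diamond A -> {B, C} -> D, the reference frequencies are 1, 0.5, 0.5 and 1; if the profile gives B a frequency of 0.75, its bias is 1.5.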
>