[llvm-dev] Computing block profile weights

Jonas Wagner via llvm-dev llvm-dev at lists.llvm.org
Fri Dec 2 01:40:02 PST 2016


Hello,

I'm working on an application that would benefit from knowing the weight of
a basic block, as in "fraction of the program's execution time spent in
this block".

Currently, I'm computing this using the block's frequency from
BlockFrequencyInfo, relative to the function's entry block frequency, and
scaled by the function's entry count. This is also the computation that's
done by getBlockProfileCount
<https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/BlockFrequencyInfoImpl.cpp#L540>
in lib/Analysis/BlockFrequencyInfoImpl.cpp.
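
For reference, the current computation is roughly the following. This is a
simplified sketch, assuming the usual BlockFrequencyInfo and Function APIs
(and an Optional entry count); the real getBlockProfileCount additionally
uses scaled arithmetic to avoid overflow:

  // Simplified sketch of the current scheme; not the exact LLVM code.
  #include "llvm/ADT/Optional.h"
  #include "llvm/Analysis/BlockFrequencyInfo.h"
  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Function.h"
  using namespace llvm;

  Optional<uint64_t> blockProfileCount(const Function &F,
                                       const BasicBlock &BB,
                                       BlockFrequencyInfo &BFI) {
    Optional<uint64_t> EntryCount = F.getEntryCount();
    if (!EntryCount)
      return None; // no profile data for this function
    // count(BB) = entry_count * freq(BB) / freq(entry)
    uint64_t BlockFreq = BFI.getBlockFreq(&BB).getFrequency();
    uint64_t EntryFreq = BFI.getEntryFreq();
    return (*EntryCount * BlockFreq) / EntryFreq; // overflow ignored here
  }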

The problem is that this method can be extremely imprecise, because many
functions have an entry count of zero. The entry count is computed from the
number of profile samples in the entry block. Depending on the function's
CFG, this count can be arbitrarily low even though the function is
frequently called or hot.
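
To make this concrete, consider a function of the following shape (a
made-up example): with a sampling profiler, essentially all samples land in
the loop body, so the entry block can easily end up with zero samples even
though the function accounts for most of the execution time.

  // Hypothetical hot function: the entry block executes once per call and
  // receives almost no samples, while the loop body soaks up nearly all of
  // them, so the derived entry count is close to (or exactly) zero.
  int expensiveTransform(int X);

  void processAll(int *Data, int N) {
    for (int I = 0; I < N; ++I)
      Data[I] = expensiveTransform(Data[I]); // hot: nearly all samples here
  }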

Here's an idea to address this. I'd like to collect a bit of feedback from
the community before trying it out.

1) Instead of relying on a function's *entry count*, store the *total
number of samples* in a function. This number is readily available from the
profile loader.
2) Compute a block's weight as `function_samples * block_weight /
sum_of_block_weights_in_function`
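
In code, the proposed computation could look roughly like the sketch below.
The function's total sample count and the sum of its block frequencies are
passed in explicitly here because neither is an existing API; they are the
two quantities that would need to be tracked:

  // Sketch of the proposed scheme. FunctionSamples and SumOfBlockFreqs are
  // hypothetical inputs, not existing LLVM APIs.
  //   weight(BB) = function_samples * freq(BB) / sum_of_block_freqs
  uint64_t blockWeight(const BasicBlock &BB, BlockFrequencyInfo &BFI,
                       uint64_t FunctionSamples, // total samples in function
                       uint64_t SumOfBlockFreqs) { // sum over all blocks
    uint64_t BlockFreq = BFI.getBlockFreq(&BB).getFrequency();
    // The function's samples are distributed over its blocks in proportion
    // to their frequencies, so no profile mass is lost. Overflow handling
    // is omitted for clarity.
    return (FunctionSamples * BlockFreq) / SumOfBlockFreqs;
  }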

Why do I like this?

- The total number of samples in a function gives a better indication of
the function's importance than the entry count does.
- This scheme "preserves mass" in that all of a function's samples are taken
into account. The samples in a BB are compared to the samples in the entire
function, rather than to a few (arbitrarily selected) samples from the entry
block.
- The computation avoids imprecision from multiplying by small numbers.

Disadvantages?

- BlockFrequencyInfo needs to keep track of the total frequency in a
function.
- BlockFrequencyInfo would probably scale the frequencies w.r.t. that
total, rather than the maximum frequency. This loses a few bits of precision.

Note that the entry count would not be lost in this scheme; one could
easily compute it as `function_samples * entry_weight /
sum_of_block_weights_in_function`.
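
With the hypothetical blockWeight sketch from above, this would simply be
the weight of the entry block:

  // Recovering the entry count under the proposed scheme (sketch).
  uint64_t entryCount(const Function &F, BlockFrequencyInfo &BFI,
                      uint64_t FunctionSamples, uint64_t SumOfBlockFreqs) {
    return blockWeight(F.getEntryBlock(), BFI, FunctionSamples,
                       SumOfBlockFreqs);
  }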

I believe using an entire function as the unit of reference is a good
compromise between precision and modularity. Precision is high because
there is a sufficient number of samples available in a function. Modularity
is preserved because the computation does not need to take other functions
into account (in fact, BlockFrequencyInfo already processes one function at
a time).

What do people think about this?
- Jonas