[llvm] r184590 - Revert "BlockFrequency: Saturate at 1 instead of 0 when multiplying a frequency with a branch probability."

Tue Jun 25 11:23:52 PDT 2013

On Jun 25, 2013, at 6:39 AM, Benjamin Kramer <benny.kra at gmail.com> wrote:

> 
> On 25.06.2013, at 01:36, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> 
>> 
>> On Jun 24, 2013, at 3:38 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> 
>>> 
>>> On Mon, Jun 24, 2013 at 3:29 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>>> I am proposing that we use a number representation with non-sticky saturation in both ends of the representable range. We already have that in the high end: Calculations saturate at UINT64_MAX, but unlike IEEE infinity, they don’t stay saturated when multiplied by fractions < 1.
>>> 
>>> We need the same behavior at the low end: Saturate to 1 instead of 0. This limit is non-sticky so multiplying it by a fraction > 1 can still give you a larger number.
>>> 
>>> I think the problem is that we end up saturating at 1 *a lot* in highly branchy situations (think large switches, or nested large switches), and then the summing behavior that happens at merge points introduces really problematic artifacts. Whatever scheme we end up with, the key (to me) is making the merge points not introduce distortion into the frequency graph.
>> 
>> The problem is ideally suited for a floating point representation because we need such a big dynamic range, but I am afraid that a soft floating point implementation will be too expensive.
>> 
>> There is a couple of things we can do with the i64 representation:
>> 
>> - Shift the dynamic range. Currently it is approx 10^-3..10^18. Maybe moving it a decade or three is all we need.
> 
> To get rid of the immediate regressions I bumped up the entry frequency by 4 powers of two. It's a compromise; we should have a lot of room at the high end but some details of BlockFrequencyInfo limit the value to 32 bits (divBlockFreq) so we hit saturation a lot earlier than we should with a full 64 bit value.

This implementation is incomplete:

  void divBlockFreq(BlockT *BB, BranchProbability Prob) {
    uint64_t N = Prob.getNumerator();
    assert(N && "Illegal division by zero!");
    uint64_t D = Prob.getDenominator();
    uint64_t Freq = (Freqs[BB].getFrequency() * D) / N;

    // Should we assert it?
    if (Freq > UINT32_MAX)
      Freq = UINT32_MAX;

    Freqs[BB] = BlockFrequency(Freq);

There shouldn’t be any need to saturate frequencies at UINT32_MAX, this basically makes the high bits of the i64 defunct. It looks like leftovers form an earlier implementation where frequencies were i32.

This function should compute F*D/N with the normal saturation at UINT64_MAX. The hard part is handling the case where F*D overflows, but F*D/N doesn’t. BlockFrequency.cpp has code to do a 96-bit division.

/jakob