[LLVMdev] Loss of precision with very large branch weights
Duncan P. N. Exon Smith
dexonsmith at apple.com
Fri Apr 24 13:08:06 PDT 2015
> On 2015-Apr-24, at 12:51, Xinliang David Li <davidxl at google.com> wrote:
>
> On Fri, Apr 24, 2015 at 12:40 PM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>>
>>> On 2015-Apr-24, at 11:18, Diego Novillo <dnovillo at google.com> wrote:
>>>
>>>
>>> In PR 22718, we are looking at issues with long-running applications producing non-representative frequencies. For example, in these two loops:
>>>
>>> int g = 0;
>>> __attribute__((noinline)) void bar() {
>>>   g++;
>>> }
>>>
>>> extern int printf(const char*, ...);
>>>
>>> int main()
>>> {
>>>   int i, j, k;
>>>
>>>   for (i = 0; i < 1000000; i++)
>>>     bar();
>>>
>>>   printf ("g = %d\n", g);
>>>   g = 0;
>>>
>>>   for (i = 0; i < 500000; i++)
>>>     bar();
>>>
>>>   printf ("g = %d\n", g);
>>> }
>>>
>>> I expect the loop-body frequency of the second loop to be about half that of the first. This holds fine for this test case:
>>>
>>> $ bin/opt -analyze -block-freq -S unbiased-branches.ll
>>> Printing analysis 'Block Frequency Analysis' for function 'bar':
>>> block-frequency-info: bar
>>> - entry: float = 1.0, int = 8
>>>
>>> Printing analysis 'Block Frequency Analysis' for function 'main':
>>> block-frequency-info: main
>>> - entry: float = 1.0, int = 8
>>> - for.cond: float = 500001.5, int = 4000011
>>> - for.body: float = 500000.5, int = 4000003
>>> - for.inc: float = 500000.5, int = 4000003
>>> - for.end: float = 1.0, int = 8
>>> - for.cond1: float = 250001.5, int = 2000011
>>> - for.body3: float = 250000.5, int = 2000003
>>> - for.inc4: float = 250000.5, int = 2000003
>>> - for.end6: float = 1.0, int = 8
>>>
>>>
>>> But if I manually modify the frequencies of both to get close to UINT32_MAX, the ratios between the frequencies no longer reflect reality. For example, here is what happens if I change the branch_weights in the two loops to 4,294,967,295 and 2,147,483,647, respectively.
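>>>
>>> Concretely, the hand-edit is to the !prof metadata on the two loops'
>>> conditional branches; something like this, where the block names match
>>> the dump below and the value names are illustrative:
>>>
>>>   br i1 %cmp, label %for.body, label %for.end, !prof !0     ; first loop
>>>   ...
>>>   br i1 %cmp2, label %for.body3, label %for.end6, !prof !1  ; second loop
>>>   ...
>>>   !0 = !{!"branch_weights", i32 4294967295, i32 1}
>>>   !1 = !{!"branch_weights", i32 2147483647, i32 1}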
>>
>> What does -analyze -branch-prob give?
>>
>>> $ bin/opt -analyze -block-freq -S unbiased-branches.ll
>>> Printing analysis 'Block Frequency Analysis' for function 'bar':
>>> block-frequency-info: bar
>>> - entry: float = 1.0, int = 8
>>>
>>> Printing analysis 'Block Frequency Analysis' for function 'main':
>>> block-frequency-info: main
>>> - entry: float = 1.0, int = 8
>>> - for.cond: float = 1073741824.4, int = 8589934595
>>> - for.body: float = 1073741823.4, int = 8589934587
>>> - for.inc: float = 1073741823.4, int = 8589934587
>>> - for.end: float = 1.0, int = 8
>>> - for.cond1: float = 1073741824.4, int = 8589934595
>>> - for.body3: float = 1073741823.4, int = 8589934587
>>> - for.inc4: float = 1073741823.4, int = 8589934587
>>> - for.end6: float = 1.0, int = 8
>>>
>>> Now both loops are considered equally hot.
>>>
>>> Duncan, I think that if I were to make branch_weights a 64-bit integer, this would not be an issue. But I'm not sure whether I'm missing something else here.
>>
>> I don't understand where the fidelity is being lost; this seems
>> strange to me. 4B fits inside UINT32_MAX, so I don't see how we
>> hit a cap here.
>
> The max weight is UINT32_MAX / BB->getTerminator()->getNumSuccessors().
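>
> That helper is roughly this (a sketch of getMaxWeightFor in
> lib/Analysis/BranchProbabilityInfo.cpp):
>
>   static uint32_t getMaxWeightFor(BasicBlock *BB) {
>     // Cap each weight so that the sum over all successors of BB
>     // cannot overflow a uint32_t.
>     return UINT32_MAX / BB->getTerminator()->getNumSuccessors();
>   }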
>
>
>>
>> But certainly there *is* a limit. We can't store a ratio of larger
>> than ~4B:1. Is this really a problem?
>
> Right, this is not the issue. The issue is that when one branch's
> weight reaches the cap, the other branch's weight is not scaled
> accordingly:
>
> uint32_t WeightLimit = getMaxWeightFor(BB);
> Weights.reserve(TI->getNumSuccessors());
> for (unsigned i = 1, e = WeightsNode->getNumOperands(); i != e; ++i) {
>   ConstantInt *Weight =
>       mdconst::dyn_extract<ConstantInt>(WeightsNode->getOperand(i));
>   if (!Weight)
>     return false;
>   Weights.push_back(
>       std::max<uint32_t>(1, Weight->getLimitedValue(WeightLimit)));
>                                                  // <---- may hit limit
> }
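>
> A fix along these lines would read the weights at full 64-bit width
> first and then divide them all by a common scale so the largest fits
> under the limit, preserving the ratios. A sketch only (reusing
> WeightsNode, Weights, and WeightLimit from the snippet above, plus
> the surrounding function's includes), not an actual patch:
>
>   SmallVector<uint64_t, 8> RawWeights;
>   uint64_t MaxRaw = 0;
>   for (unsigned i = 1, e = WeightsNode->getNumOperands(); i != e; ++i) {
>     ConstantInt *Weight =
>         mdconst::dyn_extract<ConstantInt>(WeightsNode->getOperand(i));
>     if (!Weight)
>       return false;
>     uint64_t W = Weight->getZExtValue();
>     RawWeights.push_back(W);
>     MaxRaw = std::max(MaxRaw, W);
>   }
>   // Common divisor that brings MaxRaw under WeightLimit; equal to 1
>   // when no scaling is needed, so small weights pass through intact.
>   uint64_t Scale = MaxRaw / WeightLimit + 1;
>   for (uint64_t W : RawWeights)
>     Weights.push_back(std::max<uint32_t>(1, uint32_t(W / Scale)));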
Ah, right, this looks like a bug.
>
>
>> I understand that you can't
>> tell which loop is hotter, but why does that matter? Is there some
>> optimization that will do the wrong thing? Which one?
>>
>
> Two calls (in these loops) that are not equally hot may be considered
> equally hot, confusing the cost/benefit analysis. This is just one
> example.
>
> Our goal is to make sure profile data is kept faithfully. Which pass
> consumes it, and whether it is used now or only in the future, is a
> totally different matter.
>
>
>> Regarding changing branch weights to 64-bits, I think we should be
>> sure it's valuable before we change it. I imagine it's doable, but
>> it'll complicate logic in a few places that don't currently have to
>> worry about overflowing `uint64_t`.
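>>
>> For example, any code that sums a block's weights, which today can
>> always use a uint64_t accumulator safely, would then need explicit
>> overflow checks along these lines (an illustrative sketch, not
>> existing LLVM code):
>>
>>   #include <cstdint>
>>   #include <limits>
>>
>>   // With 32-bit weights, summing even billions of them fits in a
>>   // uint64_t. With 64-bit weights, a single addition can overflow
>>   // and needs a guard at every summation site.
>>   bool addWeights(uint64_t A, uint64_t B, uint64_t &Sum) {
>>     if (A > std::numeric_limits<uint64_t>::max() - B)
>>       return false; // would overflow; caller must saturate or rescale
>>     Sum = A + B;
>>     return true;
>>   }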
>
> I think we can fix this without using 64-bit weights.
>
> David