[LLVMdev] Loss of precision with very large branch weights
Duncan P. N. Exon Smith
dexonsmith at apple.com
Fri Apr 24 13:08:06 PDT 2015
> On 2015-Apr-24, at 12:51, Xinliang David Li <davidxl at google.com> wrote:
>
> On Fri, Apr 24, 2015 at 12:40 PM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>>
>>> On 2015-Apr-24, at 11:18, Diego Novillo <dnovillo at google.com> wrote:
>>>
>>>
>>> In PR 22718, we are looking at issues with long-running applications producing non-representative frequencies. For example, in these two loops:
>>>
>>> int g = 0;
>>> __attribute__((noinline)) void bar() {
>>>   g++;
>>> }
>>>
>>> extern int printf(const char*, ...);
>>>
>>> int main()
>>> {
>>>   int i, j, k;
>>>
>>>   for (i = 0; i < 1000000; i++)
>>>     bar();
>>>
>>>   printf ("g = %d\n", g);
>>>   g = 0;
>>>
>>>   for (i = 0; i < 500000; i++)
>>>     bar();
>>>
>>>   printf ("g = %d\n", g);
>>> }
>>>
>>> I expect the loop-body frequency of the second loop to be about half that of the first. This holds fine for this test case:
>>>
>>> $ bin/opt -analyze -block-freq -S unbiased-branches.ll
>>> Printing analysis 'Block Frequency Analysis' for function 'bar':
>>> block-frequency-info: bar
>>> - entry: float = 1.0, int = 8
>>>
>>> Printing analysis 'Block Frequency Analysis' for function 'main':
>>> block-frequency-info: main
>>> - entry: float = 1.0, int = 8
>>> - for.cond: float = 500001.5, int = 4000011
>>> - for.body: float = 500000.5, int = 4000003
>>> - for.inc: float = 500000.5, int = 4000003
>>> - for.end: float = 1.0, int = 8
>>> - for.cond1: float = 250001.5, int = 2000011
>>> - for.body3: float = 250000.5, int = 2000003
>>> - for.inc4: float = 250000.5, int = 2000003
>>> - for.end6: float = 1.0, int = 8
>>>
>>>
>>> But if I manually modify the frequencies of both to get close to UINT32_MAX, the ratios between the frequencies no longer reflect reality. For example, here is what happens if I change the branch_weights in the two loops to 4,294,967,295 and 2,147,483,647, respectively.
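>>>
>>> Concretely, the hand-edit is to the !prof metadata on the two loops'
>>> conditional branches; something like this, where the block names match
>>> the dump below and the value names are illustrative:
>>>
>>>   br i1 %cmp, label %for.body, label %for.end, !prof !0     ; first loop
>>>   ...
>>>   br i1 %cmp2, label %for.body3, label %for.end6, !prof !1  ; second loop
>>>   ...
>>>   !0 = !{!"branch_weights", i32 4294967295, i32 1}
>>>   !1 = !{!"branch_weights", i32 2147483647, i32 1}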
>>
>> What does -analyze -branch-prob give?
>>
>>> $ bin/opt -analyze -block-freq -S unbiased-branches.ll
>>> Printing analysis 'Block Frequency Analysis' for function 'bar':
>>> block-frequency-info: bar
>>> - entry: float = 1.0, int = 8
>>>
>>> Printing analysis 'Block Frequency Analysis' for function 'main':
>>> block-frequency-info: main
>>> - entry: float = 1.0, int = 8
>>> - for.cond: float = 1073741824.4, int = 8589934595
>>> - for.body: float = 1073741823.4, int = 8589934587
>>> - for.inc: float = 1073741823.4, int = 8589934587
>>> - for.end: float = 1.0, int = 8
>>> - for.cond1: float = 1073741824.4, int = 8589934595
>>> - for.body3: float = 1073741823.4, int = 8589934587
>>> - for.inc4: float = 1073741823.4, int = 8589934587
>>> - for.end6: float = 1.0, int = 8
>>>
>>> Now both loops are considered equally hot.
>>>
>>> Duncan, I think that if I were to make branch_weights a 64-bit integer, this would not be an issue. But I'm not sure whether I'm missing something else here.
>>
>> I don't understand where the fidelity is being lost; this seems
>> strange to me. 4B fits inside UINT32_MAX, so I don't see how we
>> hit a cap here.
>
> The max weight is UINT32_MAX / BB->getTerminator()->getNumSuccessors().
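>
> That helper is roughly this (a sketch of getMaxWeightFor in
> lib/Analysis/BranchProbabilityInfo.cpp):
>
>   static uint32_t getMaxWeightFor(BasicBlock *BB) {
>     // Cap each weight so that the sum over all successors of BB
>     // cannot overflow a uint32_t.
>     return UINT32_MAX / BB->getTerminator()->getNumSuccessors();
>   }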
>
>
>>
>> But certainly there *is* a limit. We can't store a ratio of larger
>> than ~4B:1. Is this really a problem?
>
> Right, this is not the issue. The issue is that when one branch's
> weight reaches the cap, the other branch's weight is not scaled
> accordingly:
>
> uint32_t WeightLimit = getMaxWeightFor(BB);
> Weights.reserve(TI->getNumSuccessors());
> for (unsigned i = 1, e = WeightsNode->getNumOperands(); i != e; ++i) {
>   ConstantInt *Weight =
>       mdconst::dyn_extract<ConstantInt>(WeightsNode->getOperand(i));
>   if (!Weight)
>     return false;
>   Weights.push_back(
>       std::max<uint32_t>(1, Weight->getLimitedValue(WeightLimit)));
>                                                  // <---- may hit limit
> }
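>
> A fix along these lines would read the weights at full 64-bit width
> first and then divide them all by a common scale so the largest fits
> under the limit, preserving the ratios. A sketch only (reusing
> WeightsNode, Weights, and WeightLimit from the snippet above, plus
> the surrounding function's includes), not an actual patch:
>
>   SmallVector<uint64_t, 8> RawWeights;
>   uint64_t MaxRaw = 0;
>   for (unsigned i = 1, e = WeightsNode->getNumOperands(); i != e; ++i) {
>     ConstantInt *Weight =
>         mdconst::dyn_extract<ConstantInt>(WeightsNode->getOperand(i));
>     if (!Weight)
>       return false;
>     uint64_t W = Weight->getZExtValue();
>     RawWeights.push_back(W);
>     MaxRaw = std::max(MaxRaw, W);
>   }
>   // Common divisor that brings MaxRaw under WeightLimit; equal to 1
>   // when no scaling is needed, so small weights pass through intact.
>   uint64_t Scale = MaxRaw / WeightLimit + 1;
>   for (uint64_t W : RawWeights)
>     Weights.push_back(std::max<uint32_t>(1, uint32_t(W / Scale)));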
Ah, right, this looks like a bug.
>
>
>> I understand that you can't
>> tell which loop is hotter, but why does that matter? Is there some
>> optimization that will do the wrong thing? Which one?
>>
>
> Two calls (in these loops) that are not equally hot may be considered
> equally hot, confusing the cost/benefit analysis. This is just one
> example.
>
> Our goal is to make sure profile data is kept faithfully. Which pass
> consumes it, and whether it is used now or only in the future, is a
> totally different matter.
>
>
>> Regarding changing branch weights to 64-bits, I think we should be
>> sure it's valuable before we change it. I imagine it's doable, but
>> it'll complicate logic in a few places that don't currently have to
>> worry about overflowing `uint64_t`.
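>>
>> For example, any code that sums a block's weights, which today can
>> always use a uint64_t accumulator safely, would then need explicit
>> overflow checks along these lines (an illustrative sketch, not
>> existing LLVM code):
>>
>>   #include <cstdint>
>>   #include <limits>
>>
>>   // With 32-bit weights, summing even billions of them fits in a
>>   // uint64_t. With 64-bit weights, a single addition can overflow
>>   // and needs a guard at every summation site.
>>   bool addWeights(uint64_t A, uint64_t B, uint64_t &Sum) {
>>     if (A > std::numeric_limits<uint64_t>::max() - B)
>>       return false; // would overflow; caller must saturate or rescale
>>     Sum = A + B;
>>     return true;
>>   }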
>
> I think we can fix this without using 64-bit weights.
>
> David