[LLVMdev] RFC - Improvements to PGO profile support
Philip Reames
listmail at philipreames.com
Tue Mar 24 13:54:43 PDT 2015
On 03/24/2015 10:34 AM, Xinliang David Li wrote:
>> I follow what you're trying to say. :)
>>
>> Also, trusting exact entry counts is going to be somewhat suspect. These
>> are *highly* susceptible to racy updates, overflow, etc... Anything which
>> puts too much implicit trust in these numbers is going to be problematic.
> No, we won't be using profile counts in the absolute sense, e.g. if
> (count == 100000). The counts are used as a measure of global hotness
> relative to others. They are also used together with program summary data.
>
>> I have no objection to adding a mechanism for expressing an entry count. I
>> am still very hesitant about the proposals with regard to redefining the
>> current MD_prof.
> For now, we won't touch MD_prof's definition.
>
>> I'd encourage you to post a patch for the entry count mechanism, but not tie
>> its semantics to exact execution count. (Something like "the value provided
>> must correctly describe the relative hotness of this routine against others
>> in the program annotated with the same metadata. It is the relative
>> scaling that is important, not the absolute value. In particular, the value
>> need not be an exact execution count.")
>>
> Defining entry count in a more flexible way is fine -- as long as the
> implementation can choose to use 'execution count' :)
Absolutely agreed here. :)
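
To make the "relative, not absolute" wording concrete, here is a minimal sketch in C. Everything in it is hypothetical: `struct profile_summary`, `is_relatively_hot` and the `hot_fraction` parameter are illustration names, not LLVM APIs, and the thread does not specify how the summary data would actually be used. The only point is that the decision depends on the ratio between a routine's entry count and a program-wide figure, never on a fixed value such as 100000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical program summary; not an LLVM data structure. */
struct profile_summary {
    uint64_t max_entry_count; /* largest entry count seen in the program */
};

/*
 * Assumed policy: a routine is "hot" if its entry count is at least
 * 1/hot_fraction of the largest entry count in the program.
 */
static bool is_relatively_hot(uint64_t entry_count,
                              const struct profile_summary *summary,
                              uint64_t hot_fraction)
{
    assert(summary != NULL);
    assert(hot_fraction > 0);

    if (summary->max_entry_count == 0)
        return false; /* no profile data, so nothing is hot */

    uint64_t threshold = summary->max_entry_count / hot_fraction;
    if (threshold == 0)
        threshold = 1; /* a routine that never ran is never hot */

    return entry_count >= threshold;
}
```

With a summary whose maximum is 1000 and `hot_fraction` of 10, an entry count of 900 is hot and 50 is not. Multiplying every count by 1000 gives the same answers, which is the property the suggested wording asks for: an implementation may store exact execution counts, but consumers only rely on the scaling.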
>
> thanks,
>
> David
>
>>> Many of the other issues you raise seem like they could also be addressed
>>> without major changes to the existing infrastructure. Let’s try to fix those
>>> first.
>>
>> That's exactly the point of the proposal. We definitely don't want to make
>> major changes to the infrastructure at first. My thinking is to start
>> working on making MD_prof a real count. One of the things that are happening
>> is that the combination of real profile data plus the frequency propagation
>> that we are currently doing is misleading the analysis.
>>
>> I consider this a major change. You're trying to redefine a major part of
>> the current system.
>>
>> Multiple people have spoken up and objected to this change (as currently
>> described). Please start somewhere else.
>>
>>
>> For example (thanks David for the code and data). In the following code:
>>
>> int g;
>> __attribute__((noinline)) void bar() {
>>   g++;
>> }
>>
>> extern int printf(const char*, ...);
>>
>> int main()
>> {
>>   int i, j, k;
>>
>>   g = 0;
>>
>>   // Loop 1.
>>   for (i = 0; i < 100; i++)
>>     for (j = 0; j < 100; j++)
>>       for (k = 0; k < 100; k++)
>>         bar();
>>
>>   printf ("g = %d\n", g);
>>   g = 0;
>>
>>   // Loop 2.
>>   for (i = 0; i < 100; i++)
>>     for (j = 0; j < 10000; j++)
>>       bar();
>>
>>   printf ("g = %d\n", g);
>>   g = 0;
>>
>>   // Loop 3.
>>   for (i = 0; i < 1000000; i++)
>>     bar();
>>
>>   printf ("g = %d\n", g);
>>   g = 0;
>> }
>>
>> When compiled with profile instrumentation, frequency propagation
>> distorts the real profile because it gives different frequencies to the
>> calls to bar() in the 3 loops. Each loop calls bar() 1,000,000 times,
>> but after frequency propagation, the call to bar() gets a weight of
>> 520,202 in loop #1, 210,944 in loop #2 and 4,096 in loop #3. In
>> reality, every call to bar() should have a weight of 1,000,000.
>>
>> Duncan responded to this. My conclusion from his response: this is a bug,
>> not a fundamental issue. Remove the max scaling factor, switch the counts
>> to 64 bits and everything should be fine. If you disagree, let's discuss.
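
As a sanity check on the numbers in the example, here is a minimal sketch in C that multiplies the per-level trip counts of each loop nest using 64-bit saturating arithmetic. This is not LLVM's block frequency code; `saturating_mul_u64` and `nest_count` are hypothetical helpers written only for this illustration. It shows that all three nests reach the call to bar() the same number of times, and that a 64-bit count with clamping cannot silently wrap. Note also that the 4,096 reported for loop #3 equals 2^12, which would be consistent with a cap on the per-loop scale, although the thread does not state the cap's value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply two counts, clamping to UINT64_MAX instead of wrapping. */
static uint64_t saturating_mul_u64(uint64_t a, uint64_t b)
{
    if (a != 0 && b > UINT64_MAX / a)
        return UINT64_MAX;
    return a * b;
}

/*
 * Number of times the innermost body of a loop nest runs, given the
 * trip count of each level from outermost to innermost.
 */
static uint64_t nest_count(const uint64_t *trips, int depth)
{
    assert(trips != NULL && depth >= 0);

    uint64_t count = 1;
    for (int i = 0; i < depth; i++)
        count = saturating_mul_u64(count, trips[i]);
    return count;
}
```

Calling `nest_count` with {100, 100, 100}, {100, 10000} and {1000000} returns 1,000,000 in each case, matching the real profile rather than the propagated weights quoted above.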
>>
>>
>>
>> Thanks. Diego.
>>
>>