[LLVMdev] RFC - Improvements to PGO profile support

Tue Mar 24 10:34:20 PDT 2015

> I follow what you're trying to say.  :)
>
> Also, trusting exact entry counts is going to be somewhat suspect.  These
> are *highly* susceptible to racy updates, overflow, etc...  Anything which
> puts too much implicit trust in these numbers is going to be problematic.

No, we won't be using profile counts in the absolute sense, e.g. if
(count == 100000).  The counts are used as a measure of global hotness
relative to other counts, together with the program summary data.
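
To make the "relative, not absolute" point concrete, here is a minimal
sketch in C.  The names total_count and is_hot and the percentage
threshold are made up for illustration and are not part of LLVM, and the
"program summary" is reduced to the sum of all counts, which is an
assumption on my part.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical program summary: the sum of all entry counts. */
static uint64_t total_count(const uint64_t *counts, size_t n) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += counts[i];
    return total;
}

/*
 * Hypothetical hotness test: a function is hot if it accounts for at
 * least `percent` of the total.  Only the ratio count/total matters, so
 * multiplying every count by the same factor gives the same answer.
 */
static int is_hot(uint64_t count, uint64_t total, unsigned percent) {
    if (total == 0)
        return 0;
    /* Compare in floating point to avoid 64-bit overflow in the products. */
    return (long double)count * 100.0L >= (long double)total * percent;
}
```

With counts {900, 90, 10} and a 50% threshold only the first function is
hot, and the result is the same for {900000, 90000, 10000}.  The code
never compares a count against a fixed constant.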

>

> I have no objection to adding a mechanism for expressing an entry count.  I
> am still very hesitant about the proposals with regards to redefining the
> current MD_prof.

For now, we won't touch MD_prof's definition.

>
> I'd encourage you to post a patch for the entry count mechanism, but not tie
> its semantics to exact execution count.  (Something like "the value provided
> must correctly describe the relative hotness of this routine against others
> in the program annotated with the same metadata.  It is the relative
> scaling that is important, not the absolute value.  In particular, the value
> need not be an exact execution count.")
>

Defining entry count in a more flexible way is fine -- as long as the
implementation can choose to use 'execution count' :)

thanks,

David

>
>>
>> Many of the other issues you raise seem like they could also be addressed
>> without major changes to the existing infrastructure. Let’s try to fix those
>> first.
>
>
> That's exactly the point of the proposal.  We definitely don't want to make
> major changes to the infrastructure at first. My thinking is to start
> working on making MD_prof a real count. One of the things that are happening
> is that the combination of real profile data plus the frequency propagation
> that we are currently doing is misleading the analysis.
>
> I consider this a major change.  You're trying to redefine a major part of
> the current system.
>
> Multiple people have spoken up and objected to this change (as currently
> described).  Please start somewhere else.
>
>
> For example (thanks David for the code and data). In the following code:
>
> int g;
> __attribute__((noinline)) void bar() {
>  g++;
> }
>
> extern int printf(const char*, ...);
>
> int main()
> {
>   int i, j, k;
>
>   g = 0;
>
>   // Loop 1.
>   for (i = 0; i < 100; i++)
>     for (j = 0; j < 100; j++)
>        for (k = 0; k < 100; k++)
>            bar();
>
>   printf ("g = %d\n", g);
>   g = 0;
>
>   // Loop 2.
>   for (i = 0; i < 100; i++)
>     for (j = 0; j < 10000; j++)
>         bar();
>
>   printf ("g = %d\n", g);
>   g = 0;
>
>
>   // Loop 3.
>   for (i = 0; i < 1000000; i++)
>     bar();
>
>   printf ("g = %d\n", g);
>   g = 0;
> }
>
> When compiled with profile instrumentation, frequency propagation
> distorts the real profile because it gives different frequencies to the
> calls to bar() in the 3 different loops. Each of the 3 loops calls bar()
> 1,000,000 times, but after frequency propagation, the call to bar() gets
> a weight of 520,202 in loop #1, 210,944 in loop #2 and 4,096 in loop #3.
> In reality, every call to bar() should have a weight of 1,000,000.
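
As a quick sanity check of the 1,000,000 figure, the three loop nests
can be run with a plain counter standing in for the profile counter on
bar().  This is only a sketch: the helper names run_loop1, run_loop2 and
run_loop3 and the `calls` counter are invented here, and it says nothing
about how the weights 520,202, 210,944 and 4,096 are derived.

```c
#include <stdint.h>

/* Stand-in for the instrumentation counter attached to bar(). */
static uint64_t calls;

static void bar(void) { calls++; }

/* Loop 1 from the example: 100 * 100 * 100 iterations. */
uint64_t run_loop1(void) {
    calls = 0;
    for (int i = 0; i < 100; i++)
        for (int j = 0; j < 100; j++)
            for (int k = 0; k < 100; k++)
                bar();
    return calls;
}

/* Loop 2 from the example: 100 * 10000 iterations. */
uint64_t run_loop2(void) {
    calls = 0;
    for (int i = 0; i < 100; i++)
        for (int j = 0; j < 10000; j++)
            bar();
    return calls;
}

/* Loop 3 from the example: a single loop of 1000000 iterations. */
uint64_t run_loop3(void) {
    calls = 0;
    for (int i = 0; i < 1000000; i++)
        bar();
    return calls;
}
```

Each helper returns 1,000,000, so the three call sites are equally hot
by real execution count even though the loop nesting differs.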
>
> Duncan responded to this.  My conclusion from his response: this is a bug,
> not a fundamental issue.  Remove the max scaling factor, switch the counts
> to 64 bits and everything should be fine.  If you disagree, let's discuss.
>
>
>
> Thanks.  Diego.