[LLVMdev] Recent changes in -gmlt break sample profiling
Eric Christopher
echristo at gmail.com
Sun Oct 26 16:59:45 PDT 2014
On Fri, Oct 24, 2014 at 3:27 PM, Diego Novillo <dnovillo at google.com> wrote:
>
>
> On Fri Oct 24 2014 at 6:21:14 PM David Blaikie <dblaikie at gmail.com> wrote:
>>
>> On Fri, Oct 24, 2014 at 3:16 PM, Diego Novillo <dnovillo at google.com>
>> wrote:
>>>
>>>
>>>
>>> On Fri Oct 24 2014 at 6:11:21 PM David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>>
>>>> On Fri, Oct 24, 2014 at 2:48 PM, Diego Novillo <dnovillo at google.com>
>>>> wrote:
>>>>>
>>>>> I'm not sure if this was intended, but it's going to be a problem for
>>>>> sample profiles.
>>>>>
>>>>> When we compile with -gmlt, the profiler expects to find the line
>>>>> number for all the function headers so that it can compute relative line
>>>>> locations for the profile.
>>>>>
>>>>> The tool that reads the ELF binary is not finding them, so it writes
>>>>> out absolute line numbers, which are impossible to match during the
>>>>> profile-use phase.
>>>>>
>>>>> The problem seems to be that we are missing DW_TAG_subprogram for all
>>>>> the functions in the file.
>>>>>
>>>>> Attached are the dwarf dumps of the same program. One compiled with my
>>>>> system's clang 3.4 and the other with today's trunk. In both compiles, I
>>>>> used -gline-tables-only.
>>>>>
>>>>> The trunk version is missing all the subprogram tags for the functions
>>>>> in the file. This breaks the sample profiler.
>>>>>
>>>>> Should I file a bug, or is -gmlt going to be like this from now on? The
>>>>> latter would be a problem for us.
>>>>
>>>>
>>>> Open to negotiation, but this change is intentional (for details, see
>>>> the commit: http://llvm.org/viewvc/llvm-project?rev=218129&view=rev).
>>>
>>>
>>> Well, this breaks us. Very hard. It absolutely blocks us from using
>>> SamplePGO with LLVM.
>>
>>
>> Sorry about that - I knew ASan needed gmlt data of a certain form and
>> worked carefully to ensure llvm-symbolizer could still symbolize with these
>> changes, but wasn't aware of this particular dependence from PGO (just
>> assumed it used the line table directly).
>
>
> It does, but it uses relative line numbers. That's why it needs the source
> locs for the function headers.
>
>>
>>
>>>
>>> The alternative would be to make the compiler use absolute line numbers,
>>> but in the experience we've collected with GCC, this makes the profiles very
>>> brittle wrt source changes.
>>
>>
>> It'd be interesting to see the data - I guess you throw out profiles that
>> don't match on a per-function basis? So adding a few lines to one function
>> will invalidate the profile for that function, but not all the subsequent
>> ones in the file?
>
>
> Right. Dehao started using absolute numbers, but then moved to relative
> numbers when he saw that the performance drops were fairly pronounced as the
> profile aged. I don't recall the exact drops he noticed.
>
>>
>>
>>>
>>> I don't have a better idea atm. Would there be any other way to generate
>>> relative line numbers?
>>
>>
>> Possibly.
>>
>>>
>>> Perhaps we could use the first line number we find with samples?
>>
>>
>> Probably - I guess you use the ELF symbol table to compute the address
>> range of each function? Given that, you could find the smallest line number
>> in that address range in the line table, and make everything else relative
>> to that.
>
>
> That could probably work, but I'm not sure how the ELF reading code in the
> conversion tool does this calculation. I'll check it out.
>
>>
>>
>>>
>>> The problem here is that this line will shift, depending on how the
>>> profile was obtained.
>>
>>
>> Not sure what you're referring to here ^
>
>
> Say the function starts at line 20. If in one profile run we get samples at
> line 20 and line 23, then we'll compute the relative locations as 0 and 3.
> But if the first sample we get is at line 21, then we'll compute the
> relative locations as 0 and 2.
>
> Using the address ranges in the line table may be the way to go. I'll look
> at this next week.
>
I'm nearly certain this is the way to go here.
-eric
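
A minimal sketch of the two base-line choices discussed in this thread,
using the "function starts at line 20" example. Everything here is
hypothetical and for illustration only: the line table is modeled as a plain
list of (address, line) pairs, the function's address range is assumed to
come from the ELF symbol table, and the helper names are invented. This is
not the code of the actual conversion tool, which the thread does not show.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of a decoded line table: (address, source line) rows.
LineRow = Tuple[int, int]


def base_from_line_table(rows: List[LineRow], start: int, end: int) -> Optional[int]:
    """Smallest line number mapped to any address in [start, end)."""
    lines = [line for addr, line in rows if start <= addr < end and line > 0]
    return min(lines) if lines else None


def base_from_first_sample(samples: Dict[int, int]) -> int:
    """Smallest line number that happened to receive a sample."""
    return min(samples)


def to_relative(samples: Dict[int, int], base: int) -> Dict[int, int]:
    """Rewrite absolute sample lines as offsets from the chosen base line."""
    return {line - base: count for line, count in samples.items()}


# Assumed data: one function occupying [0x1000, 0x1010), starting at line 20,
# followed by another function at line 30.
rows = [(0x1000, 20), (0x1004, 21), (0x1008, 23), (0x100C, 22), (0x1010, 30)]
start, end = 0x1000, 0x1010

run_a = {20: 5, 23: 7}  # samples landed on lines 20 and 23
run_b = {21: 4, 23: 6}  # first sample landed on line 21 instead

table_base = base_from_line_table(rows, start, end)

# Base taken from the line table: line 23 is offset 3 in both runs.
stable_a = to_relative(run_a, table_base)
stable_b = to_relative(run_b, table_base)

# Base taken from the first sample: line 23 becomes offset 3, then offset 2.
shifted_a = to_relative(run_a, base_from_first_sample(run_a))
shifted_b = to_relative(run_b, base_from_first_sample(run_b))
```

With the base taken from the line table, line 23 maps to offset 3 in both
runs. With the base taken from the first sample, it maps to 3 in one run and
2 in the other, which is the shift described above. The sketch ignores file
indices, so rows that come from code inlined from another file could skew the
minimum in a real line table.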