[llvm-dev] [RFC] Changing .llvm.call-graph-profile to use	relocations
    Wenlei He via llvm-dev 
    llvm-dev at lists.llvm.org
       
    Thu May 27 17:56:01 PDT 2021
    
    
  
> Do you have measurement how well SHT_LLVM_CALL_GRAPH_PROFILE optimizes?
> My understanding is that with ThinLTO+PGO it is has very tiny benefit.
With PLO(BOLT) the benefit is zero, but with PGO and without PLO, it was still visible. We’ve seen ~1% RPS improvement in the past when measuring mysql.
PGO does not cover profile guided function layout, either the linker or PLO need to take care of that part. We’ve been using standalone HFSort<https://research.fb.com/publications/optimizing-function-placement-for-large-scale-data-center-applications-2/> on top of PGO, and we wanted to fold that into either PLO or linker – the profile guided function layout exists in lld, but we ran into bugs hence the RFC and Alexander’s fix.
Thanks,
Wenlei
From: Alexander Yermolovich <ayermolo at fb.com>
Date: Thursday, May 27, 2021 at 5:44 PM
To: maskray at google.com <maskray at google.com>
Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>, Wenlei He <wenlei at fb.com>, Modi Mo <modimo at fb.com>, dblaikie at gmail.com <dblaikie at gmail.com>
Subject: Re: [RFC] Changing .llvm.call-graph-profile to use relocations
Replies inlined.
________________________________
From: maskray at google.com <maskray at google.com>
Sent: Thursday, May 27, 2021 5:23 PM
To: Alexander Yermolovich <ayermolo at fb.com>
Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>; Wenlei He <wenlei at fb.com>; Modi Mo <modimo at fb.com>; dblaikie at gmail.com <dblaikie at gmail.com>
Subject: Re: [RFC] Changing .llvm.call-graph-profile to use relocations
On 2021-05-27, Alexander Yermolovich wrote:
>I was thinking keeping both structures for backward compatibility in case object files with old representation are fed in to new llvm-objdump, and even lld. For example if someone will use older clang release with lld/llvm-objdump after this change.
>For just changing the value and keeping the name. Looks like it will leave a gap in this sequential sequence. If a new flag is added later down the road and in this case latest clang used with older lld/llvm-objdump this will also present a problem as older tools will interpret new flag as SHT_LLVM_CALL_GRAPH_PROFILE.
>Not sure if these are valid concerns, what do you think?
>If we go for clean state, then maybe leave 0x6fff4c02 entry and change it to SHT_LLVM_CALL_GRAPH_PROFILE_DEPRICATED, and emit warning if objects with it are passed in.
>  SHT_LLVM_ODRTAB = 0x6fff4c00,             // LLVM ODR table.
>  SHT_LLVM_LINKER_OPTIONS = 0x6fff4c01,     // LLVM Linker Options.
>  SHT_LLVM_CALL_GRAPH_PROFILE = 0x6fff4c02, // LLVM Call Graph Profile.
>  SHT_LLVM_ADDRSIG = 0x6fff4c03, // List of address-significant symbols
>                                 // for safe ICF.
>  SHT_LLVM_DEPENDENT_LIBRARIES =
>      0x6fff4c04,                    // LLVM Dependent Library Specifiers.
>  SHT_LLVM_SYMPART = 0x6fff4c05,     // Symbol partition specification.
>  SHT_LLVM_PART_EHDR = 0x6fff4c06,   // ELF header for loadable partition.
>  SHT_LLVM_PART_PHDR = 0x6fff4c07,   // Phdrs for loadable partition.
>  SHT_LLVM_BB_ADDR_MAP = 0x6fff4c08, // LLVM Basic Block Address Map.
Do you have measurement how well SHT_LLVM_CALL_GRAPH_PROFILE optimizes?
My understanding is that with ThinLTO+PGO it is has very tiny benefit.
[Alex] I do not. Wenlei mentioned he can dig up some numbers.
The value reading old format is low. I don't expect users want to mix
newer object files with old object files and want to have optimization
from the old object files. SHT_LLVM_CALL_GRAPH_PROFILE_DEPRICATED is
just adding maintenance cost.
Changing the value but retaining the name is sufficient for the various
compatibility issues.
Reid's size concern is valid. Have you measured the size overhead?
[Alex] Good point. I'll measure once I implement new approach with SHT_REL + R_X86_64_NONE.
We can use the SHT_REL format for SHT_LLVM_CALL_GRAPH_PROFILE
relocations, which brings down the per entry cost from 24 bytes to 16
bytes for ELFCLASS64. (It is a bit unfortunate that the Linux kernel
does not support ELFCLASS32 executables on a 64-bit architecture.
For most usage we are using the small code model and don't benefit
from 64-bit addresses/sizes.)
>________________________________
>From: maskray at google.com <maskray at google.com>
>Sent: Thursday, May 27, 2021 11:27 AM
>To: Alexander Yermolovich <ayermolo at fb.com>
>Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>; Wenlei He <wenlei at fb.com>; Modi Mo <modimo at fb.com>; dblaikie at gmail.com <dblaikie at gmail.com>
>Subject: Re: [RFC] Changing .llvm.call-graph-profile to use relocations
>
>On 2021-05-27, Alexander Yermolovich wrote:
>>Thank you for feedback, let me try to use R_X86_64_NONE, and introduce another type. . I thought R_X86_64_32 would be less impactful as structure of is preserved, but it is space wasteful.
>>For type name: ELF::SHT_LLVM_CALL_GRAPH_PROFILE_RELA ?
>>
>>Alex
>
>Preversing the structure is not needed because the symbol representation
>is changed anyway.
>
>You just need to change the value in
>llvm/llvm/include/llvm/BinaryFormat/ELF.h
>The name doesn't need to change.
>
>>From: maskray at google.com <maskray at google.com>
>>Sent: Wednesday, May 26, 2021 5:03 PM
>>To: Alexander Yermolovich <ayermolo at fb.com>
>>Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>; Wenlei He <wenlei at fb.com>; Modi Mo <modimo at fb.com>; dblaikie at gmail.com <dblaikie at gmail.com>
>>Subject: Re: [RFC] Changing .llvm.call-graph-profile to use relocations
>>
>>On 2021-05-26, Alexander Yermolovich wrote:
>>>Currently when .llvm.call-graph-profile is created by llvm it explicitly encodes the symbol indices. This section is basically a black box for post processing tools. For example, if we run strip -s on the object files the symbol table changes, but indices in that section do not.
>>>
>>>We propose to change this section to use relocations. The Frequency will still be in the .llvm.call-graph-profile, but symbol information will be in relocation section. In LLD information from both sections is used to reconstruct call graph profile. Relocations themselves will never be applied.
>>
>>Yes, a relocation based approach will be more robust and can fix the
>>.llvm_addrsig issue https://sourceware.org/bugzilla/show_bug.cgi?id=23817<https://sourceware.org/bugzilla/show_bug.cgi?id=23817>
>>
>>The relocation type doesn't matter. Your implementation uses R_X86_64_32 and from/to/value/from/to/value/from/to/value
>>An alternative design is R_X86_64_NONE + value/value/value, i.e. from/to do not occupy space in the content.
>>We will get a 3x space saving.
>>
>>We need to change the section type since changing representation is incompatible
>>and the sections from old object files should be ignored.
>>The new section type will be ignored by old LLVM tools as well.
>>
>>>With this approach post processing tools that handle relocations correctly work for this section also.
>>>
>>>One thing is section is marked with SHF_EXCLUDE.
>>>From spec
>>>"This section is excluded from input to the link-edit of an executable or shared object. This flag is ignored if the SHF_ALLOC flag is also set, or if relocations exist against the section."
>>>
>>>So technically speaking it needs to be kept, and presumably relocations applied, but LLD follows gold and ld and discards sections marked as SHF_EXCLUDE even with relocations. So, I think this approach should be fine. https://reviews.llvm.org/D24966<https://reviews.llvm.org/D24966>
>>
>>This is Solaris's spec (since 1996), not standard ELF's.  It can be advisory but
>>our behavior (mostly GNU ABI+Linux ABI+LLVM extensions) does not necessarily
>>follow it.  I cannot find a definition in the x86-64 psABI or a GNU ABI
>>document.
>>
>>I think the behavior as implemented in gold and LLD is more useful.
>>
>>>Finally, this bug seems similar to https://sourceware.org/bugzilla/show_bug.cgi?id=23817<https://sourceware.org/bugzilla/show_bug.cgi?id=23817>   . Proposed solution for that was also to use relocations.
>>>
>>>Implementation: https://reviews.llvm.org/D103212<https://reviews.llvm.org/D103212>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210528/4d75fccc/attachment-0001.html>
    
    
More information about the llvm-dev
mailing list