[llvm-dev] [RFC] Changing .llvm.call-graph-profile to use relocations

Fāng-ruì Sòng via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 22 22:39:58 PDT 2021

On Thu, Jun 10, 2021 at 6:05 PM Alexander Yermolovich <ayermolo at fb.com> wrote:
> Created a new diff with the _NONE implementation:
> https://reviews.llvm.org/D104080
> ________________________________
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Alexander Yermolovich via llvm-dev <llvm-dev at lists.llvm.org>
> Sent: Thursday, May 27, 2021 5:44 PM
> To: maskray at google.com <maskray at google.com>
> Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] [RFC] Changing .llvm.call-graph-profile to use relocations
> Replies inlined.
> ________________________________
> From: maskray at google.com <maskray at google.com>
> Sent: Thursday, May 27, 2021 5:23 PM
> To: Alexander Yermolovich <ayermolo at fb.com>
> Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>; Wenlei He <wenlei at fb.com>; Modi Mo <modimo at fb.com>; dblaikie at gmail.com <dblaikie at gmail.com>
> Subject: Re: [RFC] Changing .llvm.call-graph-profile to use relocations
> On 2021-05-27, Alexander Yermolovich wrote:
> >I was thinking keeping both structures for backward compatibility in case object files with old representation are fed in to new llvm-objdump, and even lld. For example if someone will use older clang release with lld/llvm-objdump after this change.
> >For just changing the value and keeping the name. Looks like it will leave a gap in this sequential sequence. If a new flag is added later down the road and in this case latest clang used with older lld/llvm-objdump this will also present a problem as older tools will interpret new flag as SHT_LLVM_CALL_GRAPH_PROFILE.
> >Not sure if these are valid concerns, what do you think?
> >If we go for clean state, then maybe leave 0x6fff4c02 entry and change it to SHT_LLVM_CALL_GRAPH_PROFILE_DEPRICATED, and emit warning if objects with it are passed in.
> >  SHT_LLVM_ODRTAB = 0x6fff4c00,             // LLVM ODR table.
> >  SHT_LLVM_LINKER_OPTIONS = 0x6fff4c01,     // LLVM Linker Options.
> >  SHT_LLVM_CALL_GRAPH_PROFILE = 0x6fff4c02, // LLVM Call Graph Profile.
> >  SHT_LLVM_ADDRSIG = 0x6fff4c03, // List of address-significant symbols
> >                                 // for safe ICF.
> >      0x6fff4c04,                    // LLVM Dependent Library Specifiers.
> >  SHT_LLVM_SYMPART = 0x6fff4c05,     // Symbol partition specification.
> >  SHT_LLVM_PART_EHDR = 0x6fff4c06,   // ELF header for loadable partition.
> >  SHT_LLVM_PART_PHDR = 0x6fff4c07,   // Phdrs for loadable partition.
> >  SHT_LLVM_BB_ADDR_MAP = 0x6fff4c08, // LLVM Basic Block Address Map.
> Do you have measurement how well SHT_LLVM_CALL_GRAPH_PROFILE optimizes?
> My understanding is that with ThinLTO+PGO it is has very tiny benefit.
> [Alex] I do not. Wenlei mentioned he can dig up some numbers.
> The value reading old format is low. I don't expect users want to mix
> newer object files with old object files and want to have optimization
> from the old object files. SHT_LLVM_CALL_GRAPH_PROFILE_DEPRICATED is
> just adding maintenance cost.
> Changing the value but retaining the name is sufficient for the various
> compatibility issues.
> Reid's size concern is valid. Have you measured the size overhead?
> [Alex] Good point. I'll measure once I implement new approach with SHT_REL + R_X86_64_NONE.
> We can use the SHT_REL format for SHT_LLVM_CALL_GRAPH_PROFILE
> relocations, which brings down the per entry cost from 24 bytes to 16
> bytes for ELFCLASS64. (It is a bit unfortunate that the Linux kernel
> does not support ELFCLASS32 executables on a 64-bit architecture.
> For most usage we are using the small code model and don't benefit
> from 64-bit addresses/sizes.)
> >________________________________
> >From: maskray at google.com <maskray at google.com>
> >Sent: Thursday, May 27, 2021 11:27 AM
> >To: Alexander Yermolovich <ayermolo at fb.com>
> >Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>; Wenlei He <wenlei at fb.com>; Modi Mo <modimo at fb.com>; dblaikie at gmail.com <dblaikie at gmail.com>
> >Subject: Re: [RFC] Changing .llvm.call-graph-profile to use relocations
> >
> >On 2021-05-27, Alexander Yermolovich wrote:
> >>Thank you for feedback, let me try to use R_X86_64_NONE, and introduce another type. . I thought R_X86_64_32 would be less impactful as structure of is preserved, but it is space wasteful.
> >>
> >>Alex
> >
> >Preversing the structure is not needed because the symbol representation
> >is changed anyway.
> >
> >You just need to change the value in
> >llvm/llvm/include/llvm/BinaryFormat/ELF.h
> >The name doesn't need to change.
> >
> >>From: maskray at google.com <maskray at google.com>
> >>Sent: Wednesday, May 26, 2021 5:03 PM
> >>To: Alexander Yermolovich <ayermolo at fb.com>
> >>Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>; Wenlei He <wenlei at fb.com>; Modi Mo <modimo at fb.com>; dblaikie at gmail.com <dblaikie at gmail.com>
> >>Subject: Re: [RFC] Changing .llvm.call-graph-profile to use relocations
> >>
> >>On 2021-05-26, Alexander Yermolovich wrote:
> >>>Currently when .llvm.call-graph-profile is created by llvm it explicitly encodes the symbol indices. This section is basically a black box for post processing tools. For example, if we run strip -s on the object files the symbol table changes, but indices in that section do not.
> >>>
> >>>We propose to change this section to use relocations. The Frequency will still be in the .llvm.call-graph-profile, but symbol information will be in relocation section. In LLD information from both sections is used to reconstruct call graph profile. Relocations themselves will never be applied.
> >>
> >>Yes, a relocation based approach will be more robust and can fix the
> >>.llvm_addrsig issue https://sourceware.org/bugzilla/show_bug.cgi?id=23817
> >>
> >>The relocation type doesn't matter. Your implementation uses R_X86_64_32 and from/to/value/from/to/value/from/to/value
> >>An alternative design is R_X86_64_NONE + value/value/value, i.e. from/to do not occupy space in the content.
> >>We will get a 3x space saving.
> >>
> >>We need to change the section type since changing representation is incompatible
> >>and the sections from old object files should be ignored.
> >>The new section type will be ignored by old LLVM tools as well.
> >>
> >>>With this approach post processing tools that handle relocations correctly work for this section also.
> >>>
> >>>One thing is section is marked with SHF_EXCLUDE.
> >>>From spec
> >>>"This section is excluded from input to the link-edit of an executable or shared object. This flag is ignored if the SHF_ALLOC flag is also set, or if relocations exist against the section."
> >>>
> >>>So technically speaking it needs to be kept, and presumably relocations applied, but LLD follows gold and ld and discards sections marked as SHF_EXCLUDE even with relocations. So, I think this approach should be fine. https://reviews.llvm.org/D24966
> >>
> >>This is Solaris's spec (since 1996), not standard ELF's.  It can be advisory but
> >>our behavior (mostly GNU ABI+Linux ABI+LLVM extensions) does not necessarily
> >>follow it.  I cannot find a definition in the x86-64 psABI or a GNU ABI
> >>document.
> >>
> >>I think the behavior as implemented in gold and LLD is more useful.
> >>
> >>>Finally, this bug seems similar to https://sourceware.org/bugzilla/show_bug.cgi?id=23817   . Proposed solution for that was also to use relocations.
> >>>
> >>>Implementation: https://reviews.llvm.org/D103212

Circling back on this. https://reviews.llvm.org/D104080 mostly looks good.

.llvm.call-graph-profile mostly only affects -fprofile-use and
-fprofile-sample-use(-fauto-profile) produced output files.

>From the measurement for clang-13 the size looks really matter.
Also note that for for instrumentation PGO, -fprofile-generate= object
files are much larger than -fprofile-use=, so -fprofile-use= slightly
size increases don't matter.

> The size went up from 107KB to 322KB, aggregate of all the input sections.

More information about the llvm-dev mailing list