[PATCH] D80765: [ELF] Handle bitcode comdat groups separately to deduplicate thinlto comdat sections

Mon Jul 27 15:42:25 PDT 2020

hoyFB added a comment.

In D80765#2177014 <https://reviews.llvm.org/D80765#2177014>, @dblaikie wrote:

> In D80765#2174269 <https://reviews.llvm.org/D80765#2174269>, @MaskRay wrote:
>
> > In D80765#2174250 <https://reviews.llvm.org/D80765#2174250>, @dblaikie wrote:
> >
> > > In D80765#2174141 <https://reviews.llvm.org/D80765#2174141>, @hoyFB wrote:
> > >
> > > > In D80765#2170820 <https://reviews.llvm.org/D80765#2170820>, @dblaikie wrote:
> > > >
> > > > > Yeah - one alternative here would be for ThinLTO To "home" debug type information, the same way I believe it "homes" inline functions. (original/incoming IR to the thinlink may have multiple definitions of inline functions - one in every module that has a call to the inline function - but something in the think link summary basically says "drop this definition" in all but one of those modules and in that one blessed module it says "always produce a weak definition of this function, even if you don't need it" - something like that should be done with type information, which would remove the redundancy in the object files)
> > > > >
> > > > > In the case of a full executable link (with the homing above implemented), I'd then suggest not enabling type units, since they just add overhead and wouldn't reduce any duplication - in the case of a static library built with ThinLTO (if that's even a thing), then you'd still potentially want to use type units so they could be deduplicated against other libraries.
> > > >
> > > >
> > > > Enabling compile-time type deduplication is indeed an ideal solution, as you pointed out finding a prevailing symbol definition and dropping others for duplicate functions from different modules. This would require quite some work in parsing type metadata on the IR and might slow down thinLink.
> > >
> > >
> > > Fair point - though presumably the first stage compile would do the metadata walk through the IR, provide a list of types in the thin summary file and the thin link would then communicate to the backend compiles specifying which types each backend compile should emit into its respective object file.
> > >
> > > >   Also this may not completely address deduplication of new data introduced during LTO postLink code generation time. The type unit level deduplication could be the last resort to handle the duplicates, like what is done in non-LTO build.
> > >
> > > Not sure I follow - what duplication would remain if the thin link homed type descriptions? When linking to non-LTO (normal machine code/object files) libraries? Yeah, fair point (tangential warning: LLVM's type unit hashes are non-DWARF-conforming and not compatible with GCC's type unit hashes... :/)
> > >
> > > It /sounds/ like, maybe the original fix in D62884 <https://reviews.llvm.org/D62884> maybe was going in the wrong direction, and instead ThinLTO should be removing the comdat group from "homed" inline functions? Then there wouldn't be a need for a special case in how comdats are handled.
> >
> >
> > I am with @dblaikie here. A bit of archaeology finds that rG4de44b7ef87bcef83798eee69fdcbfab9866d52e <https://reviews.llvm.org/rG4de44b7ef87bcef83798eee69fdcbfab9866d52e> was probably going in the wrong direction but it was indeed the simplest/least intrusive approach.
> >  D62884 <https://reviews.llvm.org/D62884> just made some refactorings. Just a thought: for ThinLTO backend, dropping comdat in processGlobalsForThinLTO may achieve some results.
>
>
> Yeah, maybe something near/similar to https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Utils/FunctionImportUtils.cpp#L275 - where the comdat is imported as available externally, and its comdat group is stripped. Theory is, maybe we should be stripping the comdat on the other side too.
>
> Hmm, maybe not, though? What about non-whole-program-ThinLTO? There could be a comdat copy of this function in a static library that it should be deduplicated against?

This is probably fine if the static library is a native input to lld , since lld itself can deduplicate comdat groups of its native input.

> I take it the underlying assumption lld is making is that a comdat group is only kept alive by references from within the object file that defines it? Is that formally specified/required? If not, then maybe we could/should remoev that assumption, so that ThinLTO can work naturally with non-(Thin)LTO archives and still do the suitable comdat deduplication or dropping?
> 
>> However, one issue is that on the MC layer the section group signature is part of ELFSectionKey.
>> 
>>   .section foo,"aG", at progbits,aaa,comdat
>>   .section foo,"aG", at progbits,bbb,comdat
>> 
>> 
>> are two sections. Dropping the comdat key will make them one combined section.
> 
> Is that a problem? When linked they end up in the same section, right?
> 
>> Unique linkage may work around it but there may be a fair amount of complexity in TargetLoweringObjectFileImpl/MC.
>>  I agree with pcc that LTO should not generate additional arbitrary comdat objects.
> 
> What additional arbitrary comdat objects are being generated or proposed to be generated?

So far the Dwarf types are the only new artifacts generated by the LTO backend. What I was wondering was new LTO-exclusive codegen passes that may introduce new comdat groups, but sound like it is a design principle that new symbols/data should be only generated in preLink LTO phase.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80765/new/

https://reviews.llvm.org/D80765