[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions

Thu Dec 15 16:20:07 PST 2016

On Thu, Dec 15, 2016 at 4:17 PM Teresa Johnson <tejohnson at google.com> wrote:

> On Thu, Dec 15, 2016 at 2:08 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
> On Thu, Dec 15, 2016 at 1:30 PM Teresa Johnson <tejohnson at google.com>
> wrote:
>
> On Thu, Dec 15, 2016 at 11:38 AM, David Blaikie <dblaikie at gmail.com>
> wrote:
>
>
>
> On Thu, Dec 15, 2016 at 11:26 AM Teresa Johnson <tejohnson at google.com>
> wrote:
>
> Trying to wrap my brain around this, so a few questions below. =)
>
>
> Sure thing - sorry, did assume a bit too much arcane context here.
>
>
>
> On Thu, Dec 15, 2016 at 10:54 AM, David Blaikie <dblaikie at gmail.com>
> wrote:
>
> Branching off from a discussion of improvements to DIGlobalVariable
> representations that Adrian's working on - got me thinking about related
> changes that have already been made to DISubprogram.
>
> To reduce duplicate debug info when things like linkonce_odr functions
> were deduplicated in LTO linking, the relationship between a CU and
> DISubprogram was inverted (instead of a CU maintaining a list of
> subprograms, subprograms specify which CU they come from - and the
> llvm::Function references the DISubprogram, so if the llvm::Function goes
> away, so does the associated DISubprogram)
>
> I'm not sure if this caused a regression, but at least seems to miss a
> possible improvement:
>
> During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram
> definitions (& their CU link, even if they weren't marked 'distinct', the
> CU link would cause them to effectively be so) remain separate - this means
> that inlined versions in one CU don't refer to an existing subprogram
> definition in another CU.
>
> To demonstrate:
> inl.h:
> void f1();
> inline __attribute__((always_inline)) void f2() {
>   f1();
> }
> inl1.cpp:
> #include "inl.h"
> void c1() {
>   f2();
> }
> inl2.cpp:
> #include "inl.h"
> void c2() {
>   f2();
> }
>
> Compile to IR, llvm-link the result. The DWARF you get is basically the
> same as the DWARF you'd get without linking:
>
> DW_TAG_compile_unit
>   DW_AT_name "inl1.cpp"
>   DW_TAG_subprogram #0
>     DW_AT_name "f2"
>   DW_TAG_subprogram
>     DW_AT_name "c1"
>     DW_TAG_inlined_subroutine
>       DW_TAG_abstract_origin #0 "f2"
> DW_TAG_compile_unit
>   DW_AT_name "inl2.cpp"
>   DW_TAG_subprogram #1
>     DW_AT_name "f2"
>   DW_TAG_subprogram
>     DW_AT_name "c2"
>     DW_TAG_inlined_subroutine
>       DW_TAG_abstract_origin #1 "f2"
>
> Instead of something more like this:
>
> DW_TAG_compile_unit
>   DW_AT_name "inl1.cpp"
>   DW_TAG_subprogram #0
>     DW_AT_name "f2"
>   DW_TAG_subprogram
>     DW_AT_name "c1"
>     DW_TAG_inlined_subroutine
>       DW_TAG_abstract_origin #0 "f2"
> DW_TAG_compile_unit
>   DW_AT_name "inl2.cpp"
>   DW_TAG_subprogram
>     DW_AT_name "c2"
>     DW_TAG_inlined_subroutine
>       DW_TAG_abstract_origin #0 "f2"
>
> (note that only one abstract definition of f2 is produced here)
>
>
> I think I understand what you are saying. Essentially, having the SP->CU
> link allows the SP to be deduplicated when multiple *outline* copies of the
> corresponding function are deduplicated. But not when the multiple copies
> are inlined, as it looks like we need all the copies, right?
>
>
> Not quite - having the SP->CU link (well,h onestly, marking the SP as
> "distinct" does this, but even if we didn't do that, the SP->CU link would
> still do it) causes SPs /not/ to be deduplicated on IR linking.
>
> Each SP is distinct/not considered duplicate with any other. (if we didn't
> mark it 'distinct', the fact that each SP refers to its corresponding CU
> would produce the same effect - they wouldn't be deduplicated because they
> aren't identical - they refer to different CUs)
>
> For non-inlined cases, this is fine.
>
> Before we inverted the SP<>CU link, what would happen is that all copies
> of the llvm::Function would be dropped, but their SPs would be left around.
> So two CUs that both used the same linkonce_odr function (let's say no
> inlining actually occurred though) would both have a SP description in the
> DWARF - but one would actual have a proper definition (with a high/low PC,
> etc) the other would be missing those features, as though the function had
> been optimized away (which it sort of had)
>
> So by reversing the link, we got rid of those extra SP descriptions in the
> DWARF (and the extra SP descriptions in the metadata - I think they were
> duplicate back then because they still had a scope chain leading back to
> their CU (maybe we had gotten rid of that chain - if we had, then adding it
> back in may've actually caused more metadata, but less DWARF))
>
>
> I almost followed all of this, until I got to this last bit. I understood
> from above that with the SP->CU link (and distinct SPs), prevented
> deduplication. But this last bit sounds like we are in fact removing the
> duplicates in the DWARF and possibly also in the metadata.
>
>
> Ah, right - I can see how it reads that way, sorry.
>
> Old old way:
>
> first.ll:
>   CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... }
>   @fn1 ...
>   @inl ...
>   Resulting DWARF:
>     compile_unit CU1
>       subprogram fn1
>         high/low pc, etc
>       subprogram inl
>         high/low pc, etc
>
> second.ll:
>   CU2 -> {inl_SP2 -> @inl, SP2 -> @fn2, ... }
>   @inl ...
>   @fn2 ...
>   Resulting DWARF:
>     compile_unit CU2
>       subprogram inl
>         high/low pc, etc
>       subprogram fn2
>         high/low pc, etc
>
> link first.ll + second.ll:
>   CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... }
>   CU2 -> {inl_SP2 -> null, SP2 -> @fn2, ... }
>   @fn1 ...
>   @inl ...
>   @fn2 ...
>   Resulting DWARF:
>     compile_unit CU1
>       subprogram fn1
>         high/low pc, etc
>       subprogram inl
>         high/low pc, etc
>     compile_unit CU2
>       subprogram inl
>         name, but no high/low pc - this is unnecessary
>       subprogram fn2
>         high/low pc, etc
>
> New way:
>   CU1
>   @fn1 -> fn1_SP -> CU1
>   @inl -> inl_SP -> CU1
>   Resulting DWARF:
>     compile_unit CU1
>       subprogram fn1
>         high/low pc, etc
>       subprogram inl
>         high/low pc, etc
>
> second.ll:
>   CU2
>   @inl -> inl_SP -> CU2
>   @fn2 -> fn2_SP -> CU2
>   Resulting DWARF:
>     compile_unit CU2
>       subprogram inl
>         high/low pc, etc
>       subprogram fn2
>         high/low pc, etc
>
> link first.ll + second.ll (we pick @inl from first.ll in this example):
>   CU1
>   CU2
>   @fn1 -> fn1_SP -> CU1
>   @inl -> inl_SP -> CU1
>   @fn2 -> fn2_SP -> CU2
>   Resulting DWARF:
>     compile_unit CU1
>       subprogram fn1
>         high/low pc, etc
>       subprogram inl
>         high/low pc, etc
>     compile_unit CU2
>       subprogram fn2
>         high/low pc, etc
>
> So inverting the links causes us to completely drop the redundant
> description of 'inl' that appeared in C2 when the function was not inlined.
>
> But if the function /is/ inlined, then the inlined location descriptions
> that remain in @fn2 (assuming there was a call to @inl in @fn1 and @fn2)
> still point to that original (CU2) version of @inl - causing it to to be
> emitted into CU2.
>
> Whereas for type descriptions we don't do this - the type has no CU link,
> so they all get deduplicated and even if @fn2 has a parameter of the same
> type as @fn1 - we emit the type into CU1 when we first encounter it (when
> emitting @fn1) and then reference it whenever we need it, even when
> emitting @fn2 in the other CU.
>
>
> Ok, great, thanks for the detailed explanation! I was missing the
> distinction between the location descriptions and the type descriptions.
>
>
>
>
>
>
>
>
>
>
> Any thoughts? I imagine this is probably worth a reasonable amount of
> savings in an optimized build. Not huge, but not nothing. (probably not the
> top of anyone's list though, I realize)
>
> Should we remove the CU link from a non-internal linkage subprogram (&
> this may have an effect on the GV representation issue originally being
> discussed) and just emit it into whichever CU happens to need it first?
>
>
> I can see how this would be done in LTO where the compiler has full
> visibility. For ThinLTO presumably we would need to do some index-based
> marking? Can we at least do something when we import an inlined SP and drop
> it since we know it is defined elsewhere?
>
>
> Complete visibility isn't required to benefit here - and unfortunately
> there's nothing fancier (that I know of) that we can do to avoid emitting
> one definition of each used inline function in each thinlto object file we
> produce (we can't say "oh, the name of the function, its mangled name, the
> names and types of its parameters are over in that other object
> file/somewhere else" - but we can avoid emitting those descriptions in each
> /CU/ that uses the inlined function within a single ThinLTO object)
>
> I can provide some more thorough examples if that'd be helpful :)
>
>
> Ok, I think I understand. This is only emitting once per object file,
> which with ThinLTO can contain multiple CUs due to importing. But then with
> full LTO it sounds like we would be in even better shape, since it has a
> single module with all the CUs?
>
>
> Right - currently we emit once per CU, but with a change in format we
> could emit once per object file - which hurts ThinLTO over non-LTO (because
> ThinLTO produces more CUs (due to imports) per object file) and is neutral
> for full LTO (since it produces the same number of CUs, just in one object
> file).
>
> Improving this representation to produce once per object would help get
> ThinLTO back what it's currently paying - and improve full LTO further than
> its current position in this regard.
>
> But the gains might not be major/significant - I've done nothing to assess
> that, just observing that it is suboptimal.
>
> Thanks for asking/helping me explain it further, hopefully this is more
> descriptive.
>
>
> Great, thanks for pointing out this opportunity! When I get a chance
> probably tomorrow I will give your prototype patch a try and see how much
> difference it makes for Chromium.
>

Ah - the patch fixes the backend/LLVM to cope with this representational
change (at least for the one tiny example) but doesn't fix Clang to produce
the new representation (I just hand hacked the IR in a simple example to
see if it worked). I could work with you/work up a change that might do
that too, if that's handy/worth testing.

- Dave

>
> Teresa
>
>
>
> - Dave
>
>
>
> Teresa
>
>
>
>
> Thanks,
> Teresa
>
>
>
> This might be slightly sub-optimal, due to, say, the namespace being
> foreign to that CU. But it's how we do types currently, I think? So at
> least it'd be consistent and probably cheap enough/better.
>
>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
> 408-460-2413 <(408)%20460-2413>
>
>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
> 408-460-2413 <(408)%20460-2413>
>
>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
> 408-460-2413 <(408)%20460-2413>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161216/7492f493/attachment.html>