[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions

Thu Dec 15 16:26:00 PST 2016

On Thu, Dec 15, 2016 at 4:20 PM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Thu, Dec 15, 2016 at 4:17 PM Teresa Johnson <tejohnson at google.com>
> wrote:
>
>> On Thu, Dec 15, 2016 at 2:08 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>> On Thu, Dec 15, 2016 at 1:30 PM Teresa Johnson <tejohnson at google.com>
>> wrote:
>>
>> On Thu, Dec 15, 2016 at 11:38 AM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>>
>>
>> On Thu, Dec 15, 2016 at 11:26 AM Teresa Johnson <tejohnson at google.com>
>> wrote:
>>
>> Trying to wrap my brain around this, so a few questions below. =)
>>
>>
>> Sure thing - sorry, did assume a bit too much arcane context here.
>>
>>
>>
>> On Thu, Dec 15, 2016 at 10:54 AM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>> Branching off from a discussion of improvements to DIGlobalVariable
>> representations that Adrian's working on - got me thinking about related
>> changes that have already been made to DISubprogram.
>>
>> To reduce duplicate debug info when things like linkonce_odr functions
>> were deduplicated in LTO linking, the relationship between a CU and
>> DISubprogram was inverted (instead of a CU maintaining a list of
>> subprograms, subprograms specify which CU they come from - and the
>> llvm::Function references the DISubprogram, so if the llvm::Function goes
>> away, so does the associated DISubprogram)
>>
>> I'm not sure if this caused a regression, but at least seems to miss a
>> possible improvement:
>>
>> During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram
>> definitions (& their CU link, even if they weren't marked 'distinct', the
>> CU link would cause them to effectively be so) remain separate - this means
>> that inlined versions in one CU don't refer to an existing subprogram
>> definition in another CU.
>>
>> To demonstrate:
>> inl.h:
>> void f1();
>> inline __attribute__((always_inline)) void f2() {
>>   f1();
>> }
>> inl1.cpp:
>> #include "inl.h"
>> void c1() {
>>   f2();
>> }
>> inl2.cpp:
>> #include "inl.h"
>> void c2() {
>>   f2();
>> }
>>
>> Compile to IR, llvm-link the result. The DWARF you get is basically the
>> same as the DWARF you'd get without linking:
>>
>> DW_TAG_compile_unit
>>   DW_AT_name "inl1.cpp"
>>   DW_TAG_subprogram #0
>>     DW_AT_name "f2"
>>   DW_TAG_subprogram
>>     DW_AT_name "c1"
>>     DW_TAG_inlined_subroutine
>>       DW_TAG_abstract_origin #0 "f2"
>> DW_TAG_compile_unit
>>   DW_AT_name "inl2.cpp"
>>   DW_TAG_subprogram #1
>>     DW_AT_name "f2"
>>   DW_TAG_subprogram
>>     DW_AT_name "c2"
>>     DW_TAG_inlined_subroutine
>>       DW_TAG_abstract_origin #1 "f2"
>>
>> Instead of something more like this:
>>
>> DW_TAG_compile_unit
>>   DW_AT_name "inl1.cpp"
>>   DW_TAG_subprogram #0
>>     DW_AT_name "f2"
>>   DW_TAG_subprogram
>>     DW_AT_name "c1"
>>     DW_TAG_inlined_subroutine
>>       DW_TAG_abstract_origin #0 "f2"
>> DW_TAG_compile_unit
>>   DW_AT_name "inl2.cpp"
>>   DW_TAG_subprogram
>>     DW_AT_name "c2"
>>     DW_TAG_inlined_subroutine
>>       DW_TAG_abstract_origin #0 "f2"
>>
>> (note that only one abstract definition of f2 is produced here)
>>
>>
>> I think I understand what you are saying. Essentially, having the SP->CU
>> link allows the SP to be deduplicated when multiple *outline* copies of the
>> corresponding function are deduplicated. But not when the multiple copies
>> are inlined, as it looks like we need all the copies, right?
>>
>>
>> Not quite - having the SP->CU link (well,h onestly, marking the SP as
>> "distinct" does this, but even if we didn't do that, the SP->CU link would
>> still do it) causes SPs /not/ to be deduplicated on IR linking.
>>
>> Each SP is distinct/not considered duplicate with any other. (if we
>> didn't mark it 'distinct', the fact that each SP refers to its
>> corresponding CU would produce the same effect - they wouldn't be
>> deduplicated because they aren't identical - they refer to different CUs)
>>
>> For non-inlined cases, this is fine.
>>
>> Before we inverted the SP<>CU link, what would happen is that all copies
>> of the llvm::Function would be dropped, but their SPs would be left around.
>> So two CUs that both used the same linkonce_odr function (let's say no
>> inlining actually occurred though) would both have a SP description in the
>> DWARF - but one would actual have a proper definition (with a high/low PC,
>> etc) the other would be missing those features, as though the function had
>> been optimized away (which it sort of had)
>>
>> So by reversing the link, we got rid of those extra SP descriptions in
>> the DWARF (and the extra SP descriptions in the metadata - I think they
>> were duplicate back then because they still had a scope chain leading back
>> to their CU (maybe we had gotten rid of that chain - if we had, then adding
>> it back in may've actually caused more metadata, but less DWARF))
>>
>>
>> I almost followed all of this, until I got to this last bit. I understood
>> from above that with the SP->CU link (and distinct SPs), prevented
>> deduplication. But this last bit sounds like we are in fact removing the
>> duplicates in the DWARF and possibly also in the metadata.
>>
>>
>> Ah, right - I can see how it reads that way, sorry.
>>
>> Old old way:
>>
>> first.ll:
>>   CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... }
>>   @fn1 ...
>>   @inl ...
>>   Resulting DWARF:
>>     compile_unit CU1
>>       subprogram fn1
>>         high/low pc, etc
>>       subprogram inl
>>         high/low pc, etc
>>
>> second.ll:
>>   CU2 -> {inl_SP2 -> @inl, SP2 -> @fn2, ... }
>>   @inl ...
>>   @fn2 ...
>>   Resulting DWARF:
>>     compile_unit CU2
>>       subprogram inl
>>         high/low pc, etc
>>       subprogram fn2
>>         high/low pc, etc
>>
>> link first.ll + second.ll:
>>   CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... }
>>   CU2 -> {inl_SP2 -> null, SP2 -> @fn2, ... }
>>   @fn1 ...
>>   @inl ...
>>   @fn2 ...
>>   Resulting DWARF:
>>     compile_unit CU1
>>       subprogram fn1
>>         high/low pc, etc
>>       subprogram inl
>>         high/low pc, etc
>>     compile_unit CU2
>>       subprogram inl
>>         name, but no high/low pc - this is unnecessary
>>       subprogram fn2
>>         high/low pc, etc
>>
>> New way:
>>   CU1
>>   @fn1 -> fn1_SP -> CU1
>>   @inl -> inl_SP -> CU1
>>   Resulting DWARF:
>>     compile_unit CU1
>>       subprogram fn1
>>         high/low pc, etc
>>       subprogram inl
>>         high/low pc, etc
>>
>> second.ll:
>>   CU2
>>   @inl -> inl_SP -> CU2
>>   @fn2 -> fn2_SP -> CU2
>>   Resulting DWARF:
>>     compile_unit CU2
>>       subprogram inl
>>         high/low pc, etc
>>       subprogram fn2
>>         high/low pc, etc
>>
>> link first.ll + second.ll (we pick @inl from first.ll in this example):
>>   CU1
>>   CU2
>>   @fn1 -> fn1_SP -> CU1
>>   @inl -> inl_SP -> CU1
>>   @fn2 -> fn2_SP -> CU2
>>   Resulting DWARF:
>>     compile_unit CU1
>>       subprogram fn1
>>         high/low pc, etc
>>       subprogram inl
>>         high/low pc, etc
>>     compile_unit CU2
>>       subprogram fn2
>>         high/low pc, etc
>>
>> So inverting the links causes us to completely drop the redundant
>> description of 'inl' that appeared in C2 when the function was not inlined.
>>
>> But if the function /is/ inlined, then the inlined location descriptions
>> that remain in @fn2 (assuming there was a call to @inl in @fn1 and @fn2)
>> still point to that original (CU2) version of @inl - causing it to to be
>> emitted into CU2.
>>
>> Whereas for type descriptions we don't do this - the type has no CU link,
>> so they all get deduplicated and even if @fn2 has a parameter of the same
>> type as @fn1 - we emit the type into CU1 when we first encounter it (when
>> emitting @fn1) and then reference it whenever we need it, even when
>> emitting @fn2 in the other CU.
>>
>>
>> Ok, great, thanks for the detailed explanation! I was missing the
>> distinction between the location descriptions and the type descriptions.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Any thoughts? I imagine this is probably worth a reasonable amount of
>> savings in an optimized build. Not huge, but not nothing. (probably not the
>> top of anyone's list though, I realize)
>>
>> Should we remove the CU link from a non-internal linkage subprogram (&
>> this may have an effect on the GV representation issue originally being
>> discussed) and just emit it into whichever CU happens to need it first?
>>
>>
>> I can see how this would be done in LTO where the compiler has full
>> visibility. For ThinLTO presumably we would need to do some index-based
>> marking? Can we at least do something when we import an inlined SP and drop
>> it since we know it is defined elsewhere?
>>
>>
>> Complete visibility isn't required to benefit here - and unfortunately
>> there's nothing fancier (that I know of) that we can do to avoid emitting
>> one definition of each used inline function in each thinlto object file we
>> produce (we can't say "oh, the name of the function, its mangled name, the
>> names and types of its parameters are over in that other object
>> file/somewhere else" - but we can avoid emitting those descriptions in each
>> /CU/ that uses the inlined function within a single ThinLTO object)
>>
>> I can provide some more thorough examples if that'd be helpful :)
>>
>>
>> Ok, I think I understand. This is only emitting once per object file,
>> which with ThinLTO can contain multiple CUs due to importing. But then with
>> full LTO it sounds like we would be in even better shape, since it has a
>> single module with all the CUs?
>>
>>
>> Right - currently we emit once per CU, but with a change in format we
>> could emit once per object file - which hurts ThinLTO over non-LTO (because
>> ThinLTO produces more CUs (due to imports) per object file) and is neutral
>> for full LTO (since it produces the same number of CUs, just in one object
>> file).
>>
>> Improving this representation to produce once per object would help get
>> ThinLTO back what it's currently paying - and improve full LTO further than
>> its current position in this regard.
>>
>> But the gains might not be major/significant - I've done nothing to
>> assess that, just observing that it is suboptimal.
>>
>> Thanks for asking/helping me explain it further, hopefully this is more
>> descriptive.
>>
>>
>> Great, thanks for pointing out this opportunity! When I get a chance
>> probably tomorrow I will give your prototype patch a try and see how much
>> difference it makes for Chromium.
>>
>
> Ah - the patch fixes the backend/LLVM to cope with this representational
> change (at least for the one tiny example) but doesn't fix Clang to produce
> the new representation (I just hand hacked the IR in a simple example to
> see if it worked). I could work with you/work up a change that might do
> that too, if that's handy/worth testing.
>

Oh, right. Maybe I can get an idea of how much this would save. Is there a
decent heuristic to measure the redundancy?
Teresa

>
> - Dave
>
>
>>
>> Teresa
>>
>>
>>
>> - Dave
>>
>>
>>
>> Teresa
>>
>>
>>
>>
>> Thanks,
>> Teresa
>>
>>
>>
>> This might be slightly sub-optimal, due to, say, the namespace being
>> foreign to that CU. But it's how we do types currently, I think? So at
>> least it'd be consistent and probably cheap enough/better.
>>
>>
>>
>>
>> --
>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>> 408-460-2413 <(408)%20460-2413>
>>
>>
>>
>>
>> --
>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>> 408-460-2413 <(408)%20460-2413>
>>
>>
>>
>>
>> --
>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>> 408-460-2413 <(408)%20460-2413>
>>
>

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161215/bda6af40/attachment-0001.html>