[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions

Sat Dec 24 10:18:09 PST 2016

On Fri, Dec 23, 2016 at 7:25 PM Duncan Exon Smith <dexonsmith at apple.com>
wrote:

>
>
> On Dec 23, 2016, at 18:36, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Fri, Dec 23, 2016 at 11:47 AM Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
>
> A few disjoint thoughts; sorry they're so delayed (I skimmed the responses
> below, and I think these are still relevant/not covered elsewhere).
>
> Firstly, why *should* DISubprogram definitions be distinct?  There were
> two reasons this was valuable (this was from before there was a cu: link).
>
> - It helped to fix long-standing bugs in the IRLinker, where
> uniqued-DISubprograms in different compile units would coalesce.
>
>
> Any idea why that ^ was a problem/bugs?
>
>
>  Now that the Function <-> DISubprogram connection has been reversed, I'm
> not entirely sure it's still a necessary precaution.
>
> - It broke a several uniquing cycles involving DISubprogram.  The
> existence of the uniquing cycles meant that affected DISubprograms weren't
> really uniqued, anyway (since we don't bother to solve graph isomorphism);
> and breaking the cycles sped up the IRLinker and reduced memory waste.
>
>
> Ah, fair point - my trivial example didn't have any local variables, so
> didn't hit this problem. They do indeed form cycles and I haven't tested,
> but seems totally reasonable/likely that that would break my simple example
> because cycles break uniquing and all that.
>
>
>  Making DISubprogram uniqued again would either have no real effect
> (because distinct nodes they reference aren't uniqued anymore either) or
> bring back uniquing cycles (which would have effects... but not the desired
> one).
>
> But there ought to be a way to get coalescing in the right places.  We
> just need to ensure that the uniqued/coalesced parts of the graph:
> - are acyclic, and
> - reference only coalesced nodes (note that leaves can happily include the
> strangely-coalesced distinct DICompositeType beast I created).
>
>
> Perhaps before I dive into your suggestion (which, it seems, is to
> separate out the definition but let the declaration/locals form cycles
>
>
> No (new) cycles; in fact this breaks some.
> - the Function itself references the defn and the locals;
> - the locals and the defn reference (transitively) the decl; and
> - the decl doesn't reference any of those.
>
> The only cycle is the decl <-> composite type one that already exists (and
> that odr magic (mostly) fixes).  Maybe that one could be removed somehow
> too, but it isn't directly relevant to the problem you're looking at.
>

Oh, yeah - I've an aside there too, maybe for another thread. (punchy
summary: anonymous enums in headers used for constants are technically ODR
violations (well, if they're ever used in an ODR entity - like an inline
function) - they're not public types, so we don't put the mangled name on
them & don't deduplicate them in DWARF type units - and probably similarly
don't deduplicate them in the IR... - not sure what the right answer is
there)

> (Note that this is somewhat inspired by Reid's point in his recent debug
> info talk, that CodeView avoids type cycles by breaking types into two
> records.)
>

Ah, thanks, that helps with me picture it!

>
> - that would at least be dropped because the definition would only be kept
> due to it being referenced from the llvm::Function), what about something
> simpler:
>
> What if scope chains didn't go all the way to the top (the subprogram) but
> stopped short of that?
>
> Now, granted - this was part of the great assertion I added several years
> back that caught all kinds of silliness - mostly around inlined call sites
> not having debugloc and leading to scope chains that lead to distinct
> roots. But the alternative to having and maintaining an invariant - is just
> to make things true by construction. All debug info scope chains within a
> function with an associated DISubprogram have an implicit last element of
> that DISubprogram?
>
>
> That seems reasonable, I think, now that the function points directly at
> its subprogram.
>
> I'm not sure it's simpler to get right though.  Dropping the scope runs
> the risk of coalescing things incorrectly.  I imagine the backend relies on
> pointer distinctness of two local variables called "a" that are in (or are
> parameters of) unrelated subprograms.  I think in some situations they
> could be coalesced incorrectly if they had no scope.
>

Just no top level scope - so it'd only happen if you had two variables with
the same name in the same scope. It might break in debug-having inline
functions inlined into non-debug-having functions (which then might in turn
be inlined into a debug-having function) - would have to check.

Also, not sure if inlined situations might introduce circularity. Yep. The
scope for inlinedAt DILocations point to the DISubprogram - so that'd
create a cycle. Ah well.

>
> That would mean it'd be harder to catch the bugs I found
>
>
> A shame if so, but if we can find another way to catch it or design it
> away, as you mention, it doesn't much matter.
>
> - but I think most of them came from that specific inlining situation (I
> can go back and do some archaeology and see if there's other exposure where
> we could sure up some defenses as well) which can be caught more directly
> (I forget if the inliner now catches it itself, or if we enhanced the debug
> info verifier to catch it after the inliner runs - if it's the verifier,
> then that wouldn't suffice if we made the change I'm suggesting - and we'd
> have to hook it up in the inliner itself)
>
> Thoughts?
>
>
> I've been thinking idly whether we can/should break DISubprogram apart
> into uniqued (coalesce-able) and distinct parts.  Still being someone
> ignorant of DWARF itself (and CodeView, for that matter), I'm not entirely
> sure.  But...
>
> Here's my understanding of the current uses of DISubprogram (where I'm
> using DISubprogramDecl for the usually uniqued declarations, and
> DISubprogramDefn for the always distinct definitions):
>
> // !dbg
> Function -> DISubprogram
>
> // Declaration (optional, only for member functions).
> DISubprogramDefn -> DISubprogramDecl
>
> // Variables:
> DISubprogramDefn -> MDNode -> DILocalVariable
>
> // Terminator for scope chains:
> DILocalVariable [-> DILocalScope [-> ...]] -> DISubprogramDefn
> DILocation [-> DILocalScope [-> ...]] -> DISubprogramDefn
> DICompositeType -> DISubprogramDefn // function-local type
>
> // Elements (member functions):
> DICompositeType -> MDNode -> DISubprogramDecl
> DISubprogramDecl -> DICompositeType
> DISubprogramDefn -> DICompositeType
>
>
> And here are the changes I would consider:
>
> 1. Split DISubprogramDecl and DISubprogramDefn into different types.
>
> 2. Remove definition-specific fields from DISubprogramDecl.
> - variables, a "dangerous" source of uniquing cycles.
> - declaration, which it will never have.
> - scopeLine, which it will never have.
> - unit, which the Defn will have when it's relevant.
>
> 3. Make a DISubprogramDecl mandatory for DISubprogramDefn.
>
> 4. Remove redundant fields from DISubprogramDefn.
> - name and linkageName are on the declaration.
> - type is redundant.
> - flags is likely redundant?
>
> Nothing so far has made any real changes, I think.  DISubprogramDecl
> should coalesce nicely between modules though.
>
> 5. Migrate links to point at DISubprogramDecl instead of DISubprogramDefn.
> - DILocalVariable scope chain.
> - DILocation scope chain.
> - DICompositeType scope.
>
> I don't think this will create any uniquing cycles.  DISubprogramDecl
> won't point at any of these.  Also, DILocalVariable should start coalescing
> with each other.
>
> Since DISubprogramDefn is only referenced from the Function !dbg
> attachment, only one should survive function-coalescing.
>
> One open question, since I don't know the backend code, is: can you emit
> the "correct" DWARF after step (5)?  I'm hoping you only need the Defn of
> the Function that has instructions referencing the
> DILocalVariable/DILocation.  In which case, we suddenly get a lot of
> coalescing.
>
> On 2016-Dec-15, at 10:54, David Blaikie <dblaikie at gmail.com> wrote:
>
> Branching off from a discussion of improvements to DIGlobalVariable
> representations that Adrian's working on - got me thinking about related
> changes that have already been made to DISubprogram.
>
> To reduce duplicate debug info when things like linkonce_odr functions
> were deduplicated in LTO linking, the relationship between a CU and
> DISubprogram was inverted (instead of a CU maintaining a list
> of subprograms, subprograms specify which CU they come from - and the
> llvm::Function references the DISubprogram, so if the llvm::Function goes
> away, so does the associated DISubprogram)
>
> I'm not sure if this caused a regression, but at least seems to miss a
> possible improvement:
>
> During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram
> definitions (& their CU link, even if they weren't marked 'distinct', the
> CU link would cause them to effectively be so) remain separate - this means
> that inlined versions in one CU don't refer to an existing subprogram
> definition in another CU.
>
> To demonstrate:
> inl.h:
> void f1();
> inline __attribute__((always_inline)) void f2() {
>   f1();
> }
> inl1.cpp:
> #include "inl.h"
> void c1() {
>   f2();
> }
> inl2.cpp:
> #include "inl.h"
> void c2() {
>   f2();
> }
>
> Compile to IR, llvm-link the result. The DWARF you get is basically the
> same as the DWARF you'd get without linking:
>
> DW_TAG_compile_unit
>   DW_AT_name "inl1.cpp"
>   DW_TAG_subprogram #0
>     DW_AT_name "f2"
>   DW_TAG_subprogram
>     DW_AT_name "c1"
>     DW_TAG_inlined_subroutine
>       DW_TAG_abstract_origin #0 "f2"
> DW_TAG_compile_unit
>   DW_AT_name "inl2.cpp"
>   DW_TAG_subprogram #1
>     DW_AT_name "f2"
>   DW_TAG_subprogram
>     DW_AT_name "c2"
>     DW_TAG_inlined_subroutine
>       DW_TAG_abstract_origin #1 "f2"
>
> Instead of something more like this:
>
> DW_TAG_compile_unit
>   DW_AT_name "inl1.cpp"
>   DW_TAG_subprogram #0
>     DW_AT_name "f2"
>   DW_TAG_subprogram
>     DW_AT_name "c1"
>     DW_TAG_inlined_subroutine
>       DW_TAG_abstract_origin #0 "f2"
> DW_TAG_compile_unit
>   DW_AT_name "inl2.cpp"
>   DW_TAG_subprogram
>     DW_AT_name "c2"
>     DW_TAG_inlined_subroutine
>       DW_TAG_abstract_origin #0 "f2"
>
> (note that only one abstract definition of f2 is produced here)
>
> Any thoughts? I imagine this is probably worth a reasonable amount of
> savings in an optimized build. Not huge, but not nothing. (probably not the
> top of anyone's list though, I realize)
>
> Should we remove the CU link from a non-internal linkage subprogram (&
> this may have an effect on the GV representation issue originally being
> discussed) and just emit it into whichever CU happens to need it first?
>
> This might be slightly sub-optimal, due to, say, the namespace being
> foreign to that CU. But it's how we do types currently, I think? So at
> least it'd be consistent and probably cheap enough/better.
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161224/3bc74f1b/attachment.html>