[cfe-dev] r222220 causes real debug-info bloat

Mon May 4 10:28:49 PDT 2015

On Mon, May 4, 2015 at 9:51 AM, Robinson, Paul <
Paul_Robinson at playstation.sony.com> wrote:

> > -----Original Message-----
> > From: Frédéric Riss [mailto:friss at apple.com]
> > Sent: Friday, May 01, 2015 6:34 PM
> > To: Robinson, Paul
> > Cc: cfe-dev at cs.uiuc.edu Developers (cfe-dev at cs.uiuc.edu)
> > Subject: Re: r222220 causes real debug-info bloat
> >
> > Hi!
> >
> > > On May 1, 2015, at 5:29 PM, Robinson, Paul
> > <Paul_Robinson at playstation.sony.com> wrote:
> > >
> > > We were doing some size analysis and noticed some ridiculous numbers
> > > related to debug-info size.  Investigation showed that essentially all
> > > of the bloat came from DW_TAG_imported_declaration pointing to
> > > DW_TAG_subprogram and the associated DW_TAG_formal_parameter DIEs.
> > > We tracked this to r222220, which basically caused every 'using' decl
> > > of a function or variable to have a forward declaration emitted to the
> > > DWARF, whether or not that 'using' decl itself was used in the CU.
> > >
> > > #include <stdlib.h>
> > > using ::abort;
> > >
> > > In Clang 3.5, this produces a pretty minimal .debug_info section (just
> > > the DW_TAG_compile_unit).
> > > In Clang 3.6, we see an additional DW_TAG_subprogram for abort() and
> > > then a DW_TAG_imported_declaration pointing to that declaration.
> > >
> > > #include <cstdlib>
> > >
> > > on Linux, Clang 3.5 wrote a .debug_info of 185 bytes, 3.6 was 1458.
> > >
> > > Multiply this by more headers and again by hundreds to thousands
> > > of modules and pretty soon you're talking multiple megabytes.
> > > Getting away from the benchmarks, a real game saw .debug_info increase
> > > by 13% (6 MB).
> > >
> > > r222220 basically causes a 'using' declaration of a function or global
> > > variable to conjure up a forward declaration, if we haven't already
> > > seen a declaration or definition.  The commentary talks about how this
> > > will be RAUW'd later on.  But I'm not sure what motivated this in the
> > > first place, and it clearly can have a huge adverse effect.
> >
> > The whole story is that I was working on getting debug info emitted
> > for function argument default values (which I haven’t gotten back to
> > yet BTW), and that my implementation didn’t work if the default value
> > was a call to a forward declared function. Our decl tracking didn’t
> > handle forward declarations at all, and David pointed out that this
> > was why we were also missing some DW_TAG_imported_declaration. I
> > then implemented support for forward declarations and tested it using
> > the only current user that cared about forward decls, that is the
> > imported_declaration stuff.
> >
> > > I don't mind having a DW_TAG_imported_declaration for something that
> > > actually gets used in the CU, but a 'using' declaration all by itself
> > > should not count as "used" for purposes of emitting debug info.
> >
> > It’s not that the using clause counts as a ‘use’, it’s just a
> > question of source fidelity.
>
> Source fidelity is not about emitting every declaration you see.
> It's about, *if* you're going to emit something, do it in a way that is
> faithful to the source-as-written.
>
> > Your above example isn’t really
> > compelling. By changing it a little bit to:
> >
> > #include <stdlib.h>
> >
> > namespace A {
> > using ::abort;
> > }
> >
> > The goal of the imported_declaration information is to inform
> > the debugger that in this CU, A::abort is the same thing as
> > ::abort. It’s just a matter of describing an aliased name to
> > the debugger so that it can correctly evaluate source
> > expressions.
>
> Consider this:
>
> void abort();
> namespace A {
> #if USING
>   using ::abort;
> #else
>   void abort();
> #endif
> };
>
> In the not-USING case, Clang emits nothing but the CU DIE, because
> neither abort() declaration is used.
> In the USING case, we see the imported_declaration and the associated
> subprogram.  In both cases, the set of declared names is the same, and
> there are no *actual* uses of either name.
> Therefore, I argue, this is not about source fidelity but about
> declining to produce declarations not useful to the consumer.
>

How are they not useful to the consumer? If I want to be able to call
A::abort() in my debugger, it'll need to know about that using declaration.
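
To make the alias concrete, here is a minimal sketch. It assumes a hypothetical function square() in place of abort() so that it can actually be run, and the names square, alias_names_same_function and self_check are mine for illustration only. It shows that the using declaration introduces A::square as a second spelling of the very same function, which is what the imported_declaration would tell the debugger:

```cpp
#include <cassert>

// Hypothetical stand-in for ::abort, so the example can run without terminating.
int square(int x) { return x * x; }

namespace A {
using ::square;  // introduces the name A::square as an alias for ::square
}

// Returns true when both spellings name the same function.
bool alias_names_same_function() {
  int (*via_alias)(int) = &A::square;
  int (*via_global)(int) = &::square;
  return via_alias == via_global;
}

// Small runtime check of the claims made in the comments above.
void self_check() {
  assert(alias_names_same_function());
  assert(A::square(3) == 9);
}
```

A debugger with no record of the using declaration would have no way to resolve the spelling A::square, even though the function it names is fully described elsewhere.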

>
> >
> > > Can somebody describe how these extra forward declarations fit into
> > > the Grand Scheme of Things in a beneficial way, and can we do something
> > > about unused 'using' declarations?
> >
> > I totally get your point about the size, and according to past
> > conversations, I gather that the use described above isn’t maybe
> > relevant to your debugger (which maybe points to something that
> > can be tuned depending on the target debugger? I’m sorry, but I
> > just came back from a long leave and I’m so much behind on list
> > reading that I have no idea of the status of that idea).
> >
> > IMO, it has nothing to do with the fact that the function/variable
> > is used or not. The using directives create new names and the only
> > way for the debugger(s) to understand these names is to have them
> > described in the debug info.
>
> By that argument you should emit every name you see in every header,
> whether it is used or not.

Not really - because a function needs a definition before you can call it
(yes, arguably a type can be used without any specific code being emitted),
so we can just rely on the debug info for the function being emitted where
the definition is emitted.

>   That's not what we do, because it's not
> useful to anyone

Well, the types could be useful to people. Take the LLVM Operator hierarchy:
the vtable based type optimization causes these types to be emitted
nowhere at all (these types are never instantiated, so their vtable is
never emitted, etc - the code using these types is UB and casts concrete
objects of one hierarchy into another hierarchy), but they could be useful
if we did emit them.

> and unnecessarily bloats the debug info.  The case of
> used-only-by-'using' is no different because there's no *actual* use.
>
> I found it instructive to add this to my not-USING example:
>
> void foo() { ::abort(); A::abort(); }
>
> which naively I would expect to induce subprogram DIEs for abort() and
> A::abort(), but in fact it doesn't, even with -fstandalone-debug.  That
> seems sub-optimal too.

Arguably useful in -fstandalone-debug, not really useful anywhere else
because we rely on the whole program being built with debug info enabled,
so the debug info for the function will go wherever its definition is
emitted (if no definition is provided, the function cannot be called and so
it's not really important to emit any debug info*)

* Well, maybe - if you wanted a really accurate expression parser, having
overload candidates is important even if those candidates are never called
(eg: if I have two inline functions, one is never called so no definition
is emitted - if I only emit debug info for the one that is called, then
when the user writes an expression that should resolve to the uncalled one,
it might not - it might be a valid (though unpreferred) call to the called
function). But DWARF doesn't have nearly enough info to describe all of
this to allow a debugger to get it right (you'd need actual templates, for
example - and deleted functions, etc, etc)
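
A rough sketch of that overload situation, with made-up names (describe, resolve_with_both, only_called_one, self_check are all hypothetical). The namespace only_called_one models a debugger that has only been told about the overload that was actually called; it is an analogy written in plain C++, not a description of how any particular debugger evaluates expressions:

```cpp
#include <cassert>

// Two hypothetical inline overloads. Imagine the program only ever calls
// describe(int), so only that one gets a definition and debug info.
inline int describe(int) { return 1; }     // the overload that is called
inline int describe(double) { return 2; }  // the overload that is never called

// With both candidates visible, a double argument picks describe(double).
inline int resolve_with_both(double v) { return describe(v); }

namespace only_called_one {
// Models the debugger's view: only the called overload is known here,
// and it hides the global overloads during unqualified lookup.
inline int describe(int) { return 1; }

// The same expression is still valid (double converts to int), but it now
// resolves to the other function.
inline int resolve(double v) { return describe(v); }
}  // namespace only_called_one

// Small runtime check of the two resolutions.
void self_check() {
  assert(resolve_with_both(2.5) == 2);
  assert(only_called_one::resolve(2.5) == 1);
}
```

The expression compiles in both views, so nothing flags the mismatch; it just quietly calls a different function.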

>   But, it just further illustrates the discrepancy
> between the 'using' declarations and non-'using' declarations.

> Also that there's a deeper problem here, which might or might not be
> what David Blaikie was getting at.
>
> The missing DIEs in the non-USING case, along with memories of trying
> to do something else with used/non-used declarations some while ago,
> make me think that even though abort() and A::abort() are (probably)
> being flagged, debug-info generation isn't going back through those
> non-defining declarations to see which ones ought to be emitted after
> all.
>

Yes, as you said, we don't, as a rule, emit declarations for any functions
(we do for types, so we can describe the types of variables/parameters/etc
that mention those types). The machinery Fred added enables the ability to
emit function declarations and used the using declaration code as a
proof-of-concept for it. It could be expanded for other things, though I
probably wouldn't bother.

> It looks like CGDebugInfo::finalize() does a post-pass for types, to
> some extent; maybe that needs to be done for other decls as well?
>

In general, I agree with you that if the using declaration isn't used (ie:
no code refers to names via the alias it creates) we shouldn't emit it.
This isn't ideal (for the reasons mentioned above) but probably not much
less ideal than many of the inherent issues debug info has with omitting
unused things in general. The problem is that Clang currently doesn't keep
track of how a name is found and it's not a terribly simple task to do so
(last I checked with Richard Smith) due to SFINAE and other complex name
lookup - the definition of 'used' gets a bit fuzzy (if a name's absence
would result in a different program, is it 'used'? Even if it's never
actually called in the end?).
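
A small sketch of why 'used' gets fuzzy, assuming hypothetical names throughout (Widget, Gadget, probe, has_probe, choose, self_check). The function probe is declared but never defined and never called, yet its mere presence changes what the program does, via SFINAE in an unevaluated context:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct Widget {};
struct Gadget {};

// Declared but never defined and never called at run time.
void probe(Widget);

// Primary template: assume probe(T) is not callable.
template <typename T, typename = void>
struct has_probe : std::false_type {};

// Chosen only when the expression probe(T) is well-formed. The call sits
// inside decltype, so it is never evaluated and needs no definition.
template <typename T>
struct has_probe<T, decltype(void(probe(std::declval<T>())))>
    : std::true_type {};

// Program behaviour depends on whether the declaration of probe exists.
template <typename T>
int choose() { return has_probe<T>::value ? 1 : 0; }

static_assert(has_probe<Widget>::value, "probe(Widget) is declared");
static_assert(!has_probe<Gadget>::value, "no probe accepts a Gadget");

// Small runtime check of the same facts.
void self_check() {
  assert(choose<Widget>() == 1);
  assert(choose<Gadget>() == 0);
}
```

Removing the declaration of probe would flip choose<Widget>() from 1 to 0, so the name affects the program without ever being called. Whether that counts as a 'use' is exactly the fuzzy part.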

- David

> --paulr
>
> >
> > > Given how the patch works, it looks like we can just short-circuit the
> > > creation of these forward declarations with no harm done, but I have to
> > > wonder whether we're shooting ourselves in the foot in some situation
> > > that isn't immediately obvious.
> >
> > If the git commit message is still accurate regarding the use of that
> > function, then you’ll just go back to the previous state which you
> > liked better. If the function grows new callers, you might lose
> > more stuff, but IIUC it should mostly be stuff that you don’t care
> > about anyway.
> >
> > Fred
> >
> >
> > > Thanks,
> > > --paulr
> > >
>
>