[cfe-dev] r222220 causes real debug-info bloat

Mon May 4 10:56:12 PDT 2015

On Mon, May 4, 2015 at 10:39 AM, Frédéric Riss <friss at apple.com> wrote:

>
> On May 4, 2015, at 9:51 AM, Robinson, Paul <Paul_Robinson at playstation.sony.com> wrote:
>
> -----Original Message-----
> From: Frédéric Riss [mailto:friss at apple.com]
> Sent: Friday, May 01, 2015 6:34 PM
> To: Robinson, Paul
> Cc: cfe-dev at cs.uiuc.edu Developers (cfe-dev at cs.uiuc.edu)
> Subject: Re: r222220 causes real debug-info bloat
>
> Hi!
>
> On May 1, 2015, at 5:29 PM, Robinson, Paul <Paul_Robinson at playstation.sony.com> wrote:
>
>
> We were doing some size analysis and noticed some ridiculous numbers
> related to debug-info size.  Investigation showed that essentially all
> of the bloat came from DW_TAG_imported_declaration pointing to
> DW_TAG_subprogram and the associated DW_TAG_formal_parameter DIEs.
> We tracked this to r222220, which basically caused every 'using' decl
> of a function or variable to have a forward declaration emitted to the
> DWARF, whether or not that 'using' decl itself was used in the CU.
>
> #include <stdlib.h>
> using ::abort;
>
> In Clang 3.5, this produces a pretty minimal .debug_info section (just
> the DW_TAG_compile_unit).
> In Clang 3.6, we see an additional DW_TAG_subprogram for abort() and then
> a DW_TAG_imported_declaration pointing to that declaration.
>
> #include <cstdlib>
>
> on Linux, Clang 3.5 wrote a .debug_info of 185 bytes, 3.6 was 1458.
>
> Multiply this by more headers and again by hundreds to thousands
> of modules and pretty soon you're talking multiple megabytes.
> Getting away from the benchmarks, a real game saw .debug_info increase
> by 13% (6 MB).
>
> r222220 basically causes a 'using' declaration of a function or global
> variable to conjure up a forward declaration, if we haven't already
> seen a declaration or definition.  The commentary talks about how this
> will be RAUW'd later on.  But I'm not sure what motivated this in the
> first place, and it clearly can have a huge adverse effect.
>
>
> The whole story is that I was working on getting debug info emitted
> for function argument default values (which I haven’t gotten back to
> yet BTW), and that my implementation didn’t work if the default value
> was a call to a forward declared function. Our decl tracking didn’t
> handle forward declarations at all, and David pointed out that this
> was why we were also missing some DW_TAG_imported_declaration. I
> then implemented support for forward declarations and tested it using
> the only current user that cared about forward decls, that is the
> imported_declaration stuff.
>
> I don't mind having a DW_TAG_imported_declaration for something that
> actually gets used in the CU, but a 'using' declaration all by itself
> should not count as "used" for purposes of emitting debug info.
>
>
> It’s not that the using clause counts as a ‘use’, it’s just a
> question of source fidelity.
>
>
> Source fidelity is not about emitting every declaration you see.
> It's about, *if* you're going to emit something, do it in a way that is
> faithful to the source-as-written.
>
>
> and I’d add “gives the debugger the means to evaluate every source
> expression as it is written in the source.”
>
> Your above example isn’t really
> compelling. Consider changing it a little bit to:
>
> #include <stdlib.h>
>
> namespace A {
> using ::abort;
> }
>
> The goal of the imported_declaration information is to inform
> the debugger that in this CU, A::abort is the same thing as
> ::abort. It’s just a matter of describing an aliased name to
> the debugger so that it can correctly evaluate source
> expressions.
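
To illustrate the aliasing described above, here is a minimal sketch. It is not taken from the thread, and the helper name `names_alias_same_function` is hypothetical. It only shows the language-level fact that a using-declaration inside a namespace introduces a second name for the same function, which is the relationship a DW_TAG_imported_declaration is meant to convey to the debugger.

```cpp
#include <stdlib.h>

namespace A {
using ::abort;  // using-declaration: A::abort now names the same function as ::abort
}

// Hypothetical helper: returns true when both qualified names
// resolve to the same function address.
bool names_alias_same_function() {
    auto global_fn = &::abort;   // address taken through the global name
    auto alias_fn = &A::abort;   // address taken through the namespace alias
    return global_fn == alias_fn;
}
```

A debugger that is asked to evaluate `A::abort` needs some record of this alias in the debug info to resolve the name the way the compiler did.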
>
>
> Consider this:
>
> void abort();
> namespace A {
> #if USING
>  using ::abort;
> #else
>  void abort();
> #endif
> };
>
> In the not-USING case, Clang emits nothing but the CU DIE, because
> neither abort() declaration is used.
> In the USING case, we see the imported_declaration and the associated
> subprogram.  In both cases, the set of declared names is the same, and
> there are no *actual* uses of either name.
>
>
> I’m repeating myself, but this is not about uses, just about describing
> names.
>
> Then, as a compiler policy, you might want to limit the names you describe
> to the ones that are actually used in the program (we have no code to track
> the uses and modify the debug info accordingly). You might also want to
> emit all the names so that the debugger can evaluate accurately every
> expression that could happen in the source code.
>
> I’m not arguing that one of the above is better than the other (the answer
> can certainly be different depending on the environment); I mostly want to
> point out that this information isn’t as useless as you seem to think.
>
> Therefore, I argue, this is not about source fidelity but about
> declining to produce declarations not useful to the consumer.
>
>
> David would need to confirm, but I think that if we revert the change,
> there are tests in the GDB test suite that will fail.
>

I don't recall precisely - but yes, I think we un-XFAIL'd some tests after
you made this change. Could check.

> ‘Not useful’ information should not
> be able to break user level tests, should it?
>
>
> Can somebody describe how these extra forward declarations fit into
> the Grand Scheme of Things in a beneficial way, and can we do something
> about unused 'using' declarations?
>
>
> I totally get your point about the size, and according to past
> conversations, I gather that the use described above may not be
> relevant to your debugger (which maybe points to something that
> can be tuned depending on the target debugger? I’m sorry, but I
> just came back from a long leave and I’m so much behind on list
> reading that I have no idea of the status of that idea).
>
> IMO, it has nothing to do with the fact that the function/variable
> is used or not. The using declarations create new names and the only
> way for the debugger(s) to understand these names is to have them
> described in the debug info.
>
>
> By that argument you should emit every name you see in every header,
> whether it is used or not.  That's not what we do, because it's not
> useful to anyone and unnecessarily bloats the debug info.  The case of
> used-only-by-'using' is no different because there's no *actual* use.
>
>
> I’m sorry, but the fact that all declarations aren’t emitted happens to
> bother me from time to time. I’m a heavy user of debugger conditional
> breakpoints, and the conditions often involve calling functions that
> aren’t defined in my program, but which were described in the headers
> (for example libc).
>
> Not having the prototype for these functions available to the debugger
> requires me to play casting games so that it gets the calling convention
> right. If all the declarations were to be emitted I wouldn’t have that
> issue and my debug experience would be better.
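
As a rough illustration of the "casting games" mentioned above, the sketch below mimics in plain C++ what a user has to spell out when only a function's address is known and no prototype is available. This is an assumption about what such a cast looks like, not a transcript of any debugger session. The helper name `call_strlen_by_address` is hypothetical, and `strlen` is used only as a convenient libc example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: call a function known only by its raw address.
// Without a prototype, the caller must supply the full signature by hand
// so that arguments and the return value are passed correctly.
std::size_t call_strlen_by_address(std::uintptr_t address, const char *text) {
    using StrlenFn = std::size_t (*)(const char *);  // signature supplied manually
    auto fn = reinterpret_cast<StrlenFn>(address);   // the "casting game"
    return fn(text);
}
```

If the declaration were present in the debug info, the signature would not have to be restated at each call.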
>
> Do not get me wrong. I’m not arguing for including all the declarations.
> I’m just trying to point out that the information isn’t useless as you
> describe it and that there is a balance to find. Including only the names
> that have been really used in the program would be a perfectly sensible
> one, but we do not have the code that does that tracking!
>
> I found it instructive to add this to my not-USING example:
>
> void foo() { ::abort(); A::abort(); }
>
> which naively I would expect to induce subprogram DIEs for abort() and
> A::abort(), but in fact it doesn't, even with -fstandalone-debug.  That
> seems sub-optimal too.  But, it just further illustrates the discrepancy
> between the 'using' declarations and non-'using' declarations.
>
> Also that there's a deeper problem here, which might or might not be
> what David Blaikie was getting at.
>
> The missing DIEs in the non-USING case, along with memories of trying
> to do something else with used/non-used declarations some while ago,
> make me think that even though abort() and A::abort() are (probably)
> being flagged, debug-info generation isn't going back through those
> non-defining declarations to see which ones ought to be emitted after
> all.
>
>
> As pointed out above, such code that goes from the uses to the debug info
> just doesn’t exist. The debug info for types and declarations is generated
> during AST construction (IIRC) and not touched afterwards.
>
> Fred
>
> It looks like CGDebugInfo::finalize() does a post-pass for types, to
> some extent; maybe that needs to be done for other decls as well?
> --paulr
>
>
> Given how the patch works, it looks like we can just short-circuit the
> creation of these forward declarations with no harm done, but I have to
> wonder whether we're shooting ourselves in the foot in some situation
> that isn't immediately obvious.
>
>
> If the git commit message is still accurate regarding the use of that
> function, then you’ll just go back to the previous state which you
> liked better. If the function grows new callers, you might lose
> more stuff, but IIUC it should mostly be stuff that you don’t care
> about anyway.
>
> Fred
>
>
> Thanks,
> --paulr
>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>