[cfe-dev] r222220 causes real debug-info bloat

David Blaikie dblaikie at gmail.com
Mon May 4 13:54:57 PDT 2015


Rather than replying point by point to various subjective descriptions of
the state of things, perhaps I can sum up:

The fix is to teach Clang to track the use of using declarations (as I
mentioned, there are complications with that) and then only emit those
using declarations that are used. It's not perfect, but it's a reasonable
thing to do.
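
A minimal sketch of that "track, then emit only what was used" idea follows. It is a toy model, not Clang code: `UsingDeclTracker`, `declare`, `markUsed` and `toEmit` are hypothetical names invented for illustration, and a real implementation would key on declarations rather than strings.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model: remembers every using declaration seen and whether
// any later name lookup went through it.
class UsingDeclTracker {
public:
  // Record a using declaration; it starts out unused.
  void declare(const std::string &alias) { used_.emplace(alias, false); }

  // Record that some lookup resolved through this using declaration.
  // Names that were never declared are ignored.
  void markUsed(const std::string &alias) {
    auto it = used_.find(alias);
    if (it != used_.end())
      it->second = true;
  }

  // The using declarations that would get debug info: only the used ones,
  // sorted so the result is deterministic.
  std::vector<std::string> toEmit() const {
    std::vector<std::string> out;
    for (const auto &entry : used_)
      if (entry.second)
        out.push_back(entry.first);
    std::sort(out.begin(), out.end());
    return out;
  }

private:
  std::unordered_map<std::string, bool> used_;
};
```

The hard part is deciding when `markUsed` should fire, which this sketch deliberately leaves to the caller.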

If you'd like this fixed, I'm sure Fred & I can help point you in the right
direction for where to get started on this work - but as I said, it's not
simple/easy/trivial due to the complexity of what it means to "use" a using
declaration. (see my previous email for some examples of wrinkles)
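
As a rough illustration of why "use" is slippery, here is a small sketch. The three cases are assumptions chosen for this example and are not necessarily the wrinkles from the earlier email; `length_via_alias` and `AbsResult` are hypothetical names.

```cpp
#include <stdlib.h>
#include <string.h>
#include <type_traits>

namespace A {
using ::strlen;  // named in an evaluated call below
using ::abs;     // named only inside decltype, never actually called
using ::abort;   // never named through A at all
}

// Lookup goes through the using declaration and results in a real call.
size_t length_via_alias(const char *s) { return A::strlen(s); }

// Lookup goes through the using declaration, but the operand of decltype is
// unevaluated, so no call is ever made. Whether this counts as a "use" is a
// policy question.
using AbsResult = decltype(A::abs(0));
static_assert(std::is_same<AbsResult, int>::value, "abs(int) returns int");
```

`A::strlen` is clearly used and `A::abort` clearly is not; `A::abs` sits in between.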

- David

On Mon, May 4, 2015 at 12:46 PM, Robinson, Paul <
Paul_Robinson at playstation.sony.com> wrote:

>  From Fred:
>
> Source fidelity is not about emitting every declaration you see.
> It's about, *if* you're going to emit something, do it in a way that is
> faithful to the source-as-written.
>
>
>
> and I’d add “gives the debugger the means to evaluate every source
> expression as it is written in the source.”
>
>
>
> Agreed.  But a 'using' declaration is not an expression.  If the declared
> name is not used in any source expression, it can hardly be needed to
> evaluate a source expression as written in the source.
>
>
>
> I admit calling this stuff "useless" is a bit of hyperbole.  Clang is
> being inconsistent.  If a normal unqualified function declaration isn't
> emitted, and a normal function declaration inside a namespace isn't
> emitted, I see no argument justifying emitting a function that happens to
> be declared with a 'using' declaration. A 'using' versus a normal
> declaration is typically an implementation detail, not something inherently
> significant that can change the meaning of a program.
>
>
>
> The whole story is that I was working on getting debug info emitted
> for function argument default values (which I haven’t gotten back to
> yet BTW), and that my implementation didn’t work if the default value
> was a call to a forward declared function. Our decl tracking didn’t
> handle forward declarations at all, and David pointed out that this
> was why we were also missing some DW_TAG_imported_declaration. I
> then implemented support for forward declarations and tested it using
> the only current user that cared about forward decls, that is the
> imported_declaration stuff.
>
>  But it looks to me like these "missing" imported_declarations aren't any
> more missing than the equivalent non-'using' declarations.  Is there a
> principled justification for the inconsistency?  I'm not hearing one.  What
> I'm hearing is that Clang can't keep track of what's actually needed in the
> source, so we arbitrarily emit some things and not others when we're not
> sure.  That's really unsatisfactory.
>
> Now, if the forward declaration bit is something that was part of a larger
> project that is half-implemented, and the imported_declaration bit was a
> momentary convenience that you intended to take out later, well that's fine
> so long as you now intend to finish the job.
>
>
>
> I am sympathetic to the desire to have the full power of libc available in
> conditional breakpoints, and so forth.  If that's your preferred mode then
> perhaps a more -gfull-to-bursting mode might be the ticket.  If it didn't
> cost multi-megabytes to get there, and the time it takes to write all that
> out, and link it together, then I'd have no problem with always emitting
> everything.  As it is, we seem to have at least two modes of
> not-emitting-everything, and the balance point of
> things-most-likely-to-be-useful doesn't really seem to warrant including
> "anything that the standard-library implementor decided to implement with a
> 'using' declaration."
>
>
>
> (Hmmm… could we suppress this stuff just for standard-header
> declarations?  That's where our immediate pain-point seems to be.)
>
> --paulr
>
>
>
> *From:* David Blaikie [mailto:dblaikie at gmail.com]
> *Sent:* Monday, May 04, 2015 10:56 AM
> *To:* Frédéric Riss
> *Cc:* Robinson, Paul; cfe-dev at cs.uiuc.edu Developers (cfe-dev at cs.uiuc.edu)
> *Subject:* Re: [cfe-dev] r222220 causes real debug-info bloat
>
>
>
>
>
>
>
> On Mon, May 4, 2015 at 10:39 AM, Frédéric Riss <friss at apple.com> wrote:
>
>
>
>  On May 4, 2015, at 9:51 AM, Robinson, Paul <
> Paul_Robinson at playstation.sony.com> wrote:
>
>
>
> -----Original Message-----
> From: Frédéric Riss [mailto:friss at apple.com]
> Sent: Friday, May 01, 2015 6:34 PM
> To: Robinson, Paul
> Cc: cfe-dev at cs.uiuc.edu Developers (cfe-dev at cs.uiuc.edu)
> Subject: Re: r222220 causes real debug-info bloat
>
> Hi!
>
>
>  On May 1, 2015, at 5:29 PM, Robinson, Paul
> <Paul_Robinson at playstation.sony.com> wrote:
>
>
> We were doing some size analysis and noticed some ridiculous numbers
> related to debug-info size.  Investigation showed that essentially all
> of the bloat came from DW_TAG_imported_declaration pointing to
> DW_TAG_subprogram and the associated DW_TAG_formal_parameter DIEs.
> We tracked this to r222220, which basically caused every 'using' decl
> of a function or variable to have a forward declaration emitted to the
> DWARF, whether or not that 'using' decl itself was used in the CU.
>
> #include <stdlib.h>
> using ::abort;
>
> In Clang 3.5, this produces a pretty minimal .debug_info section (just
> the DW_TAG_compile_unit).
> In Clang 3.6, we see an additional DW_TAG_subprogram for abort() and then
> a DW_TAG_imported_declaration pointing to that declaration.
>
> #include <cstdlib>
>
> on Linux, Clang 3.5 wrote a .debug_info of 185 bytes, 3.6 was 1458.
>
> Multiply this by more headers and again by hundreds to thousands
> of modules and pretty soon you're talking multiple megabytes.
> Getting away from the benchmarks, a real game saw .debug_info increase
> by 13% (6 MB).
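
The scaling argument can be checked with a few lines of arithmetic. The per-unit sizes below are the ones quoted above; the count of 2000 compile units is an assumption picked from the "hundreds to thousands" range, and `total_growth` is a hypothetical helper.

```cpp
#include <cstdint>

// .debug_info sizes reported above for a file that only includes <cstdlib>.
constexpr std::uint64_t kClang35Bytes = 185;
constexpr std::uint64_t kClang36Bytes = 1458;

// Extra bytes per compile unit attributable to the change: 1273.
constexpr std::uint64_t kGrowthPerUnit = kClang36Bytes - kClang35Bytes;

// Total extra bytes across `units` compile units, assuming each one pays the
// same per-unit cost (real units will differ).
constexpr std::uint64_t total_growth(std::uint64_t units) {
  return units * kGrowthPerUnit;
}

// Assumed project size of 2000 units: 2,546,000 bytes, roughly 2.5 MB.
static_assert(total_growth(2000) == 2546000, "1273 bytes * 2000 units");
```

That is for a single header; including more headers per unit multiplies the figure again.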
>
> r222220 basically causes a 'using' declaration of a function or global
> variable to conjure up a forward declaration, if we haven't already
> seen a declaration or definition.  The commentary talks about how this
> will be RAUW'd later on.  But I'm not sure what motivated this in the
> first place, and it clearly can have a huge adverse effect.
>
>
> The whole story is that I was working on getting debug info emitted
> for function argument default values (which I haven’t gotten back to
> yet BTW), and that my implementation didn’t work if the default value
> was a call to a forward declared function. Our decl tracking didn’t
> handle forward declarations at all, and David pointed out that this
> was why we were also missing some DW_TAG_imported_declaration. I
> then implemented support for forward declarations and tested it using
> the only current user that cared about forward decls, that is the
> imported_declaration stuff.
>
>
>  I don't mind having a DW_TAG_imported_declaration for something that
> actually gets used in the CU, but a 'using' declaration all by itself
> should not count as "used" for purposes of emitting debug info.
>
>
> It’s not that the using clause counts as a ‘use’, it’s just a
> question of source fidelity.
>
>
> Source fidelity is not about emitting every declaration you see.
> It's about, *if* you're going to emit something, do it in a way that is
> faithful to the source-as-written.
>
>
>
> and I’d add “gives the debugger the means to evaluate every source
> expression as it is written in the source.”
>
>
>
>   Your example above isn’t really
> compelling. Consider changing it a little bit to:
>
> #include <stdlib.h>
>
> namespace A {
> using ::abort;
> }
>
> The goal of the imported_declaration information is to inform
> the debugger that in this CU, A::abort is the same thing as
> ::abort. It’s just a matter of describing an aliased name to
> the debugger so that it can correctly evaluate source
> expressions.
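
The aliasing itself can be seen at the language level with a short sketch; `same_function` is a hypothetical helper, and the example only shows what the source means, not what any compiler emits.

```cpp
#include <stdlib.h>

namespace A {
using ::abort;  // introduces the name A::abort for the existing ::abort
}

// Both spellings resolve to one function, so their addresses compare equal.
// A debugger asked to evaluate A::abort needs some record of this alias.
bool same_function() {
  void (*global_name)() = &::abort;
  void (*alias_name)() = &A::abort;
  return global_name == alias_name;
}
```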
>
>
> Consider this:
>
> void abort();
> namespace A {
> #if USING
>  using ::abort;
> #else
>  void abort();
> #endif
> }
>
> In the not-USING case, Clang emits nothing but the CU DIE, because
> neither abort() declaration is used.
> In the USING case, we see the imported_declaration and the associated
> subprogram.  In both cases, the set of declared names is the same, and
> there are no *actual* uses of either name.
>
>
>
> I’m repeating myself, but this is not about uses, just about describing
> names.
>
>
>
> Then, as a compiler policy, you might want to limit the names you describe
> to the ones that are actually used in the program (we have no code to track
> the uses and modify the debug info accordingly). You might also want to
> emit all the names so that the debugger can evaluate accurately every
> expression that could happen in the source code.
>
>
>
> I’m not arguing that one of the above is better than the other (the answer
> can certainly be different depending on the environment), I mostly want to
> point out that this information isn’t as useless as you seem to think.
>
>
>
>  Therefore, I argue, this is not about source fidelity but about
> declining to produce declarations not useful to the consumer.
>
>
>
> David would need to confirm, but I think that if we revert the change,
> there are tests in the GDB test suite that will fail.
>
>
> I don't recall precisely - but yes, I think we un-XFAIL'd some tests after
> you made this change. Could check.
>
>
>   ‘Not useful’ information should not be able to break user level tests,
> should it?
>
>
>
>
>
>  Can somebody describe how these extra forward declarations fit into
> the Grand Scheme of Things in a beneficial way, and can we do something
> about unused 'using' declarations?
>
>
> I totally get your point about the size, and according to past
> conversations, I gather that the use described above may not be
> relevant to your debugger (which maybe points to something that
> can be tuned depending on the target debugger? I’m sorry, but I
> just came back from a long leave and I’m so much behind on list
> reading that I have no idea of the status of that idea).
>
> IMO, it has nothing to do with the fact that the function/variable
> is used or not. The using declarations create new names and the only
> way for the debugger(s) to understand these names is to have them
> described in the debug info.
>
>
> By that argument you should emit every name you see in every header,
> whether it is used or not.  That's not what we do, because it's not
> useful to anyone and unnecessarily bloats the debug info.  The case of
> used-only-by-'using' is no different because there's no *actual* use.
>
>
>
> I’m sorry, but the fact that not all declarations are emitted happens to
> bother me from time to time. I’m a heavy user of debugger conditional
> breakpoints, and the conditions often involve calling functions that aren’t
> defined in my program, but which were described in the headers (for example
> libc).
>
>
>
> Not having the prototype for these functions available to the debugger
> requires me to play casting games so that it gets the calling convention
> right. If all the declarations were to be emitted I wouldn’t have that issue
> and my debug experience would be better.
>
>
>
> Do not get me wrong. I’m not arguing for including all the declarations.
> I’m just trying to point out that the information isn’t useless as you
> describe it and that there is a balance to find. Including only the names
> that have been really used in the program would be a perfectly sensible one,
> but we do not have the code that does that tracking!
>
>
>
>   I found it instructive to add this to my not-USING example:
>
> void foo() { ::abort(); A::abort(); }
>
> which naively I would expect to induce subprogram DIEs for abort() and
> A::abort(), but in fact it doesn't, even with -fstandalone-debug.  That
> seems sub-optimal too.  But, it just further illustrates the discrepancy
> between the 'using' declarations and non-'using' declarations.
>
> It also suggests there's a deeper problem here, which might or might not be
> what David Blaikie was getting at.
>
> The missing DIEs in the non-USING case, along with memories of trying
> to do something else with used/non-used declarations some while ago,
> make me think that even though abort() and A::abort() are (probably)
> being flagged, debug-info generation isn't going back through those
> non-defining declarations to see which ones ought to be emitted after
> all.
>
>
>
> As pointed out above, such code that goes from the uses to the debug info
> just doesn’t exist. The debug info for types and declarations is generated
> during AST construction (IIRC) and not touched afterwards.
>
>
>
> Fred
>
>
>
>   It looks like CGDebugInfo::finalize() does a post-pass for types, to
> some extent; maybe that needs to be done for other decls as well?
> --paulr
>
>
>
>
>  Given how the patch works, it looks like we can just short-circuit the
> creation of these forward declarations with no harm done, but I have to
> wonder whether we're shooting ourselves in the foot in some situation
> that isn't immediately obvious.
>
>
> If the git commit message is still accurate regarding the use of that
> function, then you’ll just go back to the previous state which you
> liked better. If the function grows new callers, you might lose
> more stuff, but IIUC it should mostly be stuff that you don’t care
> about anyway.
>
> Fred
>
>
>
>  Thanks,
> --paulr
>
>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>
>