[cfe-dev] r222220 causes real debug-info bloat

Nico Weber thakis at chromium.org
Mon May 4 13:59:34 PDT 2015


On Mon, May 4, 2015 at 1:54 PM, David Blaikie <dblaikie at gmail.com> wrote:

> Rather than replying point by point to various subjective descriptions of
> the state of things, perhaps I can sum up:
>
> The fix is to teach Clang to track the use of using declarations (as I
> mentioned, there are complications with that) and then only emit those
> using declarations that are used. It's not perfect, but it's a reasonable
> thing to do.
>
> If you'd like this fixed, I'm sure Fred & I can help point you in the
> right direction for where to get started on this work - but as I said, it's
> not simple/easy/trivial due to the complexity of what it means to "use" a
> using declaration. (see my previous email for some examples of wrinkles)
>

(We already track some of that for -Wunused-local-typedef which also warns
on unused "using A = foo;" decls. It doesn't track namespace uses, but the
patch that added Wunused-local-typedef is probably a good starting point to
see how this use tracking might work.)
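
To make the distinction concrete, here is a minimal sketch of the cases being discussed. All names in it (lib, app, helper, unused_helper, local_aliases, call_through_using) are made up for illustration and come from neither Clang nor the thread. The local alias is the kind of decl -Wunused-local-typedef reports. The namespace-scope using-declarations are what the proposed tracking would have to tell apart, and the comments describe the intended debug-info policy, not what Clang does today.

```cpp
// Illustrative sketch only; every name here is hypothetical.
#include <cassert>

namespace lib {
int helper(int x) { return x + 1; }
int unused_helper(int x) { return x - 1; }
}  // namespace lib

namespace app {
// Named by call_through_using() below, so a debugger would need to know
// that app::helper is lib::helper in order to evaluate that expression.
using lib::helper;
// Never named through app:: in this translation unit; under the proposed
// policy its imported declaration could be skipped.
using lib::unused_helper;
}  // namespace app

int local_aliases(int v) {
  using Used = int;     // referenced below, so no warning
  using Unused = long;  // never referenced: -Wunused-local-typedef reports it
  Used r = v;
  return r;
}

int call_through_using(int v) {
  // Name lookup finds lib::helper through the using-declaration in app.
  return app::helper(v);
}
```

Whether something like a default argument, a decltype operand, or an uninstantiated template counts as a "use" is left out here on purpose; those are the wrinkles David refers to.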


>
>
> - David
>
> On Mon, May 4, 2015 at 12:46 PM, Robinson, Paul <
> Paul_Robinson at playstation.sony.com> wrote:
>
>>  From Fred:
>>
>> Source fidelity is not about emitting every declaration you see.
>> It's about, *if* you're going to emit something, do it in a way that is
>> faithful to the source-as-written.
>>
>>
>>
>> and I’d add “gives means to the debugger to evaluate every source
>> expression as it is written in the source.”
>>
>>
>>
>> Agreed.  But a 'using' declaration is not an expression.  If the declared
>> name is not used in any source expression, it can hardly be needed to
>> evaluate a source expression as written in the source.
>>
>>
>>
>> I admit calling this stuff "useless" is a bit of hyperbole.  Clang is
>> being inconsistent.  If a normal unqualified function declaration isn't
>> emitted, and a normal function declaration inside a namespace isn't
>> emitted, I see no argument justifying emitting a function that happens to
>> be declared with a 'using' declaration. A 'using' versus a normal
>> declaration is typically an implementation detail, not something inherently
>> significant that can change the meaning of a program.
>>
>>
>>
>> The whole story is that I was working on getting debug info emitted
>> for function argument default values (which I haven’t gotten back to
>> yet BTW), and that my implementation didn’t work if the default value
>> was a call to a forward declared function. Our decl tracking didn’t
>> handle forward declarations at all, and David pointed out that this
>> was why we were also missing some DW_TAG_imported_declaration. I
>> then implemented support for forward declarations and tested it using
>> the only current user that cared about forward decls, that is the
>> imported_declaration stuff.
>>
>>  But it looks to me like these "missing" imported_declarations aren't
>> any more missing than the equivalent non-'using' declarations.  Is there a
>> principled justification for the inconsistency?  I'm not hearing one.  What
>> I'm hearing is that Clang can't keep track of what's actually needed in the
>> source, so we arbitrarily emit some things and not others when we're not
>> sure.  That's really unsatisfactory.
>>
>> Now, if the forward declaration bit is something that was part of a
>> larger project that is half-implemented, and the imported_declaration bit
>> was a momentary convenience that you intended to take out later, well
>> that's fine so long as you now intend to finish the job.
>>
>>
>>
>> I am sympathetic to the desire to have the full power of libc available
>> in conditional breakpoints, and so forth.  If that's your preferred mode
>> then perhaps a more -gfull-to-bursting mode might be the ticket.  If it
>> didn't cost multi-megabytes to get there, and the time it takes to write
>> all that out, and link it together, then I'd have no problem with always
>> emitting everything.  As it is, we seem to have at least two modes of
>> not-emitting-everything, and the balance point of
>> things-most-likely-to-be-useful doesn't really seem to warrant including
>> "anything that the standard-library implementor decided to implement with a
>> 'using' declaration."
>>
>>
>>
>> (Hmmm… could we suppress this stuff just for standard-header
>> declarations?  That's where our immediate pain-point seems to be.)
>>
>> --paulr
>>
>>
>>
>> *From:* David Blaikie [mailto:dblaikie at gmail.com]
>> *Sent:* Monday, May 04, 2015 10:56 AM
>> *To:* Frédéric Riss
>> *Cc:* Robinson, Paul; cfe-dev at cs.uiuc.edu Developers (cfe-dev at cs.uiuc.edu)
>> *Subject:* Re: [cfe-dev] r222220 causes real debug-info bloat
>>
>>
>>
>>
>>
>>
>>
>> On Mon, May 4, 2015 at 10:39 AM, Frédéric Riss <friss at apple.com> wrote:
>>
>>
>>
>>  On May 4, 2015, at 9:51 AM, Robinson, Paul <
>> Paul_Robinson at playstation.sony.com> wrote:
>>
>>
>>
>> -----Original Message-----
>> From: Frédéric Riss [mailto:friss at apple.com]
>> Sent: Friday, May 01, 2015 6:34 PM
>> To: Robinson, Paul
>> Cc: cfe-dev at cs.uiuc.edu Developers (cfe-dev at cs.uiuc.edu)
>> Subject: Re: r222220 causes real debug-info bloat
>>
>> Hi!
>>
>>
>>  On May 1, 2015, at 5:29 PM, Robinson, Paul
>>
>> <Paul_Robinson at playstation.sony.com> wrote:
>>
>>
>> We were doing some size analysis and noticed some ridiculous numbers
>> related to debug-info size.  Investigation showed that essentially all
>> of the bloat came from DW_TAG_imported_declaration pointing to
>> DW_TAG_subprogram and the associated DW_TAG_formal_parameter DIEs.
>> We tracked this to r222220, which basically caused every 'using' decl
>> of a function or variable to have a forward declaration emitted to the
>> DWARF, whether or not that 'using' decl itself was used in the CU.
>>
>> #include <stdlib.h>
>> using ::abort;
>>
>> In Clang 3.5, this produces a pretty minimal .debug_info section (just
>> the DW_TAG_compile_unit).
>> In Clang 3.6, we see an additional DW_TAG_subprogram for abort() and then
>> a DW_TAG_imported_declaration pointing to that declaration.
>>
>> #include <cstdlib>
>>
>> on Linux, Clang 3.5 wrote a .debug_info of 185 bytes, 3.6 was 1458.
>>
>> Multiply this by more headers and again by hundreds to thousands
>> of modules and pretty soon you're talking multiple megabytes.
>> Getting away from the benchmarks, a real game saw .debug_info increase
>> by 13% (6 MB).
>>
>> r222220 basically causes a 'using' declaration of a function or global
>> variable to conjure up a forward declaration, if we haven't already
>> seen a declaration or definition.  The commentary talks about how this
>> will be RAUW'd later on.  But I'm not sure what motivated this in the
>> first place, and it clearly can have a huge adverse effect.
>>
>>
>> The whole story is that I was working on getting debug info emitted
>> for function argument default values (which I haven’t gotten back to
>> yet BTW), and that my implementation didn’t work if the default value
>> was a call to a forward declared function. Our decl tracking didn’t
>> handle forward declarations at all, and David pointed out that this
>> was why we were also missing some DW_TAG_imported_declaration. I
>> then implemented support for forward declarations and tested it using
>> the only current user that cared about forward decls, that is the
>> imported_declaration stuff.
>>
>>
>>  I don't mind having a DW_TAG_imported_declaration for something that
>> actually gets used in the CU, but a 'using' declaration all by itself
>> should not count as "used" for purposes of emitting debug info.
>>
>>
>> It’s not that the using clause counts as a ‘use’, it’s just a
>> question of source fidelity.
>>
>>
>> Source fidelity is not about emitting every declaration you see.
>> It's about, *if* you're going to emit something, do it in a way that is
>> faithful to the source-as-written.
>>
>>
>>
>> and I’d add “gives means to the debugger to evaluate every source
>> expression as it is written in the source.”
>>
>>
>>
>>   Your above example isn’t really
>> compelling. By changing it a little bit to:
>>
>> #include <stdlib.h>
>>
>> namespace A {
>> using ::abort;
>> }
>>
>> The goal of the imported_declaration information is to inform
>> the debugger that in this CU, A::abort is the same thing as
>> ::abort. It’s just a matter of describing aliased names to
>> the debugger so that it can correctly evaluate source
>> expressions.
>>
>>
>> Consider this:
>>
>> void abort();
>> namespace A {
>> #if USING
>>  using ::abort;
>> #else
>>  void abort();
>> #endif
>> };
>>
>> In the not-USING case, Clang emits nothing but the CU DIE, because
>> neither abort() declaration is used.
>> In the USING case, we see the imported_declaration and the associated
>> subprogram.  In both cases, the set of declared names is the same, and
>> there are no *actual* uses of either name.
>>
>>
>>
>> I’m repeating myself, but this is not about uses, just about describing
>> names.
>>
>>
>>
>> Then, as a compiler policy, you might want to limit the names you describe
>> to the ones that are actually used in the program (we have no code to track
>> the uses and modify the debug info accordingly). You might also want to
>> emit all the names so that the debugger can evaluate accurately every
>> expression that could happen in the source code.
>>
>>
>>
>> I’m not arguing that one of the above is better than the other (the answer
>> can certainly be different depending on the environment), I mostly want to
>> point out that this information isn’t as useless as you seem to think.
>>
>>
>>
>>  Therefore, I argue, this is not about source fidelity but about
>> declining to produce declarations not useful to the consumer.
>>
>>
>>
>> David would need to confirm, but I think that if we revert the change,
>> there are tests in the GDB test suite that will fail.
>>
>>
>> I don't recall precisely - but yes, I think we un-XFAIL'd some tests
>> after you made this change. Could check.
>>
>>
>>   ‘Not useful’ information should not be able to break user level tests,
>> should it?
>>
>>
>>
>>
>>
>>  Can somebody describe how these extra forward declarations fit into
>> the Grand Scheme of Things in a beneficial way, and can we do something
>> about unused 'using' declarations?
>>
>>
>> I totally get your point about the size, and according to past
>> conversations, I gather that the use described above isn’t maybe
>> relevant to your debugger (which maybe points to something that
>> can be tuned depending on the target debugger? I’m sorry, but I
>> just came back from a long leave and I’m so much behind on list
>> reading that I have no idea of the status of that idea).
>>
>> IMO, it has nothing to do with the fact that the function/variable
>> is used or not. The using declarations create new names and the only
>> way for the debugger(s) to understand these names is to have them
>> described in the debug info.
>>
>>
>> By that argument you should emit every name you see in every header,
>> whether it is used or not.  That's not what we do, because it's not
>> useful to anyone and unnecessarily bloats the debug info.  The case of
>> used-only-by-'using' is no different because there's no *actual* use.
>>
>>
>>
>> I’m sorry, but the fact that not all declarations are emitted happens to
>> bother me from time to time. I’m a heavy user of debugger conditional
>> breakpoints, and the conditions often involve calling functions that
>> aren’t defined in my program, but which were described in the headers
>> (for example libc).
>>
>>
>>
>> Not having the prototype for these functions available to the debugger
>> requires me to play casting games so that it gets the calling convention
>> right. If all the declarations were to be emitted I wouldn’t have that
>> issue and my debug experience would be better.
>>
>>
>>
>> Do not get me wrong. I’m not arguing for including all the declarations.
>> I’m just trying to point out that the information isn’t useless as you
>> describe it and that there is a balance to find. Including only the names
>> that have been really used in the program would be a perfectly sensible
>> one, but we do not have the code that does that tracking!
>>
>>
>>
>>   I found it instructive to add this to my not-USING example:
>>
>> void foo() { ::abort(); A::abort(); }
>>
>> which naively I would expect to induce subprogram DIEs for abort() and
>> A::abort(), but in fact it doesn't, even with -fstandalone-debug.  That
>> seems sub-optimal too.  But, it just further illustrates the discrepancy
>> between the 'using' declarations and non-'using' declarations.
>>
>> Also that there's a deeper problem here, which might or might not be
>> what David Blaikie was getting at.
>>
>> The missing DIEs in the non-USING case, along with memories of trying
>> to do something else with used/non-used declarations some while ago,
>> make me think that even though abort() and A::abort() are (probably)
>> being flagged, debug-info generation isn't going back through those
>> non-defining declarations to see which ones ought to be emitted after
>> all.
>>
>>
>>
>> As pointed out above, such code that goes from the uses to the debug info
>> just doesn’t exist. The debug info for types and declarations is generated
>> during AST construction (IIRC) and not touched afterwards.
>>
>>
>>
>> Fred
>>
>>
>>
>>   It looks like CGDebugInfo::finalize() does a post-pass for types, to
>> some extent; maybe that needs to be done for other decls as well?
>> --paulr
>>
>>
>>
>>
>>  Given how the patch works, it looks like we can just short-circuit the
>> creation of these forward declarations with no harm done, but I have to
>> wonder whether we're shooting ourselves in the foot in some situation
>> that isn't immediately obvious.
>>
>>
>> If the git commit message is still accurate regarding the use of that
>> function, then you’ll just go back to the previous state which you
>> liked better. If the function grows new callers, you might lose
>> more stuff, but IIUC it should mostly be stuff that you don’t care
>> about anyway.
>>
>> Fred
>>
>>
>>
>>  Thanks,
>> --paulr
>>
>>
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>
>>
>>
>
>
>
>

