[llvm] r191792 - Debug Info: remove duplication of DIEs when a DIE is part of the type system

Thu Oct 3 16:24:01 PDT 2013

On Thu, Oct 3, 2013 at 4:10 PM, Manman Ren <manman.ren at gmail.com> wrote:

>
>
>
> On Thu, Oct 3, 2013 at 2:06 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>  3GB memory reduction for xalan built with lto -g (without removing DIE
>>>> duplication, the memory usage is 7GB)
>>>>
>>>
>>> OK, so you're looking at memory usage during LTO, not the size of the
>>> resulting debug info? (that's fine, just trying to understand exactly what
>>> metrics you're targeting)
>>>
>>
>> One thought I had (and just discussed with Eric - he seemed to think it
>> was plausible, at least):
>>
>> If you're interested in reducing peak memory usage of LLVM during LTO,
>> have you considered modifying debug info generation not to cache all the
>> compile_units before emitting them? If we emitted one CU at a time as they
>> were created I imagine the memory footprint would be much lower.
>>
>
> Yes, that is on my todo list after type uniquing is done.
>

So, as I said, I think it might be better to do that work first. It may
have a substantial impact on cross-CU references. (see below)

> Right now, we don't release any memory and we have layers of memory usage
> related to debug info: MDNodes, DIEs and some data in MC layer.
>  All these layers exist for the whole program with LTO.
>
>
>> (and we could still, potentially, add the type cross-referencing by just
>> caching the section offsets or labels, etc, of the types emitted in prior
>> units - though that wouldn't be perfect for types that contain some members
>> (implicit special members) in one CU and a different set of members in
>> another CU (and it'll be a bit trickier for type units - we'd need to cache
>> any subgraph of potentially self-referencing type units until all the
>> signatures are resolved, then they could all be emitted))
>>
>> This isn't necessarily a place you should start with - I can appreciate
>> that your current approach is probably higher value to you for lower
>> engineering cost (and thus gets you closer sooner) but if it's still not
>> going to be sufficient for your needs, especially if you're going to have
>> to implement something like emit-CU-at-a-time as I describe above anyway,
>> it might be more valuable to lay that foundation first.
>>
>
> Even if we emit one CU at a time, it is still good to share the types
> across-CU so we don't waist time constructing the type DIEs for each CU,
> and this approach (remove DIE duplication) also reduces the raw DWARF size.
>

I agree, from an output size perspective we'll still want to share DIEs
across CUs - but I think you might see bigger memory footprint (since
that's the stat you're aiming at at the moment) wins from this change, but
more importantly - I think a change like that will have a significant
impact on a feature like the cross-CU DIE referencing you're doing here.

If we emit one CU at a time, cross-CU DIE references will look very
different - it won't be simply a matter of getting a DIE from the cross-CU
map anymore. Likely we'll want to store a much more minimal form of the DIE
(just it's section offset or label) in a table once the prior CU has been
emitted. Then when a future CU is emitted and wants to reference that, we
can lookup that remnant in the table - the whole DIE won't exist anymore.

It seems like the bigger change should go first so we don't avoid
discussing a lot of this design only to have to rework it substantially
when we emit a CU-at-a-time.

>
>
>>
>> If not, I'd still probably like to discuss the design of the cross-CU
>> referencing work you have here to decide on the right design in a normal
>> patch review process rather than trying to think about how the current code
>> can be morphed into a better design.
>>
>
> What other designs do you have in mind?
>

Just some of the stuff we've already been discussing such as merging the
maps and simplifying the DwarfDebug entry points. But if you're already
planning to do CU-at-a-time, I wonder whether it's worth discussing the
design of cross-CU references before we do CU-at-a-time - it seems to me
like the implementation would change quite significantly.

>
> Manman
>
>>
>>
>> - David
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131003/b2417c18/attachment.html>