[LLVMdev] [RFC] Less memory and greater maintainability for debug info IR

Sean Silva chisophugis at gmail.com
Thu Oct 16 22:09:29 PDT 2014


On Tue, Oct 14, 2014 at 11:40 AM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On Oct 13, 2014, at 6:59 PM, Sean Silva <chisophugis at gmail.com> wrote:
> >
> > Stupid question, but when I was working on LTO last Summer the primary
> culprit for excessive memory use was due to us not being smart when linking
> the IR together (Espindola would know more details). Do we still have that
> problem? For starters, how does the memory usage of just llvm-link compare
> to the memory usage of the actual LTO run? If the issue I was seeing last
> Summer is still there, you should see that the invocation of llvm-link is
> actually the most memory-intensive part of the LTO step, by far.
>
> To be clear, I'm running the command-line:
>
>     $ llvm-lto -exported-symbol _main llvm-lto.lto.bc
>
> Since this is a pre-linked bitcode file, we shouldn't be wasting much
> memory from the linking stage.
>
> Running ld64 directly gives a peak memory footprint of ~30GB for the
> full link, so there's something else going on there that I'll be
> digging into later.
>

Dig into this first!

In the OP you are talking about essentially a pure "optimization" (in the
programmer-wisdom "beware of it" sense), to "save" 2GB of peak memory. But
from your analysis it's not clear that this 2GB savings actually is
reflected as peak memory usage saving (since the ~30GB peak might be
happening elsewhere in the LTO process). It is this ~30GB peak, and not the
one you originally analyzed, which your customers presumably care about.

Be realistic: ~15GB (equivalent to *all the memory you tracked combined*)
is flying under the radar of your analysis. It would be very surprising if
this has not led to multiple erroneous conclusions about what direction to
take things in your plan.

To recap, the analysis you performed seems to support neither of the
following conclusions:
- Peak memory usage during LTO would be improved by this plan
- Build time for LTO would be improved by this plan (from what you have
posted, you didn't measure time at all)

I hesitate to consider such an "aggressive plan" (as you describe it) with
neither of the above two legs to stand on.

Of course, this is all tangential to the discussion of e.g. a more
readable/writable .ll form for debug info, or debug info compatibility.
However, it seems like you jumped into this from the point of view of it
being an optimization, rather than a maintainability/compatibility thing.

-- Sean Silva


>
> > 2GB (out of 15.3GB i.e. ~13%) seems pretty pathetic savings when we have
> a single pie slice near 40% of the # of Value's allocated and another at
> 21%. Especially this being "step 4".
>
> 15.3GB is the peak memory of `llvm-lto`.  This comes late in the
> process, after DIEs have been created.  I haven't looked in detail past
> debug info metadata, but here's a sketch of what I imagine is in memory
> at this point.
>
>   - The IR, including uniquing side-tables.
>   - Optimization and backend passes.
>   - Parts of SelectionDAG that haven't been freed.
>   - `MachineFunction`s and everything inside them.
>   - Whatever state the `AsmPrinter`, etc., need.
>
> I expect to look at a couple of other debug-info-related memory usage
> areas once I've shrunk the metadata:
>
>   - What's the total footprint of DIEs?  This run has 4M of them, whose
>     allocated footprint is ~1GB.  I'm hoping that a deeper look will
>     reveal an even larger attack surface.
>
>   - How much do debug info intrinsics cost?  They show up in at least
>     three forms -- IR-level, SDNodes, and MachineInstrs -- and there
>     can be a lot of them.  How many?  What's their footprint?
>
> For now, I'm focusing on the problem I've already identified.
>
> > You need more data. Right now you have essentially one data point,
>
> I looked at a number of internal C and C++ programs with -flto -g, and
> dug deeply into llvm-lto.lto.bc because it's small enough that it's easy
> to analyze (and its runtime profile was representative of the other C++
> programs I was looking at).
>
> I didn't look deeply at a broad spectrum, but memory usage and runtime
> for building clang with -flto -g is something we care a fair bit about.
>
> > and it's not even clear what you measured really. If your goal is saving
> memory, I would expect at least a pie chart that breaks down LLVM's memory
> usage (not just # of allocations of different sorts; an approximation is
> fine, as long as you explain how you arrived at it and in what sense it
> approximates the true number).
>
> I'm not sure there's value in diving deeply into everything at once.
> I've identified one of the bottlenecks, so I'd like to improve it before
> digging into the others.
>
> Here's some visibility into where my numbers come from.
>
> I got the 15.3GB from a profile of memory usage vs. time.  Peak usage
> comes late in the process, around when DIEs are being dealt with.
>
> Metadata node counts stabilize much earlier in the process.  The rest of
> the numbers are based on counting `MDNodes` and their respective
> `MDNodeOperands`, and multiplying by the cost of their operands.  Here's
> a dump from around the peak metadata node count:
>
>     LineTables = 7500000[30000000], InlinedLineTables = 6756182,
> Directives = 7611669[42389128], Arrays = 570609[577447], Others =
> 1176556[5133065]
>     Tag =   256, Count =   554992, Ops =   2531428, Name =
> DW_TAG_auto_variable
>     Tag = 16647, Count =      988, Ops =      4940, Name =
> DW_TAG_GNU_template_parameter_pack
>     Tag =    52, Count =     9933, Ops =     59598, Name = DW_TAG_variable
>     Tag =    33, Count =      190, Ops =       190, Name =
> DW_TAG_subrange_type
>     Tag =    59, Count =        1, Ops =         3, Name =
> DW_TAG_unspecified_type
>     Tag =    40, Count =    24731, Ops =     24731, Name =
> DW_TAG_enumerator
>     Tag =    21, Count =   354166, Ops =   2833328, Name =
> DW_TAG_subroutine_type
>     Tag =     2, Count =    77999, Ops =    623992, Name =
> DW_TAG_class_type
>     Tag =    47, Count =    27122, Ops =    108488, Name =
> DW_TAG_template_type_parameter
>     Tag =    28, Count =     8491, Ops =     33964, Name =
> DW_TAG_inheritance
>     Tag =    66, Count =    10930, Ops =     43720, Name =
> DW_TAG_rvalue_reference_type
>     Tag =    16, Count =    54680, Ops =    218720, Name =
> DW_TAG_reference_type
>     Tag =    23, Count =      624, Ops =      4992, Name =
> DW_TAG_union_type
>     Tag =     4, Count =     5344, Ops =     42752, Name =
> DW_TAG_enumeration_type
>     Tag =    11, Count =   360390, Ops =   1081170, Name =
> DW_TAG_lexical_block
>     Tag =   258, Count =        1, Ops =         1, Name =
> DW_TAG_expression
>     Tag =    13, Count =    73880, Ops =    299110, Name = DW_TAG_member
>     Tag =    58, Count =     1387, Ops =      4161, Name =
> DW_TAG_imported_module
>     Tag =     1, Count =     2747, Ops =     21976, Name =
> DW_TAG_array_type
>     Tag =    46, Count =  1341021, Ops =  12069189, Name =
> DW_TAG_subprogram
>     Tag =   257, Count =  4373879, Ops =  20785065, Name =
> DW_TAG_arg_variable
>     Tag =     8, Count =     2246, Ops =      6738, Name =
> DW_TAG_imported_declaration
>     Tag =    53, Count =       57, Ops =       228, Name =
> DW_TAG_volatile_type
>     Tag =    15, Count =    55163, Ops =    220652, Name =
> DW_TAG_pointer_type
>     Tag =    41, Count =     3382, Ops =      6764, Name = DW_TAG_file_type
>     Tag =    22, Count =   158479, Ops =    633916, Name = DW_TAG_typedef
>     Tag =    48, Count =      486, Ops =      2430, Name =
> DW_TAG_template_value_parameter
>     Tag =    36, Count =       15, Ops =        45, Name = DW_TAG_base_type
>     Tag =    17, Count =     1164, Ops =      8148, Name =
> DW_TAG_compile_unit
>     Tag =    31, Count =       19, Ops =        95, Name =
> DW_TAG_ptr_to_member_type
>     Tag =    57, Count =     2034, Ops =      6102, Name = DW_TAG_namespace
>     Tag =    38, Count =    32133, Ops =    128532, Name =
> DW_TAG_const_type
>     Tag =    19, Count =    72995, Ops =    583960, Name =
> DW_TAG_structure_type
>
> (Note: the InlinedLineTables stat is included in LineTables stat.)
>
> You can determine the rough memory footprint of each type of node by
> multiplying the "Count" by `sizeof(MDNode)` (x86-64: 56B) and the "Ops"
> by `sizeof(MDNodeOperand)` (x86-64: 32B).
>
> Overall, there are 7.5M linetables with 30M operands, so by this method
> their footprint is ~1.3GB.  There are 7.6M descriptors with 42.4M
> operands, so their footprint is ~1.7GB.
>
> I dumped another stat periodically to tell me the peak size of the
> side-tables for line table entries, which are split into "Scopes" (for
> non-inlined) and "Inlined" (these counts are disjoint, unlike the
> previous stats):
>
>     Scopes = 203166 [203166], Inlined = 3500000 [3500000]
>
> I assumed that both `DenseMap` and `std::vector` over-allocate by 50%
> to estimate the current (and planned) costs for the side-tables.
>
> Another stat I dumped periodically was the breakdown between V(alues),
> U(sers), C(onstants), M(etadata nodes), and (metadata) S(trings).
> Here's a sample from nearby:
>
>     V = 23967800 (40200000 - 16232200)
>     U =  5850877 ( 7365503 -  1514626)
>     C =   205491 (  279134 -    73643)
>     M = 16837368 (31009291 - 14171923)
>     S =   693869 (  693869 -        0)
>
> Lastly, I dumped a breakdown of the types of MDNodeOperands.  This is
> also a sample from nearby:
>
>     MDOps = 77644750 (100%)
>     Const = 14947077 ( 19%)
>     Node  = 41749475 ( 53%)
>     Str   =  9553581 ( 12%)
>     Null  = 10976693 ( 14%)
>     Other =   417924 (  0%)
>
> While I didn't use this breakdown for my memory estimates, it was
> interesting nevertheless.  Note the following:
>
>   - The number of constants is just under 15M.  This dump came less than
>     a second before the dump above, where we have 7.5M line table
>     entries.  Line table entries have 2 operands of `ConstantInt`.  This
>     lines up nicely.
>
>     Note: this checked `isa<Constant>(Op) && !isa<GlobalValue>(Op)`.
>
>   - There are a lot of null operands.  By making subclasses for the
>     various types of debug info IR, we can probably shed some of these
>     altogether.
>
>   - There are few "Other" operands.  These are likely all `GlobalValue`
>     references, and are the only operands that need to be referenced
>     using value handles.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141016/d2b8face/attachment.html>


More information about the llvm-dev mailing list