[LLVMbugs] [Bug 17891] New: Extend mdnode/string to significantly reduce heap size of debug metadata

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Mon Nov 11 23:09:18 PST 2013


http://llvm.org/bugs/show_bug.cgi?id=17891

            Bug ID: 17891
           Summary: Extend mdnode/string to significantly reduce heap size
                    of debug metadata
           Product: libraries
           Version: 1.0
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Core LLVM classes
          Assignee: unassignedbugs at nondot.org
          Reporter: clattner at apple.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

This came out of a devmtg'13 debug info bof discussion.  If I had to take a
WAG, I'd guess that an approach like this could cut our heap usage of debug
info down by a constant factor of 4x, perhaps a lot more. 

Consider:

$ cat t.c
struct foo {
  int x, y, z;
} g;

void f() {
  g.x = 1;
}
$ clang t.c -g -emit-llvm -S -o -

The output of this command defines 20 MDNodes and numerous MDStrings.  MDString
is a pretty memory efficient datatype, but MDNode is not: each one is a
FoldingSetNode (and folding set isn't particularly efficient in time or space)
and, much worse, each operand of an MDNode is a CallbackVH - an extremely heavy
type in both time and space.  To give you an idea of how expensive it is:
CallbackVH itself is 32 bytes (assuming the compiler is built 64-bit) and each
CallbackVH requires hash table lookups to ensure it is associated with the
underlying value.  This bloats time and space and is generally an unpleasant
thing.

With a debug info schema change, we can dramatically improve this by reducing
the number of "nodes" and "operands" in the graph, primarily by using strings
more.  For example, we see:

!1 = metadata !{metadata !"t.c", metadata !"/Users/sabre/llvm"}

Optimizing this isn't going to be the biggest savings ever, but we can
completely eliminate the !1 MDNode and its two operands by simply using
something like !"/Users/sabre/llvm\00t.c" to encode the two values into a
single string.  The \00 is a nul byte used as a field separator.  Doing this
would cut over 100 bytes off the heap.
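
A minimal sketch of the nul-separated encoding, in plain C++ rather than
against the LLVM API.  The helper names packFields and unpackFields are
hypothetical and do not exist in LLVM; the sketch also assumes no field
contains an embedded nul byte.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: join fields into one string, separated by NUL bytes.
std::string packFields(const std::vector<std::string> &Fields) {
  std::string Packed;
  for (std::size_t I = 0; I < Fields.size(); ++I) {
    if (I != 0)
      Packed.push_back('\0');
    Packed += Fields[I];
  }
  return Packed;
}

// Hypothetical helper: split a packed string back into its fields.
// Assumes no field contains a NUL byte itself.
std::vector<std::string> unpackFields(const std::string &Packed) {
  std::vector<std::string> Fields;
  std::size_t Start = 0;
  for (;;) {
    std::size_t End = Packed.find('\0', Start);
    if (End == std::string::npos) {
      Fields.push_back(Packed.substr(Start));
      return Fields;
    }
    Fields.push_back(Packed.substr(Start, End - Start));
    Start = End + 1;
  }
}
```

Packing {"/Users/sabre/llvm", "t.c"} this way yields one 21-byte string, and
unpackFields recovers the two original values.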


Moving up the stack one notch, the compile unit is also ridiculously encoded:

!0 = metadata !{i32 786449, metadata !1, i32 12,
metadata !"clang version 3.4 (trunk 194454) (llvm/trunk 194395)", i1 false,
metadata !"", i32 0, metadata !2, metadata !2, metadata !3, metadata !8,
metadata !2, metadata !""}
!2 = metadata !{i32 0}
!3 = metadata !{metadata !4}
!8 = metadata !{metadata !9}

I'm not sure what is going on here, but indirecting through the !2 and !3
metadata nodes is wasteful, and encoding numbers like "i1 false", "i32 786449"
and "i32 12" as operands is really space inefficient (over 32 bytes each!).  It
would be better to encode these as strings.  For example, replacing a bunch of
operands with !"786449,12,false,0,0,0,0" would save a LOT of space.  If the
empty strings have a flat and predictable form, they could be inlined as well.
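
As a rough illustration of how a consumer might read such a string back, here
is a sketch in standalone C++.  The function names splitScalars, fieldAsUInt
and fieldAsBool are made up for this example, and the field order is simply the
one from the example string above, not a proposed schema.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a comma-separated scalar string into fields.
std::vector<std::string> splitScalars(const std::string &Packed) {
  std::vector<std::string> Fields;
  std::istringstream In(Packed);
  std::string Field;
  while (std::getline(In, Field, ','))
    Fields.push_back(Field);
  return Fields;
}

// Hypothetical accessor: interpret one field as an unsigned integer.
std::uint64_t fieldAsUInt(const std::vector<std::string> &Fields,
                          std::size_t Index) {
  return std::stoull(Fields.at(Index));
}

// Hypothetical accessor: interpret one field as a boolean ("true"/"false").
bool fieldAsBool(const std::vector<std::string> &Fields, std::size_t Index) {
  return Fields.at(Index) == "true";
}
```

The trade-off is that every read now pays for parsing, in exchange for
replacing several 32-byte operands with a handful of characters in one string.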


The file info and compile unit are illustrative, but not the bulk of where
memory goes.  The struct definition is the real problem:

; [ DW_TAG_structure_type ] [foo] [line 2, size 96, align 32, offset 0] [def] [from ]
!10 = metadata !{i32 786451, metadata !1, null, metadata !"foo", i32 2, i64 96,
i64 32, i32 0, i32 0, null, metadata !11, i32 0, null, null, null}
!11 = metadata !{metadata !12, metadata !14, metadata !15}

Hopefully by now you know what I'm going to say: all of those scalar fields
should be collapsed together into a comma (or nul) separated string.  The field
list (!11) should just be listed inline at the end of the MDNode; there is no
reason to break it out to a separate MDNode.

The fields being referenced are:

; [ DW_TAG_member ] [x] [line 3, size 32, align 32, offset 0] [from int]
!12 = metadata !{i32 786445, metadata !1, metadata !10, metadata !"x", i32 3,
i64 32, i64 32, i64 0, i32 0, metadata !13}

; [ DW_TAG_member ] [y] [line 3, size 32, align 32, offset 32] [from int]
!14 = metadata !{i32 786445, metadata !1, metadata !10, metadata !"y", i32 3,
i64 32, i64 32, i64 32, i32 0, metadata !13}

; [ DW_TAG_member ] [z] [line 3, size 32, align 32, offset 64] [from int]
!15 = metadata !{i32 786445, metadata !1, metadata !10, metadata !"z", i32 3,
i64 32, i64 32, i64 64, i32 0, metadata !13}

Given that these MDNodes will never be reused (they contain file/line
information!), there is no reason to split them out to their own MDNodes; they
can be inlined into the parent node.  Doing so eliminates the need for the
DW_TAG_member field, as well as the parent pointer (the !10 node), leaving just
the !1 pointer (the file, which is a string we want to share) and !13 (the
type, which should be shared).  The rest of the fields can easily collapse down
into a single string.
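
A back-of-the-envelope sketch of what this buys for struct foo, counting
operand slots only.  It uses the 32-byte CallbackVH figure quoted earlier and
ignores node headers, folding set overhead, and the storage for the new
strings.  The "proposed" count is an assumption about one possible layout
(file reference, type reference, and one packed string per member), not a
settled design.

```cpp
#include <cstddef>

// Size of one CallbackVH operand on a 64-bit build, as quoted in this report.
constexpr std::size_t CallbackVHBytes = 32;

// Bytes spent on operand slots alone.
constexpr std::size_t operandBytes(std::size_t NumOperands) {
  return NumOperands * CallbackVHBytes;
}

// Current schema: three member nodes (!12, !14, !15) with 10 operands each,
// the 3-operand field list !11, and the one slot in !10 that points at !11.
constexpr std::size_t CurrentOperands = 3 * 10 + 3 + 1;

// Assumed inlined layout: per member, one file reference, one type reference,
// and one packed string, all stored directly in the parent node.
constexpr std::size_t ProposedOperands = 3 * 3;

static_assert(operandBytes(CurrentOperands) == 1088, "34 operand slots");
static_assert(operandBytes(ProposedOperands) == 288, "9 operand slots");
```

Under these assumptions the member information for this one small struct drops
from 34 operand slots to 9.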

This sort of optimization can be applied to pretty much everything in debug
info, and even the profile information.  For example base types could be
collapsed down into a single string:

; [ DW_TAG_base_type ] [int] [line 0, size 32, align 32, offset 0, enc DW_ATE_signed]
!13 = metadata !{i32 786468, null, null, metadata !"int", i32 0, i64 32,
i64 32, i64 0, i32 0, i32 5}
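
One way the base type above might be flattened, sketched in standalone C++.
The BaseTypeInfo struct, the encodeBaseType function, and the field order are
all hypothetical; the sketch keeps only the tag, name, size, alignment and
encoding, and assumes the name contains no comma (a nul separator would avoid
that restriction).

```cpp
#include <cstdint>
#include <string>

// Hypothetical record mirroring the interesting scalar operands of !13.
struct BaseTypeInfo {
  std::uint32_t Tag;         // 786468 in the node above (DW_TAG_base_type)
  std::string Name;          // "int"
  std::uint64_t SizeInBits;  // 32
  std::uint64_t AlignInBits; // 32
  std::uint32_t Encoding;    // 5 in the node above (DW_ATE_signed)
};

// Hypothetical encoder: collapse the record into one comma-separated string.
std::string encodeBaseType(const BaseTypeInfo &T) {
  return std::to_string(T.Tag) + "," + T.Name + "," +
         std::to_string(T.SizeInBits) + "," + std::to_string(T.AlignInBits) +
         "," + std::to_string(T.Encoding);
}
```

For the int type shown above this produces "786468,int,32,32,5", a single
string in place of a ten-operand MDNode.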


Doing this amounts to a complete schema redesign, but the win would be huge.
