[LLVMdev] [Debug Info + LTO] Type Uniquing for C types?

Manman Ren manman.ren at gmail.com
Mon Oct 21 18:29:00 PDT 2013


On Mon, Oct 21, 2013 at 2:27 PM, Eric Christopher <echristo at gmail.com> wrote:

> On Mon, Oct 14, 2013 at 4:14 PM, Manman Ren <manman.ren at gmail.com> wrote:
> >
> >
> >
> > On Mon, Oct 14, 2013 at 1:08 PM, Manman Ren <manman.ren at gmail.com>
> wrote:
> >>
> >>
> >>
> >>
> >> On Fri, Oct 11, 2013 at 12:40 PM, Eric Christopher <echristo at gmail.com>
> >> wrote:
> >>>
> >>> >> It depends upon the goals. If the goal is to make debug information
> >>> >> post-link smaller then just using the type hashing machinery for
> >>> >> structs will be sufficient.
> >>> >
> >>> >
> >>> > By "the type hashing machinery for structs", are you referring to
> >>> > the type hashing at the back end?
> >>> >
> >>>
> >>> I am, yes, since there's no other place we do currently.
> >>>
> >>> >>
> >>> >> However, if it's to save space during an
> >>> >> LTO link then we'll want to do it in the front end.
> >>> >
> >>> >
> >>> > Yes, my purpose here is to save memory space in number of MDNodes
> >>> > (also # of DIEs) generated in a LTO build.
> >>> > Type hashing at the DIE level can reduce the dwarf size.
> >>> >
> >>>
> >>> I agree with both of these statements.
> >>>
> >>> I also agree with the desire to help LTO memory consumption so we'll
> >>> need something from the front end for this since we'd like to continue
> >>> to use the folding set to do the uniquing.
> >>
> >>
> >> Hi Eric,
> >>
> >> Assume that we need to do type hashing (i.e. assume Doug's rules for
> >> merging C types do not apply),
> >
> >
> > Now the assumption is true, any opinion on where to do the hashing?
> >
>
> We should still do it in the front end for the types with a language
> specific way. Nothing has greatly changed versus, say, C++ here - it's
> just easier in C++ because of the language.
>

Hi Eric,

We don't have any hashing implementation in the front end, so I don't
quite get what you mean by "still do it in the front end" :)

For C++, we don't need to hash the types because of ODR. For other
languages, is it better to hash the MDNodes instead of the AST nodes,
for the following reasons?
1: we can handle all languages without ODR
2: we don't need to update each front end that wants to take advantage
   of type uniquing
3: the AST contains more information than we need, while the MDNodes
   contain all the information necessary for DWARF

I would like to propose the following:
Step 1: When we are creating the MDNodes, we provide a simple identifier
that is unique within the DIBuilder. In DIBuilder, implement
generateTypeIdentifier for types that are globally visible (one
possibility is the type name appended with a unique ID).
Step 2: In DIBuilder::finalize(), we call the hashing algorithm to update
the type identifiers generated in step 1 to be the type name appended
with its hash.
Step 1 is necessary because we want to make sure we are using the type
identifiers when referring to the types. Without step 1, the type
reference will be via MDNode, and an MDNode field can't be updated to a
type identifier later on.
Step 3: We can then use the folding set to do the uniquing across CUs,
during linking.
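
The three steps above can be sketched with a toy model. This is only an
illustration under stated assumptions, not LLVM code: ToyBuilder, TypeDesc,
and linkTypes are hypothetical stand-ins for DIBuilder, MDNode, and the
folding set; generateTypeIdentifier is the name proposed above, not an
existing API; the "#" and "@" separators and the FNV-1a hash are arbitrary
choices. The sketch also assumes types are created after the types they
refer to, so recursive types would need extra handling.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a type description node (hypothetical, not LLVM's MDNode).
struct TypeDesc {
  std::string Name;
  // (member name, identifier of the member's type); members refer to
  // other types by identifier string, never by pointer.
  std::vector<std::pair<std::string, std::string>> Members;
  std::string Identifier;
};

// 64-bit FNV-1a, chosen here only because it is short and deterministic.
static std::uint64_t fnv1a(const std::string &S) {
  std::uint64_t H = 0xcbf29ce484222325ULL;
  for (unsigned char C : S) {
    H ^= C;
    H *= 0x100000001b3ULL;
  }
  return H;
}

// Toy stand-in for one compile unit's builder (hypothetical).
class ToyBuilder {
  std::vector<TypeDesc> Types;
  unsigned NextID = 0;
  bool Finalized = false;

public:
  // Step 1: provisional identifier, unique only within this builder.
  std::string generateTypeIdentifier(const std::string &Name) {
    return Name + "#" + std::to_string(NextID++);
  }

  std::string
  createStruct(const std::string &Name,
               std::vector<std::pair<std::string, std::string>> Members) {
    assert(!Finalized && "types cannot be added after finalize()");
    Types.push_back(
        TypeDesc{Name, std::move(Members), generateTypeIdentifier(Name)});
    return Types.back().Identifier;
  }

  // Step 2: replace each provisional identifier with name + content hash,
  // rewriting member references along the way (creation order is assumed
  // to place referenced types first).
  void finalize() {
    std::map<std::string, std::string> Final; // provisional -> hashed
    for (TypeDesc &T : Types) {
      std::string Content = T.Name + "{";
      for (auto &M : T.Members) {
        auto It = Final.find(M.second);
        if (It != Final.end())
          M.second = It->second;
        Content += M.first + ":" + M.second + ";";
      }
      Content += "}";
      std::string Hashed = T.Name + "@" + std::to_string(fnv1a(Content));
      Final[T.Identifier] = Hashed;
      T.Identifier = Hashed;
    }
    Finalized = true;
  }

  const std::vector<TypeDesc> &types() const { return Types; }
};

// Step 3: unique types across CUs by identifier. std::map stands in for
// the folding set; emplace keeps the first copy and ignores duplicates.
std::map<std::string, TypeDesc> linkTypes(const std::vector<ToyBuilder> &CUs) {
  std::map<std::string, TypeDesc> Unique;
  for (const ToyBuilder &CU : CUs)
    for (const TypeDesc &T : CU.types())
      Unique.emplace(T.Identifier, T);
  return Unique;
}
```

In this model, two CUs that describe the same struct end up with the same
identifier after finalize(), even if their provisional IDs differed, while
two same-named structs with different members get different identifiers
and are kept apart at link time.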

Let me know your thoughts,

Thanks,
Manman


> -eric
>