[cfe-dev] RFC: ODR checker for Clang and LLD

Tue Jun 13 20:48:17 PDT 2017

On Tue, Jun 13, 2017 at 8:43 PM Peter Collingbourne <peter at pcc.me.uk> wrote:

> On Tue, Jun 13, 2017 at 7:54 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Tue, Jun 13, 2017 at 6:34 PM Peter Collingbourne via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> On Wed, Jun 7, 2017 at 11:28 PM, Peter Collingbourne <peter at pcc.me.uk>
>>> wrote:
>>>
>>>> On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva <chisophugis at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 7, 2017 at 4:31 PM, Peter Collingbourne <peter at pcc.me.uk>
>>>>> wrote:
>>>>>
>>>>>> On Wed, Jun 7, 2017 at 12:17 AM, Sean Silva <chisophugis at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Very nice and simple implementation!
>>>>>>>
>>>>>>> Do you have any statistics on how large these odr tables are
>>>>>>> compared to other object file data? I assume that if these tables contain
>>>>>>> full mangled symbol names, they could end up being very large and may want
>>>>>>> to share the symbol name strings with the overall string table in the .o
>>>>>>>
>>>>>>
>>>>>> Looking at Chromium's object files it looks like the total size of
>>>>>> the odrtabs is about 50% of the total size of the object files, which isn't
>>>>>> great. The current implementation only looks at records, so I imagine that
>>>>>> it would be hard to share any of the strings that I'm currently creating.
>>>>>> (I guess it's possible that some types will have a mangled vtable name in
>>>>>> the string table, so we may be able to share a little that way.) Note
>>>>>> however that this was without debug info.
>>>>>>
>>>>>> One option for reducing size would be to
>>>>>> 1) store hashes of ODR names in ODR tables, per Rui's suggestion
>>>>>> (alongside a reference to the name itself in the string table)
>>>>>> 2) compress the string table for the ODR names with a standard
>>>>>> compression algorithm like gzip.
>>>>>> This wouldn't seem to affect link time performance much because I
>>>>>> think we should only need to look at the strings if we see a ODR name hash
>>>>>> match together with an ODR hash mismatch, which would mean an ODR violation
>>>>>> with a high probability (i.e. unless there was an ODR name hash collision,
>>>>>> we have found an ODR violation). If we don't expect a lot of sharing with
>>>>>> regular string tables (see below), it seems even more reasonable.
>>>>>>
>>>>>
>>>>> Neat observation!
>>>>>
>>>>> FWIW, it is a birthday problem type situation though, so for a 32-bit
>>>>> hash, we would expect a collision in about 1 in 2^16 distinct hashes (and
>>>>> 2^16 seems pretty easy to hit in a large project). So 64-bit hashes might
>>>>> be preferable.
>>>>>
>>>>
>>>> Oh right, good point, using a 64-bit hash does seem like a good idea
>>>> here.
>>>>
>>>>
>>>>>
>>>>>> Also, do you have any numbers on the performance of your initial
>>>>>>> implementation?
>>>>>>>
>>>>>>
>>>>>> I measured the link time for chromium's unit_tests (the largest
>>>>>> single binary in chromium) at 5.05s without ODR checks and 6.61s with ODR
>>>>>> checks. So about 30% overhead, but in absolute terms it doesn't seem too
>>>>>> bad. So I think this may be acceptable for an initial implementation, but
>>>>>> it certainly seems worth trying to do better.
>>>>>>
>>>>>
>>>>> I know that things aren't currently apples-to-apples, but how does
>>>>> that compare to gold?
>>>>>
>>>>
>>>> I will measure it.
>>>>
>>>
>>> For that unit_tests binary I measured the overhead at about 5 seconds
>>> (average of 10 runs). That is with debug info, of course.
>>>
>>> W.r.t. LLD and having it always on by default (and hence making it as
>>>>>>> fast as possible), it seems like right now you are implementing the
>>>>>>> checking process with a hash table. That's simple and fine for a first
>>>>>>> implementation, but it's probably worth mentioning in a comment the problem
>>>>>>> of checking the tables, at least from the linker's perspective, does fit
>>>>>>> into a map-reduce pattern and could be easily parallelized if needed. E.g.
>>>>>>> a parallel sort to coalesce all entries for symbols of the same name
>>>>>>> followed by a parallel forEach to check each bucket with the same symbol
>>>>>>> name (roughly speaking).
>>>>>>>
>>>>>>
>>>>>> Right, that's one approach. I was thinking of a simpler approach
>>>>>> where at compile time we sort ODR names by hash and partition them using
>>>>>> (say) the upper bits of the hash, so that at link time we can have N
>>>>>> threads each building a hash table for a specific partition.
>>>>>>
>>>>>> And of course this work can be started right after symbol resolution
>>>>>> finishes and parallelised with the rest of the work done by the linker.
>>>>>>
>>>>>> Even better than doing it faster is just doing less work. There's a
>>>>>>> lot of work that the linker is already doing that may be reusable for the
>>>>>>> ODR checking.
>>>>>>> E.g.
>>>>>>> - maybe we could get the coalescing step as a byproduct of our
>>>>>>> existing string deduping, which we are generally doing anyway.
>>>>>>> - we are already coalescing symbol names for the symbol table. If
>>>>>>> the ODR table is keyed off of symbols in the binary that we are inserting
>>>>>>> into the symbol table, then I think we could do the entire ODR check with
>>>>>>> no extra "string" work on LLD's part.
>>>>>>>
>>>>>>> I see Rui already mentioned some of this in
>>>>>>> https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
>>>>>>> You mentioned that not everything is necessarily directly keyed on a
>>>>>>> symbol (such as types), but I think that it would really simplify things if
>>>>>>> the check was done as such. Do you have any idea exactly how much of the
>>>>>>> things that we want to check are not keyed on symbols? If most things are
>>>>>>> keyed on symbols, for the things we are not we can just emit extra symbols
>>>>>>> prefixed by __clang_odr_check_ or whatever.
>>>>>>>
>>>>>>
>>>>>> Since the current implementation only works with records there is
>>>>>> basically zero overlap right now between ODR names and symbols. I suppose
>>>>>> that I could estimate the amount of function overlap in a hypothetical
>>>>>> implementation that computes ODR hashes of functions by comparing the
>>>>>> number of *_odr functions after clang has finished IRgen with the number
>>>>>> after optimization finishes. This of course would be strictly more than
>>>>>> functions + types.
>>>>>>
>>>>>
>>>>> Wouldn't any function or symbol using the record type have the type
>>>>> name somewhere in it? If we used an offset+length encoding (instead of
>>>>> offset + NUL termination) we might be able to reuse it then (at some cost
>>>>> in finding the reference).
>>>>>
>>>>
>>>> That may be possible with some work in the string table builder. But at
>>>> that point of course we're not dealing with regular symbols any more. I
>>>> guess we could have two ODR tables per object file: an array of (ODR hash,
>>>> location) tuples for ODR names that correspond to symbol table symbols
>>>> (i.e. Rui's proposal on the chromium bug), and an array of (ODR name, ODR
>>>> hash, location) tuples for all other ODR names. I guess if we wanted a "low
>>>> overhead" mode we could just omit the second table or put fewer symbols in
>>>> it.
>>>>
>>>> With debug info surely there is some sort of string representing the
>>>>> record name or something like that, no?
>>>>>
>>>>
>>>> Not the record name on its own (they do appear but a bit awkwardly --
>>>> each namespace component is stored in a separate string), but if the record
>>>> has at least one member function the mangled type name will appear
>>>> somewhere in .debug_str, so we could in principle reuse that with the
>>>> offset/length trick.
>>>>
>>>> I guess we may have to have our "low-overhead" user-facing behavior be
>>>>> a bit more nuanced. E.g.:
>>>>> 1. does this feature bloat object files significantly
>>>>> 2. does this feature slow down link times significantly
>>>>>
>>>>> Intuitively, it seems like we should be able to get 1. when debug info
>>>>> happens to be enabled (not sure about split dwarf?) and possibly in all
>>>>> cases at the cost of complexity. We may be able to get 2. in all cases with
>>>>> proper design.
>>>>>
>>>>
>>>> I think that would be my rough assessment as well. I think we have a
>>>> good shot at 1 for all cases with some of the ideas that have been
>>>> mentioned already. If we can avoid creating dependencies on DWARF I think
>>>> that would be ideal -- I'd ideally like this to work for COFF as well,
>>>> where you'd typically expect to find CodeView in object files. If I were to
>>>> try this I think the first thing that I would try is hash/compression
>>>> combined with the two ODR tables (no reuse for non-symbol ODR names to
>>>> start with, as compression may be enough on its own).
>>>>
>>>
>>> I developed a second prototype which uses hash/compression with no
>>> attempt to reuse. It is available here:
>>> https://github.com/pcc/llvm-project/tree/odr-checker2
>>>
>>> For Chromium the object file size overhead was 536566007 bytes, or in
>>> relative terms about 25%, or about 4% with debug info. I measured perf
>>> overhead for unit_tests at about 6%, but after I moved the checker onto
>>> another thread, the overhead disappeared into the noise.
>>>
>>
>> Still seems like quite a big increase.
>>
>> Any chance of compression?
>>
>
> That was with compression -- the implementation compresses the parts of
> the ODR table that aren't hashes (aside from the header and the Clang
> version, which is a small fixed cost), as well as the string table. The
> hashes were left uncompressed because they are in the critical path of the
> linker and because I imagine that they wouldn't really be that compressible.
>

I'd be a bit surprised if they weren't especially compressible - and how
much of the size increase is the compressed data V the uncompressed data?

Is it still in the hot path when parallelized?

> So I think the remaining gains would either be through limiting the number
> of ODR table entries, or through reuse of data.
>
> Limiting might be something to explore -- one possibility is that we could
> limit the ODR table entries to the declarations that are "used" by a
> particular translation unit (it appears that Clang tracks something like
> that in Decl::Used/Decl::Referenced, but I'm not sure if that is exactly
> what we need -- I think we would basically need to test for reference
> reachability from the functions/globals that are IRgen'd).
>

Currently it has every type and function that is in the AST? Yeah, that's a
lot - perhaps it should be more like the things that go in the DWARF?
(though would need to add some cases there - since the DWARF logic already
relies on the ODR to not emit duplicates in some cases)

> In terms of reuse, it seems that of the 536566007 bytes of
> overhead, 319309579 were the compressed part of the ODR tables. So even if
> we achieved 100% sharing,
>

100% sharing? You mean if all the data were compressed, and assuming the
hashes were compressible at the same ratio as the other data?

> with the current scheme I think that our minimum achievable overhead would
> be ~15% (no debug info) or ~2% (with debug info).
>
>
>> Could this go into .dwo files with Fission and be checked by dwp instead,
>> perhaps?
>>
>
> I think it could also work that way, yes.
>
> I'm reasonably happy with these figures, at least for a first
>>> implementation. We may be able to do even better for file size with reuse,
>>> but I'd leave that for version 2.
>>>
>>
>> What's the story with compatibility between versions, then? Is there a
>> version header?
>>
>
> Yes, the header contains a version number.
>
>
>> Will old formats be supported by lld indefinitely? Not at all?
>>
>
> I think we should drop support for old formats when we introduce a new
> format. My understanding is that the ODR hash can change whenever Clang
> changes (the implementation only performs ODR checking if all ODR tables
> were produced by the same revision of Clang), so there wouldn't seem to be
> a huge benefit in keeping support for old formats around.
>

I imagine it's possible people aren't necessarily going to rev lld in exact
lock-step with clang, but I could be wrong. (certainly binutils ld or gold
aren't released/kept in lock-step with GCC, for example)

>
> Peter
>
>>
>> - Dave
>>
>>
>>>
>>> Peter
>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>>>
>>>>> -- Sean Silva
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> The issue of retaining the ODR check for functions even if they get
>>>>>>> inlined may inherently pose an extra cost that can't be folded into
>>>>>>> existing work the linker is doing, so there might be a reason for clang to
>>>>>>> have a default mode that has practically no linking overhead and one that
>>>>>>> does more thorough checking but imposes extra linking overhead. Think
>>>>>>> something like a crazy boost library with thousands of functions that get
>>>>>>> inlined away, but have gigantic mangled names and so precisely are the ones
>>>>>>> that are going to impose extra cost on the linker. Simply due to the extra
>>>>>>> volume of strings that the linker would need to look at, I don't think
>>>>>>> there's a way to include checking of all inlined function "for free" at the
>>>>>>> linker level using the symbol approach.
>>>>>>>
>>>>>>
>>>>>>>
>>>>>> I guess those inlined functions would still have those symbol names
>>>>>>> in debug info (I think?), so piggybacking on the string deduplication we're
>>>>>>> already doing might make it possible to fold away the work in that case
>>>>>>> (but then again, would still impose extra cost with split dwarf...).
>>>>>>>
>>>>>>> Anyway, let's wait to see what the actual performance numbers are.
>>>>>>>
>>>>>>> -- Sean Silva
>>>>>>>
>>>>>>> On Tue, Jun 6, 2017 at 10:40 PM, Peter Collingbourne via cfe-dev <
>>>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'd like to propose an ODR checker feature for Clang and LLD. The
>>>>>>>> feature would be similar to gold's --detect-odr-violations feature, but
>>>>>>>> better: we can rely on integration with clang to avoid relying on debug
>>>>>>>> info and to perform more precise matching.
>>>>>>>>
>>>>>>>> The basic idea is that we use clang's ability to create ODR hashes
>>>>>>>> for declarations. ODR hashes are computed using all information about a
>>>>>>>> declaration that is ODR-relevant. If the flag -fdetect-odr-violations is
>>>>>>>> passed, Clang will store the ODR hashes in a so-called ODR table in each
>>>>>>>> object file. Each ODR table will contain a mapping from mangled declaration
>>>>>>>> names to ODR hashes. At link time, the linker will read the ODR table and
>>>>>>>> report any mismatches.
>>>>>>>>
>>>>>>>> To make this work:
>>>>>>>> - LLVM will be extended with the ability to represent ODR tables in
>>>>>>>> the IR and emit them to object files
>>>>>>>> - Clang will be extended with the ability to emit ODR tables using
>>>>>>>> ODR hashes
>>>>>>>> - LLD will be extended to read ODR tables from object files
>>>>>>>>
>>>>>>>> I have implemented a prototype of this feature. It is available
>>>>>>>> here: https://github.com/pcc/llvm-project/tree/odr-checker and
>>>>>>>> some results from applying it to chromium are here:
>>>>>>>> crbug.com/726071
>>>>>>>> As you can see it did indeed find a number of real ODR violations
>>>>>>>> in Chromium, including some that wouldn't be detectable using debug info.
>>>>>>>>
>>>>>>>> If you're interested in what the format of the ODR table would look
>>>>>>>> like, that prototype shows pretty much what I had in mind, but I expect
>>>>>>>> many other aspects of the implementation to change as it is upstreamed.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> --
>>>>>>>> --
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> cfe-dev mailing list
>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Peter
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Peter
>>>>
>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
>
>
> --
> --
> Peter
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170614/3acd11ef/attachment.html>