[cfe-dev] RFC: ODR checker for Clang and LLD

Tue Jun 13 19:54:52 PDT 2017

On Tue, Jun 13, 2017 at 6:34 PM Peter Collingbourne via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On Wed, Jun 7, 2017 at 11:28 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
>> On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>
>>>
>>>
>>> On Wed, Jun 7, 2017 at 4:31 PM, Peter Collingbourne <peter at pcc.me.uk>
>>> wrote:
>>>
>>>> On Wed, Jun 7, 2017 at 12:17 AM, Sean Silva <chisophugis at gmail.com>
>>>> wrote:
>>>>
>>>>> Very nice and simple implementation!
>>>>>
>>>>> Do you have any statistics on how large these odr tables are compared
>>>>> to other object file data? I assume that if these tables contain full
>>>>> mangled symbol names, they could end up being very large and may want to
>>>>> share the symbol name strings with the overall string table in the .o
>>>>>
>>>>
>>>> Looking at Chromium's object files it looks like the total size of the
>>>> odrtabs is about 50% of the total size of the object files, which isn't
>>>> great. The current implementation only looks at records, so I imagine that
>>>> it would be hard to share any of the strings that I'm currently creating.
>>>> (I guess it's possible that some types will have a mangled vtable name in
>>>> the string table, so we may be able to share a little that way.) Note
>>>> however that this was without debug info.
>>>>
>>>> One option for reducing size would be to
>>>> 1) store hashes of ODR names in ODR tables, per Rui's suggestion
>>>> (alongside a reference to the name itself in the string table)
>>>> 2) compress the string table for the ODR names with a standard
>>>> compression algorithm like gzip.
>>>> This wouldn't seem to affect link time performance much because I think
>>>> we should only need to look at the strings if we see a ODR name hash match
>>>> together with an ODR hash mismatch, which would mean an ODR violation with
>>>> a high probability (i.e. unless there was an ODR name hash collision, we
>>>> have found an ODR violation). If we don't expect a lot of sharing with
>>>> regular string tables (see below), it seems even more reasonable.
>>>>
>>>
>>> Neat observation!
>>>
>>> FWIW, it is a birthday problem type situation though, so for a 32-bit
>>> hash, we would expect a collision in about 1 in 2^16 distinct hashes (and
>>> 2^16 seems pretty easy to hit in a large project). So 64-bit hashes might
>>> be preferable.
>>>
>>
>> Oh right, good point, using a 64-bit hash does seem like a good idea here.
>>
>>
>>>
>>>> Also, do you have any numbers on the performance of your initial
>>>>> implementation?
>>>>>
>>>>
>>>> I measured the link time for chromium's unit_tests (the largest single
>>>> binary in chromium) at 5.05s without ODR checks and 6.61s with ODR checks.
>>>> So about 30% overhead, but in absolute terms it doesn't seem too bad. So I
>>>> think this may be acceptable for an initial implementation, but it
>>>> certainly seems worth trying to do better.
>>>>
>>>
>>> I know that things aren't currently apples-to-apples, but how does that
>>> compare to gold?
>>>
>>
>> I will measure it.
>>
>
> For that unit_tests binary I measured the overhead at about 5 seconds
> (average of 10 runs). That is with debug info, of course.
>
> W.r.t. LLD and having it always on by default (and hence making it as fast
>>>>> as possible), it seems like right now you are implementing the checking
>>>>> process with a hash table. That's simple and fine for a first
>>>>> implementation, but it's probably worth mentioning in a comment the problem
>>>>> of checking the tables, at least from the linker's perspective, does fit
>>>>> into a map-reduce pattern and could be easily parallelized if needed. E.g.
>>>>> a parallel sort to coalesce all entries for symbols of the same name
>>>>> followed by a parallel forEach to check each bucket with the same symbol
>>>>> name (roughly speaking).
>>>>>
>>>>
>>>> Right, that's one approach. I was thinking of a simpler approach where
>>>> at compile time we sort ODR names by hash and partition them using (say)
>>>> the upper bits of the hash, so that at link time we can have N threads each
>>>> building a hash table for a specific partition.
>>>>
>>>> And of course this work can be started right after symbol resolution
>>>> finishes and parallelised with the rest of the work done by the linker.
>>>>
>>>> Even better than doing it faster is just doing less work. There's a lot
>>>>> of work that the linker is already doing that may be reusable for the ODR
>>>>> checking.
>>>>> E.g.
>>>>> - maybe we could get the coalescing step as a byproduct of our
>>>>> existing string deduping, which we are generally doing anyway.
>>>>> - we are already coalescing symbol names for the symbol table. If the
>>>>> ODR table is keyed off of symbols in the binary that we are inserting into
>>>>> the symbol table, then I think we could do the entire ODR check with no
>>>>> extra "string" work on LLD's part.
>>>>>
>>>>> I see Rui already mentioned some of this in
>>>>> https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
>>>>> You mentioned that not everything is necessarily directly keyed on a
>>>>> symbol (such as types), but I think that it would really simplify things if
>>>>> the check was done as such. Do you have any idea exactly how much of the
>>>>> things that we want to check are not keyed on symbols? If most things are
>>>>> keyed on symbols, for the things we are not we can just emit extra symbols
>>>>> prefixed by __clang_odr_check_ or whatever.
>>>>>
>>>>
>>>> Since the current implementation only works with records there is
>>>> basically zero overlap right now between ODR names and symbols. I suppose
>>>> that I could estimate the amount of function overlap in a hypothetical
>>>> implementation that computes ODR hashes of functions by comparing the
>>>> number of *_odr functions after clang has finished IRgen with the number
>>>> after optimization finishes. This of course would be strictly more than
>>>> functions + types.
>>>>
>>>
>>> Wouldn't any function or symbol using the record type have the type name
>>> somewhere in it? If we used an offset+length encoding (instead of offset +
>>> NUL termination) we might be able to reuse it then (at some cost in finding
>>> the reference).
>>>
>>
>> That may be possible with some work in the string table builder. But at
>> that point of course we're not dealing with regular symbols any more. I
>> guess we could have two ODR tables per object file: an array of (ODR hash,
>> location) tuples for ODR names that correspond to symbol table symbols
>> (i.e. Rui's proposal on the chromium bug), and an array of (ODR name, ODR
>> hash, location) tuples for all other ODR names. I guess if we wanted a "low
>> overhead" mode we could just omit the second table or put fewer symbols in
>> it.
>>
>> With debug info surely there is some sort of string representing the
>>> record name or something like that, no?
>>>
>>
>> Not the record name on its own (they do appear but a bit awkwardly --
>> each namespace component is stored in a separate string), but if the record
>> has at least one member function the mangled type name will appear
>> somewhere in .debug_str, so we could in principle reuse that with the
>> offset/length trick.
>>
>> I guess we may have to have our "low-overhead" user-facing behavior be a
>>> bit more nuanced. E.g.:
>>> 1. does this feature bloat object files significantly
>>> 2. does this feature slow down link times significantly
>>>
>>> Intuitively, it seems like we should be able to get 1. when debug info
>>> happens to be enabled (not sure about split dwarf?) and possibly in all
>>> cases at the cost of complexity. We may be able to get 2. in all cases with
>>> proper design.
>>>
>>
>> I think that would be my rough assessment as well. I think we have a good
>> shot at 1 for all cases with some of the ideas that have been mentioned
>> already. If we can avoid creating dependencies on DWARF I think that would
>> be ideal -- I'd ideally like this to work for COFF as well, where you'd
>> typically expect to find CodeView in object files. If I were to try this I
>> think the first thing that I would try is hash/compression combined with
>> the two ODR tables (no reuse for non-symbol ODR names to start with, as
>> compression may be enough on its own).
>>
>
> I developed a second prototype which uses hash/compression with no attempt
> to reuse. It is available here:
> https://github.com/pcc/llvm-project/tree/odr-checker2
>
> For Chromium the object file size overhead was 536566007 bytes, or in
> relative terms about 25%, or about 4% with debug info. I measured perf
> overhead for unit_tests at about 6%, but after I moved the checker onto
> another thread, the overhead disappeared into the noise.
>

Still seems like quite a big increase.

Any chance of compression? Could this go into .dwo files with Fission and
be checked by dwp instead, perhaps?

> I'm reasonably happy with these figures, at least for a first
> implementation. We may be able to do even better for file size with reuse,
> but I'd leave that for version 2.
>

What's the story with compatibility between versions, then? Is there a
version header? Will old formats be supported by lld indefinitely? Not at
all?

- Dave

>
> Peter
>
>>
>> Peter
>>
>>
>>>
>>> -- Sean Silva
>>>
>>>
>>>>
>>>>
>>>>> The issue of retaining the ODR check for functions even if they get
>>>>> inlined may inherently pose an extra cost that can't be folded into
>>>>> existing work the linker is doing, so there might be a reason for clang to
>>>>> have a default mode that has practically no linking overhead and one that
>>>>> does more thorough checking but imposes extra linking overhead. Think
>>>>> something like a crazy boost library with thousands of functions that get
>>>>> inlined away, but have gigantic mangled names and so precisely are the ones
>>>>> that are going to impose extra cost on the linker. Simply due to the extra
>>>>> volume of strings that the linker would need to look at, I don't think
>>>>> there's a way to include checking of all inlined function "for free" at the
>>>>> linker level using the symbol approach.
>>>>>
>>>>
>>>>>
>>>> I guess those inlined functions would still have those symbol names in
>>>>> debug info (I think?), so piggybacking on the string deduplication we're
>>>>> already doing might make it possible to fold away the work in that case
>>>>> (but then again, would still impose extra cost with split dwarf...).
>>>>>
>>>>> Anyway, let's wait to see what the actual performance numbers are.
>>>>>
>>>>> -- Sean Silva
>>>>>
>>>>> On Tue, Jun 6, 2017 at 10:40 PM, Peter Collingbourne via cfe-dev <
>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'd like to propose an ODR checker feature for Clang and LLD. The
>>>>>> feature would be similar to gold's --detect-odr-violations feature, but
>>>>>> better: we can rely on integration with clang to avoid relying on debug
>>>>>> info and to perform more precise matching.
>>>>>>
>>>>>> The basic idea is that we use clang's ability to create ODR hashes
>>>>>> for declarations. ODR hashes are computed using all information about a
>>>>>> declaration that is ODR-relevant. If the flag -fdetect-odr-violations is
>>>>>> passed, Clang will store the ODR hashes in a so-called ODR table in each
>>>>>> object file. Each ODR table will contain a mapping from mangled declaration
>>>>>> names to ODR hashes. At link time, the linker will read the ODR table and
>>>>>> report any mismatches.
>>>>>>
>>>>>> To make this work:
>>>>>> - LLVM will be extended with the ability to represent ODR tables in
>>>>>> the IR and emit them to object files
>>>>>> - Clang will be extended with the ability to emit ODR tables using
>>>>>> ODR hashes
>>>>>> - LLD will be extended to read ODR tables from object files
>>>>>>
>>>>>> I have implemented a prototype of this feature. It is available here:
>>>>>> https://github.com/pcc/llvm-project/tree/odr-checker and some
>>>>>> results from applying it to chromium are here: crbug.com/726071
>>>>>> As you can see it did indeed find a number of real ODR violations in
>>>>>> Chromium, including some that wouldn't be detectable using debug info.
>>>>>>
>>>>>> If you're interested in what the format of the ODR table would look
>>>>>> like, that prototype shows pretty much what I had in mind, but I expect
>>>>>> many other aspects of the implementation to change as it is upstreamed.
>>>>>>
>>>>>> Thanks,
>>>>>> --
>>>>>> --
>>>>>> Peter
>>>>>>
>>>>>> _______________________________________________
>>>>>> cfe-dev mailing list
>>>>>> cfe-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Peter
>>>>
>>>
>>>
>>
>>
>> --
>> --
>> Peter
>>
>
>
>
> --
> --
> Peter
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170614/620a5e5f/attachment.html>