[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

Tue Jun 13 18:34:41 PDT 2017

On Wed, Jun 7, 2017 at 11:28 PM, Peter Collingbourne <peter at pcc.me.uk>
wrote:

> On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Wed, Jun 7, 2017 at 4:31 PM, Peter Collingbourne <peter at pcc.me.uk>
>> wrote:
>>
>>> On Wed, Jun 7, 2017 at 12:17 AM, Sean Silva <chisophugis at gmail.com>
>>> wrote:
>>>
>>>> Very nice and simple implementation!
>>>>
>>>> Do you have any statistics on how large these odr tables are compared
>>>> to other object file data? I assume that if these tables contain full
>>>> mangled symbol names, they could end up being very large and may want to
>>>> share the symbol name strings with the overall string table in the .o
>>>>
>>>
>>> Looking at Chromium's object files it looks like the total size of the
>>> odrtabs is about 50% of the total size of the object files, which isn't
>>> great. The current implementation only looks at records, so I imagine that
>>> it would be hard to share any of the strings that I'm currently creating.
>>> (I guess it's possible that some types will have a mangled vtable name in
>>> the string table, so we may be able to share a little that way.) Note
>>> however that this was without debug info.
>>>
>>> One option for reducing size would be to
>>> 1) store hashes of ODR names in ODR tables, per Rui's suggestion
>>> (alongside a reference to the name itself in the string table)
>>> 2) compress the string table for the ODR names with a standard
>>> compression algorithm like gzip.
>>> This wouldn't seem to affect link time performance much because I think
>>> we should only need to look at the strings if we see a ODR name hash match
>>> together with an ODR hash mismatch, which would mean an ODR violation with
>>> a high probability (i.e. unless there was an ODR name hash collision, we
>>> have found an ODR violation). If we don't expect a lot of sharing with
>>> regular string tables (see below), it seems even more reasonable.
>>>
>>
>> Neat observation!
>>
>> FWIW, it is a birthday problem type situation though, so for a 32-bit
>> hash, we would expect a collision in about 1 in 2^16 distinct hashes (and
>> 2^16 seems pretty easy to hit in a large project). So 64-bit hashes might
>> be preferable.
>>
>
> Oh right, good point, using a 64-bit hash does seem like a good idea here.
>
>
>>
>>> Also, do you have any numbers on the performance of your initial
>>>> implementation?
>>>>
>>>
>>> I measured the link time for chromium's unit_tests (the largest single
>>> binary in chromium) at 5.05s without ODR checks and 6.61s with ODR checks.
>>> So about 30% overhead, but in absolute terms it doesn't seem too bad. So I
>>> think this may be acceptable for an initial implementation, but it
>>> certainly seems worth trying to do better.
>>>
>>
>> I know that things aren't currently apples-to-apples, but how does that
>> compare to gold?
>>
>
> I will measure it.
>

For that unit_tests binary I measured the overhead at about 5 seconds
(average of 10 runs). That is with debug info, of course.

W.r.t. LLD and having it always on by default (and hence making it as fast
>>>> as possible), it seems like right now you are implementing the checking
>>>> process with a hash table. That's simple and fine for a first
>>>> implementation, but it's probably worth mentioning in a comment the problem
>>>> of checking the tables, at least from the linker's perspective, does fit
>>>> into a map-reduce pattern and could be easily parallelized if needed. E.g.
>>>> a parallel sort to coalesce all entries for symbols of the same name
>>>> followed by a parallel forEach to check each bucket with the same symbol
>>>> name (roughly speaking).
>>>>
>>>
>>> Right, that's one approach. I was thinking of a simpler approach where
>>> at compile time we sort ODR names by hash and partition them using (say)
>>> the upper bits of the hash, so that at link time we can have N threads each
>>> building a hash table for a specific partition.
>>>
>>> And of course this work can be started right after symbol resolution
>>> finishes and parallelised with the rest of the work done by the linker.
>>>
>>> Even better than doing it faster is just doing less work. There's a lot
>>>> of work that the linker is already doing that may be reusable for the ODR
>>>> checking.
>>>> E.g.
>>>> - maybe we could get the coalescing step as a byproduct of our existing
>>>> string deduping, which we are generally doing anyway.
>>>> - we are already coalescing symbol names for the symbol table. If the
>>>> ODR table is keyed off of symbols in the binary that we are inserting into
>>>> the symbol table, then I think we could do the entire ODR check with no
>>>> extra "string" work on LLD's part.
>>>>
>>>> I see Rui already mentioned some of this in https://bugs.chromium.org/p
>>>> /chromium/issues/detail?id=726071#c4.
>>>> You mentioned that not everything is necessarily directly keyed on a
>>>> symbol (such as types), but I think that it would really simplify things if
>>>> the check was done as such. Do you have any idea exactly how much of the
>>>> things that we want to check are not keyed on symbols? If most things are
>>>> keyed on symbols, for the things we are not we can just emit extra symbols
>>>> prefixed by __clang_odr_check_ or whatever.
>>>>
>>>
>>> Since the current implementation only works with records there is
>>> basically zero overlap right now between ODR names and symbols. I suppose
>>> that I could estimate the amount of function overlap in a hypothetical
>>> implementation that computes ODR hashes of functions by comparing the
>>> number of *_odr functions after clang has finished IRgen with the number
>>> after optimization finishes. This of course would be strictly more than
>>> functions + types.
>>>
>>
>> Wouldn't any function or symbol using the record type have the type name
>> somewhere in it? If we used an offset+length encoding (instead of offset +
>> NUL termination) we might be able to reuse it then (at some cost in finding
>> the reference).
>>
>
> That may be possible with some work in the string table builder. But at
> that point of course we're not dealing with regular symbols any more. I
> guess we could have two ODR tables per object file: an array of (ODR hash,
> location) tuples for ODR names that correspond to symbol table symbols
> (i.e. Rui's proposal on the chromium bug), and an array of (ODR name, ODR
> hash, location) tuples for all other ODR names. I guess if we wanted a "low
> overhead" mode we could just omit the second table or put fewer symbols in
> it.
>
> With debug info surely there is some sort of string representing the
>> record name or something like that, no?
>>
>
> Not the record name on its own (they do appear but a bit awkwardly -- each
> namespace component is stored in a separate string), but if the record has
> at least one member function the mangled type name will appear somewhere in
> .debug_str, so we could in principle reuse that with the offset/length
> trick.
>
> I guess we may have to have our "low-overhead" user-facing behavior be a
>> bit more nuanced. E.g.:
>> 1. does this feature bloat object files significantly
>> 2. does this feature slow down link times significantly
>>
>> Intuitively, it seems like we should be able to get 1. when debug info
>> happens to be enabled (not sure about split dwarf?) and possibly in all
>> cases at the cost of complexity. We may be able to get 2. in all cases with
>> proper design.
>>
>
> I think that would be my rough assessment as well. I think we have a good
> shot at 1 for all cases with some of the ideas that have been mentioned
> already. If we can avoid creating dependencies on DWARF I think that would
> be ideal -- I'd ideally like this to work for COFF as well, where you'd
> typically expect to find CodeView in object files. If I were to try this I
> think the first thing that I would try is hash/compression combined with
> the two ODR tables (no reuse for non-symbol ODR names to start with, as
> compression may be enough on its own).
>

I developed a second prototype which uses hash/compression with no attempt
to reuse. It is available here: https://github.com/pcc/
llvm-project/tree/odr-checker2

For Chromium the object file size overhead was 536566007 bytes, or in
relative terms about 25%, or about 4% with debug info. I measured perf
overhead for unit_tests at about 6%, but after I moved the checker onto
another thread, the overhead disappeared into the noise.

I'm reasonably happy with these figures, at least for a first
implementation. We may be able to do even better for file size with reuse,
but I'd leave that for version 2.

Peter

>
> Peter
>
>
>>
>> -- Sean Silva
>>
>>
>>>
>>>
>>>> The issue of retaining the ODR check for functions even if they get
>>>> inlined may inherently pose an extra cost that can't be folded into
>>>> existing work the linker is doing, so there might be a reason for clang to
>>>> have a default mode that has practically no linking overhead and one that
>>>> does more thorough checking but imposes extra linking overhead. Think
>>>> something like a crazy boost library with thousands of functions that get
>>>> inlined away, but have gigantic mangled names and so precisely are the ones
>>>> that are going to impose extra cost on the linker. Simply due to the extra
>>>> volume of strings that the linker would need to look at, I don't think
>>>> there's a way to include checking of all inlined function "for free" at the
>>>> linker level using the symbol approach.
>>>>
>>>
>>>>
>>> I guess those inlined functions would still have those symbol names in
>>>> debug info (I think?), so piggybacking on the string deduplication we're
>>>> already doing might make it possible to fold away the work in that case
>>>> (but then again, would still impose extra cost with split dwarf...).
>>>>
>>>> Anyway, let's wait to see what the actual performance numbers are.
>>>>
>>>> -- Sean Silva
>>>>
>>>> On Tue, Jun 6, 2017 at 10:40 PM, Peter Collingbourne via cfe-dev <
>>>> cfe-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like to propose an ODR checker feature for Clang and LLD. The
>>>>> feature would be similar to gold's --detect-odr-violations feature, but
>>>>> better: we can rely on integration with clang to avoid relying on debug
>>>>> info and to perform more precise matching.
>>>>>
>>>>> The basic idea is that we use clang's ability to create ODR hashes for
>>>>> declarations. ODR hashes are computed using all information about a
>>>>> declaration that is ODR-relevant. If the flag -fdetect-odr-violations is
>>>>> passed, Clang will store the ODR hashes in a so-called ODR table in each
>>>>> object file. Each ODR table will contain a mapping from mangled declaration
>>>>> names to ODR hashes. At link time, the linker will read the ODR table and
>>>>> report any mismatches.
>>>>>
>>>>> To make this work:
>>>>> - LLVM will be extended with the ability to represent ODR tables in
>>>>> the IR and emit them to object files
>>>>> - Clang will be extended with the ability to emit ODR tables using ODR
>>>>> hashes
>>>>> - LLD will be extended to read ODR tables from object files
>>>>>
>>>>> I have implemented a prototype of this feature. It is available here:
>>>>> https://github.com/pcc/llvm-project/tree/odr-checker and some results
>>>>> from applying it to chromium are here: crbug.com/726071
>>>>> As you can see it did indeed find a number of real ODR violations in
>>>>> Chromium, including some that wouldn't be detectable using debug info.
>>>>>
>>>>> If you're interested in what the format of the ODR table would look
>>>>> like, that prototype shows pretty much what I had in mind, but I expect
>>>>> many other aspects of the implementation to change as it is upstreamed.
>>>>>
>>>>> Thanks,
>>>>> --
>>>>> --
>>>>> Peter
>>>>>
>>>>> _______________________________________________
>>>>> cfe-dev mailing list
>>>>> cfe-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>>
>>
>>
>
>
> --
> --
> Peter
>

-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170613/c4f85ce4/attachment-0001.html>