[cfe-dev] RFC: ODR checker for Clang and LLD

Tue Jun 13 23:30:08 PDT 2017

On Tue, Jun 13, 2017 at 11:06 PM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Tue, Jun 13, 2017 at 10:05 PM Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
>> On Tue, Jun 13, 2017 at 8:48 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Jun 13, 2017 at 8:43 PM Peter Collingbourne <peter at pcc.me.uk>
>>> wrote:
>>>
>>>> On Tue, Jun 13, 2017 at 7:54 PM, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 13, 2017 at 6:34 PM Peter Collingbourne via cfe-dev <
>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> On Wed, Jun 7, 2017 at 11:28 PM, Peter Collingbourne <peter at pcc.me.uk
>>>>>> > wrote:
>>>>>>
>>>>>>> On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva <chisophugis at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 7, 2017 at 4:31 PM, Peter Collingbourne <
>>>>>>>> peter at pcc.me.uk> wrote:
>>>>>>>>
>>>>>>>>> On Wed, Jun 7, 2017 at 12:17 AM, Sean Silva <chisophugis at gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> Very nice and simple implementation!
>>>>>>>>>>
>>>>>>>>>> Do you have any statistics on how large these odr tables are
>>>>>>>>>> compared to other object file data? I assume that if these tables contain
>>>>>>>>>> full mangled symbol names, they could end up being very large and may want
>>>>>>>>>> to share the symbol name strings with the overall string table in the .o
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Looking at Chromium's object files it looks like the total size of
>>>>>>>>> the odrtabs is about 50% of the total size of the object files, which isn't
>>>>>>>>> great. The current implementation only looks at records, so I imagine that
>>>>>>>>> it would be hard to share any of the strings that I'm currently creating.
>>>>>>>>> (I guess it's possible that some types will have a mangled vtable name in
>>>>>>>>> the string table, so we may be able to share a little that way.) Note
>>>>>>>>> however that this was without debug info.
>>>>>>>>>
>>>>>>>>> One option for reducing size would be to
>>>>>>>>> 1) store hashes of ODR names in ODR tables, per Rui's suggestion
>>>>>>>>> (alongside a reference to the name itself in the string table)
>>>>>>>>> 2) compress the string table for the ODR names with a standard
>>>>>>>>> compression algorithm like gzip.
>>>>>>>>> This wouldn't seem to affect link time performance much because I
>>>>>>>>> think we should only need to look at the strings if we see a ODR name hash
>>>>>>>>> match together with an ODR hash mismatch, which would mean an ODR violation
>>>>>>>>> with a high probability (i.e. unless there was an ODR name hash collision,
>>>>>>>>> we have found an ODR violation). If we don't expect a lot of sharing with
>>>>>>>>> regular string tables (see below), it seems even more reasonable.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Neat observation!
>>>>>>>>
>>>>>>>> FWIW, it is a birthday problem type situation though, so for a
>>>>>>>> 32-bit hash, we would expect a collision in about 1 in 2^16 distinct hashes
>>>>>>>> (and 2^16 seems pretty easy to hit in a large project). So 64-bit hashes
>>>>>>>> might be preferable.
>>>>>>>>
>>>>>>>
>>>>>>> Oh right, good point, using a 64-bit hash does seem like a good idea
>>>>>>> here.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Also, do you have any numbers on the performance of your initial
>>>>>>>>>> implementation?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I measured the link time for chromium's unit_tests (the largest
>>>>>>>>> single binary in chromium) at 5.05s without ODR checks and 6.61s with ODR
>>>>>>>>> checks. So about 30% overhead, but in absolute terms it doesn't seem too
>>>>>>>>> bad. So I think this may be acceptable for an initial implementation, but
>>>>>>>>> it certainly seems worth trying to do better.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I know that things aren't currently apples-to-apples, but how does
>>>>>>>> that compare to gold?
>>>>>>>>
>>>>>>>
>>>>>>> I will measure it.
>>>>>>>
>>>>>>
>>>>>> For that unit_tests binary I measured the overhead at about 5 seconds
>>>>>> (average of 10 runs). That is with debug info, of course.
>>>>>>
>>>>>> W.r.t. LLD and having it always on by default (and hence making it as
>>>>>>>>>> fast as possible), it seems like right now you are implementing the
>>>>>>>>>> checking process with a hash table. That's simple and fine for a first
>>>>>>>>>> implementation, but it's probably worth mentioning in a comment the problem
>>>>>>>>>> of checking the tables, at least from the linker's perspective, does fit
>>>>>>>>>> into a map-reduce pattern and could be easily parallelized if needed. E.g.
>>>>>>>>>> a parallel sort to coalesce all entries for symbols of the same name
>>>>>>>>>> followed by a parallel forEach to check each bucket with the same symbol
>>>>>>>>>> name (roughly speaking).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Right, that's one approach. I was thinking of a simpler approach
>>>>>>>>> where at compile time we sort ODR names by hash and partition them using
>>>>>>>>> (say) the upper bits of the hash, so that at link time we can have N
>>>>>>>>> threads each building a hash table for a specific partition.
>>>>>>>>>
>>>>>>>>> And of course this work can be started right after symbol
>>>>>>>>> resolution finishes and parallelised with the rest of the work done by the
>>>>>>>>> linker.
>>>>>>>>>
>>>>>>>>> Even better than doing it faster is just doing less work. There's
>>>>>>>>>> a lot of work that the linker is already doing that may be reusable for the
>>>>>>>>>> ODR checking.
>>>>>>>>>> E.g.
>>>>>>>>>> - maybe we could get the coalescing step as a byproduct of our
>>>>>>>>>> existing string deduping, which we are generally doing anyway.
>>>>>>>>>> - we are already coalescing symbol names for the symbol table. If
>>>>>>>>>> the ODR table is keyed off of symbols in the binary that we are inserting
>>>>>>>>>> into the symbol table, then I think we could do the entire ODR check with
>>>>>>>>>> no extra "string" work on LLD's part.
>>>>>>>>>>
>>>>>>>>>> I see Rui already mentioned some of this in
>>>>>>>>>> https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
>>>>>>>>>> You mentioned that not everything is necessarily directly keyed
>>>>>>>>>> on a symbol (such as types), but I think that it would really simplify
>>>>>>>>>> things if the check was done as such. Do you have any idea exactly how much
>>>>>>>>>> of the things that we want to check are not keyed on symbols? If most
>>>>>>>>>> things are keyed on symbols, for the things we are not we can just emit
>>>>>>>>>> extra symbols prefixed by __clang_odr_check_ or whatever.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Since the current implementation only works with records there is
>>>>>>>>> basically zero overlap right now between ODR names and symbols. I suppose
>>>>>>>>> that I could estimate the amount of function overlap in a hypothetical
>>>>>>>>> implementation that computes ODR hashes of functions by comparing the
>>>>>>>>> number of *_odr functions after clang has finished IRgen with the number
>>>>>>>>> after optimization finishes. This of course would be strictly more than
>>>>>>>>> functions + types.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Wouldn't any function or symbol using the record type have the type
>>>>>>>> name somewhere in it? If we used an offset+length encoding (instead of
>>>>>>>> offset + NUL termination) we might be able to reuse it then (at some cost
>>>>>>>> in finding the reference).
>>>>>>>>
>>>>>>>
>>>>>>> That may be possible with some work in the string table builder. But
>>>>>>> at that point of course we're not dealing with regular symbols any more. I
>>>>>>> guess we could have two ODR tables per object file: an array of (ODR hash,
>>>>>>> location) tuples for ODR names that correspond to symbol table symbols
>>>>>>> (i.e. Rui's proposal on the chromium bug), and an array of (ODR name, ODR
>>>>>>> hash, location) tuples for all other ODR names. I guess if we wanted a "low
>>>>>>> overhead" mode we could just omit the second table or put fewer symbols in
>>>>>>> it.
>>>>>>>
>>>>>>> With debug info surely there is some sort of string representing the
>>>>>>>> record name or something like that, no?
>>>>>>>>
>>>>>>>
>>>>>>> Not the record name on its own (they do appear but a bit awkwardly
>>>>>>> -- each namespace component is stored in a separate string), but if the
>>>>>>> record has at least one member function the mangled type name will appear
>>>>>>> somewhere in .debug_str, so we could in principle reuse that with the
>>>>>>> offset/length trick.
>>>>>>>
>>>>>>> I guess we may have to have our "low-overhead" user-facing behavior
>>>>>>>> be a bit more nuanced. E.g.:
>>>>>>>> 1. does this feature bloat object files significantly
>>>>>>>> 2. does this feature slow down link times significantly
>>>>>>>>
>>>>>>>> Intuitively, it seems like we should be able to get 1. when debug
>>>>>>>> info happens to be enabled (not sure about split dwarf?) and possibly in
>>>>>>>> all cases at the cost of complexity. We may be able to get 2. in all cases
>>>>>>>> with proper design.
>>>>>>>>
>>>>>>>
>>>>>>> I think that would be my rough assessment as well. I think we have a
>>>>>>> good shot at 1 for all cases with some of the ideas that have been
>>>>>>> mentioned already. If we can avoid creating dependencies on DWARF I think
>>>>>>> that would be ideal -- I'd ideally like this to work for COFF as well,
>>>>>>> where you'd typically expect to find CodeView in object files. If I were to
>>>>>>> try this I think the first thing that I would try is hash/compression
>>>>>>> combined with the two ODR tables (no reuse for non-symbol ODR names to
>>>>>>> start with, as compression may be enough on its own).
>>>>>>>
>>>>>>
>>>>>> I developed a second prototype which uses hash/compression with no
>>>>>> attempt to reuse. It is available here: https://github.com/pcc/
>>>>>> llvm-project/tree/odr-checker2
>>>>>>
>>>>>> For Chromium the object file size overhead was 536566007 bytes, or in
>>>>>> relative terms about 25%, or about 4% with debug info. I measured perf
>>>>>> overhead for unit_tests at about 6%, but after I moved the checker onto
>>>>>> another thread, the overhead disappeared into the noise.
>>>>>>
>>>>>
>>>>> Still seems like quite a big increase.
>>>>>
>>>>> Any chance of compression?
>>>>>
>>>>
>>>> That was with compression -- the implementation compresses the parts of
>>>> the ODR table that aren't hashes (aside from the header and the Clang
>>>> version, which is a small fixed cost), as well as the string table. The
>>>> hashes were left uncompressed because they are in the critical path of the
>>>> linker and because I imagine that they wouldn't really be that compressible.
>>>>
>>>
>>> I'd be a bit surprised if they weren't especially compressible -
>>>
>>
>> Maybe I'm wrong, but my intuition about compression is that it works best
>> when the data contains repeated patterns. If we use a hash function with
>> good dispersion then I'd expect each hash to have little in common with
>> other hashes.
>>
>>
>>> and how much of the size increase is the compressed data V the
>>> uncompressed data?
>>>
>>
>> The ratio was roughly 60% compressed data to 40% uncompressed data.
>>
>>
>>> Is it still in the hot path when parallelized?
>>>
>>
>> Not right now according to my benchmarking, but decompression could push
>> it into the critical path if it ends up taking longer than the rest of the
>> work done by the linker after symbol resolution. On the same machine that I
>> used for benchmarking, gunzip'ing 200MB of /dev/urandom (which is roughly
>> what I'd expect the hashes to look like) takes around 1.1s, i.e. a not
>> insignificant fraction of lld's runtime.
>>
>>
>>> So I think the remaining gains would either be through limiting the
>>>> number of ODR table entries, or through reuse of data.
>>>>
>>>> Limiting might be something to explore -- one possibility is that we
>>>> could limit the ODR table entries to the declarations that are "used" by a
>>>> particular translation unit (it appears that Clang tracks something like
>>>> that in Decl::Used/Decl::Referenced, but I'm not sure if that is exactly
>>>> what we need -- I think we would basically need to test for reference
>>>> reachability from the functions/globals that are IRgen'd).
>>>>
>>>
>>> Currently it has every type and function that is in the AST? Yeah,
>>> that's a lot - perhaps it should be more like the things that go in the
>>> DWARF? (though would need to add some cases there - since the DWARF logic
>>> already relies on the ODR to not emit duplicates in some cases)
>>>
>>
>> Just every record declaration -- Clang only supports ODR hashes for
>> record declarations right now. I understand that function declarations
>> (including function bodies) are still works in progress.
>>
>> I think it should indeed just be roughly the things that go in the DWARF.
>> I think that at one point I observed that every record declaration, even
>> unused ones, were going into the DWARF, but I might have been mistaken
>> because I can no longer reproduce that. I'll take a closer look to see if I
>> can reuse what logic presumably already exists for DWARF.
>>
>> In terms of reuse, it seems that of the 536566007 bytes of
>>>> overhead, 319309579 were the compressed part of the ODR tables. So even if
>>>> we achieved 100% sharing,
>>>>
>>>
>>> 100% sharing? You mean if all the data were compressed, and assuming the
>>> hashes were compressible at the same ratio as the other data?
>>>
>>
>> Sorry, I mean if 100% of the data in the compressed part of the ODR table
>> could be eliminated by reusing data stored elsewhere (e.g. in the object
>> file string table or in the DWARF).
>>
>> with the current scheme I think that our minimum achievable overhead
>>>> would be ~15% (no debug info) or ~2% (with debug info).
>>>>
>>>>
>>>>> Could this go into .dwo files with Fission and be checked by dwp
>>>>> instead, perhaps?
>>>>>
>>>>
>>>> I think it could also work that way, yes.
>>>>
>>>> I'm reasonably happy with these figures, at least for a first
>>>>>> implementation. We may be able to do even better for file size with reuse,
>>>>>> but I'd leave that for version 2.
>>>>>>
>>>>>
>>>>> What's the story with compatibility between versions, then? Is there a
>>>>> version header?
>>>>>
>>>>
>>>> Yes, the header contains a version number.
>>>>
>>>>
>>>>> Will old formats be supported by lld indefinitely? Not at all?
>>>>>
>>>>
>>>> I think we should drop support for old formats when we introduce a new
>>>> format. My understanding is that the ODR hash can change whenever Clang
>>>> changes (the implementation only performs ODR checking if all ODR tables
>>>> were produced by the same revision of Clang), so there wouldn't seem to be
>>>> a huge benefit in keeping support for old formats around.
>>>>
>>>
>>> I imagine it's possible people aren't necessarily going to rev lld in
>>> exact lock-step with clang, but I could be wrong. (certainly binutils ld or
>>> gold aren't released/kept in lock-step with GCC, for example)
>>>
>>
>> That's certainly possible, but I'd say that the bar for dropping
>> backwards compatibility is lower because ODR tables are not required for
>> correctness. We could keep compatibility with the last version or so if it
>> isn't too burdensome, or otherwise print a warning.
>>
>
> They aren't required for correctness, but upgrading your compiler or
> linker then silently losing ODR checking would be bad (or even not silently
> losing it, but having no choice but to rev both to keep the functionality &
> hold the ODR-cleanliness bar) - it's the sort of thing where if you lost
> the checking, then gained it back again later, the regression cleanup would
> be annoying/an impediment to using the feature.
>

Makes sense I guess. I'd be fine with a policy where the Nth open source
release should be able to read ODR tables produced by the N-1th and
possibly the N-2th release.

Any idea what Daniel Jasper & co have been working on WRT ODR checking &
> how this feature integrates or doesn't with their work? I imagine they
> might be working on something more like a Clang Tooling style approach, but
> I'm not sure.
>

I'm not aware of any work like that, only of Richard Trieu's efforts for
modules that I'm piggybacking on.

Peter

>
> - Dave
>
>
>>
>> Peter
>>
>>>
>>>
>>>>
>>>> Peter
>>>>
>>>>>
>>>>> - Dave
>>>>>
>>>>>
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> -- Sean Silva
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> The issue of retaining the ODR check for functions even if they
>>>>>>>>>> get inlined may inherently pose an extra cost that can't be folded into
>>>>>>>>>> existing work the linker is doing, so there might be a reason for clang to
>>>>>>>>>> have a default mode that has practically no linking overhead and one that
>>>>>>>>>> does more thorough checking but imposes extra linking overhead. Think
>>>>>>>>>> something like a crazy boost library with thousands of functions that get
>>>>>>>>>> inlined away, but have gigantic mangled names and so precisely are the ones
>>>>>>>>>> that are going to impose extra cost on the linker. Simply due to the extra
>>>>>>>>>> volume of strings that the linker would need to look at, I don't think
>>>>>>>>>> there's a way to include checking of all inlined function "for free" at the
>>>>>>>>>> linker level using the symbol approach.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> I guess those inlined functions would still have those symbol
>>>>>>>>>> names in debug info (I think?), so piggybacking on the string deduplication
>>>>>>>>>> we're already doing might make it possible to fold away the work in that
>>>>>>>>>> case (but then again, would still impose extra cost with split dwarf...).
>>>>>>>>>>
>>>>>>>>>> Anyway, let's wait to see what the actual performance numbers are.
>>>>>>>>>>
>>>>>>>>>> -- Sean Silva
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 6, 2017 at 10:40 PM, Peter Collingbourne via cfe-dev
>>>>>>>>>> <cfe-dev at lists.llvm.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I'd like to propose an ODR checker feature for Clang and LLD.
>>>>>>>>>>> The feature would be similar to gold's --detect-odr-violations feature, but
>>>>>>>>>>> better: we can rely on integration with clang to avoid relying on debug
>>>>>>>>>>> info and to perform more precise matching.
>>>>>>>>>>>
>>>>>>>>>>> The basic idea is that we use clang's ability to create ODR
>>>>>>>>>>> hashes for declarations. ODR hashes are computed using all information
>>>>>>>>>>> about a declaration that is ODR-relevant. If the flag
>>>>>>>>>>> -fdetect-odr-violations is passed, Clang will store the ODR hashes in a
>>>>>>>>>>> so-called ODR table in each object file. Each ODR table will contain a
>>>>>>>>>>> mapping from mangled declaration names to ODR hashes. At link time, the
>>>>>>>>>>> linker will read the ODR table and report any mismatches.
>>>>>>>>>>>
>>>>>>>>>>> To make this work:
>>>>>>>>>>> - LLVM will be extended with the ability to represent ODR tables
>>>>>>>>>>> in the IR and emit them to object files
>>>>>>>>>>> - Clang will be extended with the ability to emit ODR tables
>>>>>>>>>>> using ODR hashes
>>>>>>>>>>> - LLD will be extended to read ODR tables from object files
>>>>>>>>>>>
>>>>>>>>>>> I have implemented a prototype of this feature. It is available
>>>>>>>>>>> here: https://github.com/pcc/llvm-project/tree/odr-checker and
>>>>>>>>>>> some results from applying it to chromium are here:
>>>>>>>>>>> crbug.com/726071
>>>>>>>>>>> As you can see it did indeed find a number of real ODR
>>>>>>>>>>> violations in Chromium, including some that wouldn't be detectable using
>>>>>>>>>>> debug info.
>>>>>>>>>>>
>>>>>>>>>>> If you're interested in what the format of the ODR table would
>>>>>>>>>>> look like, that prototype shows pretty much what I had in mind, but I
>>>>>>>>>>> expect many other aspects of the implementation to change as it is
>>>>>>>>>>> upstreamed.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> --
>>>>>>>>>>> --
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> --
>>>>>>> Peter
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Peter
>>>>>> _______________________________________________
>>>>>> cfe-dev mailing list
>>>>>> cfe-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Peter
>>>>
>>>
>>
>>
>> --
>> --
>> Peter
>>
>

-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170613/b57fa5b5/attachment.html>