[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD
Sean Silva via llvm-dev
llvm-dev at lists.llvm.org
Wed Jun 7 00:17:00 PDT 2017
Very nice and simple implementation!

Do you have any statistics on how large these ODR tables are compared to
other object file data? I assume that if these tables contain full mangled
symbol names, they could end up being very large, so they may want to share
the symbol name strings with the overall string table in the .o file.

Also, do you have any numbers on the performance of your initial
implementation?

W.r.t. LLD and having it always on by default (and hence making it as fast
as possible), it seems like right now you are implementing the checking
process with a hash table. That's simple and fine for a first
implementation, but it's probably worth mentioning in a comment that the
problem of checking the tables, at least from the linker's perspective,
fits a map-reduce pattern and could be easily parallelized if needed. E.g.
a parallel sort to coalesce all entries for symbols of the same name,
followed by a parallel forEach to check each bucket with the same symbol
name (roughly speaking).
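
To make the sort-then-check shape concrete, here is a rough sketch in
Python (chosen for brevity; this is not LLD code). The entry layout of
(mangled name, ODR hash, object file), the function name
find_odr_mismatches, and the hash values are all made up for illustration
and are not the prototype's actual table format. The sort and the
per-bucket loop are sequential here; the point is only that each bucket
can be checked independently of the others.

```python
from itertools import groupby
from operator import itemgetter


def find_odr_mismatches(entries):
    """Group (name, odr_hash, object_file) entries by name and report
    every name that was seen with more than one distinct hash."""
    # Coalescing step: sort so entries with the same name are adjacent.
    # A linker could use a parallel sort here.
    ordered = sorted(entries, key=itemgetter(0))

    mismatches = {}
    # Checking step: buckets do not depend on each other, so this loop
    # is the part that could become a parallel forEach.
    for name, bucket in groupby(ordered, key=itemgetter(0)):
        bucket = list(bucket)
        distinct_hashes = {odr_hash for _, odr_hash, _ in bucket}
        if len(distinct_hashes) > 1:
            mismatches[name] = [(h, obj) for _, h, obj in bucket]
    return mismatches


# Illustrative input: two object files that disagree about S::f().
entries = [
    ("_ZN1S1fEv", 0x1111, "a.o"),
    ("_ZN1S1gEv", 0x2222, "a.o"),
    ("_ZN1S1fEv", 0x9999, "b.o"),
    ("_ZN1S1gEv", 0x2222, "b.o"),
]
result = find_odr_mismatches(entries)
```

With this input, only _ZN1S1fEv is reported, since _ZN1S1gEv has the same
hash in both files.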

Even better than doing it faster is just doing less work. There's a lot of
work that the linker is already doing that may be reusable for the ODR
checking. E.g.:
- Maybe we could get the coalescing step as a byproduct of our existing
  string deduping, which we are generally doing anyway.
- We are already coalescing symbol names for the symbol table. If the ODR
  table is keyed off of symbols in the binary that we are inserting into the
  symbol table, then I think we could do the entire ODR check with no extra
  "string" work on LLD's part.

I see Rui already mentioned some of this in
https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.

You mentioned that not everything is necessarily directly keyed on a symbol
(such as types), but I think that it would really simplify things if the
check were keyed that way. Do you have any idea how many of the things that
we want to check are not keyed on symbols? If most things are keyed on
symbols, then for the things that are not we can just emit extra symbols
prefixed by __clang_odr_check_ or whatever.
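
As a rough illustration of the symbol-keyed idea, here is a toy sketch in
Python. The SymbolTable class, its insert method, and the
odr_symbol_for_type helper are hypothetical names invented for this
example; they are not LLD's data structures, and the hash values are made
up. The sketch assumes each symbol arrives with an optional ODR hash, so
the check happens during the name lookup the linker performs anyway, and
non-symbol entities such as types get a synthetic prefixed name.

```python
ODR_PREFIX = "__clang_odr_check_"  # prefix floated above; spelling is open


class SymbolTable:
    """Toy stand-in for a linker symbol table (hypothetical)."""

    def __init__(self):
        self._symbols = {}    # name -> (odr_hash or None, object file)
        self.violations = []  # (name, first file, conflicting file)

    def insert(self, name, obj_file, odr_hash=None):
        # Looking the name up is work symbol resolution does regardless;
        # the ODR check adds only an integer comparison on a hit.
        existing = self._symbols.get(name)
        if existing is None:
            self._symbols[name] = (odr_hash, obj_file)
            return
        prev_hash, prev_file = existing
        if (odr_hash is not None and prev_hash is not None
                and odr_hash != prev_hash):
            self.violations.append((name, prev_file, obj_file))


def odr_symbol_for_type(mangled_type_name):
    """Synthetic symbol name for an entity with no symbol of its own."""
    return ODR_PREFIX + mangled_type_name


table = SymbolTable()
# An ordinary function symbol whose hashes agree: no report.
table.insert("_Z3foov", "a.o", odr_hash=0xAAAA)
table.insert("_Z3foov", "b.o", odr_hash=0xAAAA)
# A type checked through a synthetic symbol, with hashes that disagree.
table.insert(odr_symbol_for_type("_ZTS1S"), "a.o", odr_hash=0x1234)
table.insert(odr_symbol_for_type("_ZTS1S"), "b.o", odr_hash=0x5678)
```

Here table.violations ends up with a single entry for the synthetic type
symbol, naming a.o and b.o as the conflicting files.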

The issue of retaining the ODR check for functions even if they get inlined
may inherently pose an extra cost that can't be folded into existing work
the linker is doing, so there might be a reason for clang to have a default
mode that has practically no linking overhead and another that does more
thorough checking but imposes extra linking overhead. Think of something
like a crazy boost library with thousands of functions that get inlined
away but have gigantic mangled names, and so are precisely the ones that
are going to impose extra cost on the linker. Simply due to the extra
volume of strings that the linker would need to look at, I don't think
there's a way to include checking of all inlined functions "for free" at
the linker level using the symbol approach.

I guess those inlined functions would still have those symbol names in
debug info (I think?), so piggybacking on the string deduplication we're
already doing might make it possible to fold away the work in that case
(but then again, that would still impose extra cost with split DWARF...).

Anyway, let's wait to see what the actual performance numbers are.

-- Sean Silva

On Tue, Jun 6, 2017 at 10:40 PM, Peter Collingbourne via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> Hi all,
>
> I'd like to propose an ODR checker feature for Clang and LLD. The feature
> would be similar to gold's --detect-odr-violations feature, but better: we
> can rely on integration with clang to avoid relying on debug info and to
> perform more precise matching.
>
> The basic idea is that we use clang's ability to create ODR hashes for
> declarations. ODR hashes are computed using all information about a
> declaration that is ODR-relevant. If the flag -fdetect-odr-violations is
> passed, Clang will store the ODR hashes in a so-called ODR table in each
> object file. Each ODR table will contain a mapping from mangled declaration
> names to ODR hashes. At link time, the linker will read the ODR table and
> report any mismatches.
>
> To make this work:
> - LLVM will be extended with the ability to represent ODR tables in the IR
> and emit them to object files
> - Clang will be extended with the ability to emit ODR tables using ODR
> hashes
> - LLD will be extended to read ODR tables from object files
>
> I have implemented a prototype of this feature. It is available here:
> https://github.com/pcc/llvm-project/tree/odr-checker and some results
> from applying it to chromium are here: crbug.com/726071
> As you can see it did indeed find a number of real ODR violations in
> Chromium, including some that wouldn't be detectable using debug info.
>
> If you're interested in what the format of the ODR table would look like,
> that prototype shows pretty much what I had in mind, but I expect many
> other aspects of the implementation to change as it is upstreamed.
>
> Thanks,
> --
> Peter