<div dir="ltr">I think you are right -- I always confuse false positives with false negatives. What I wanted to say is that collisions increase the chance that an ODR violation goes undetected.</div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 14, 2017 at 6:15 PM, David Blaikie <span dir="ltr">&lt;<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><span class=""><div dir="ltr">On Wed, Jun 14, 2017 at 6:01 PM Rui Ueyama &lt;<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Is the entry in your ODR table 64-bit? Sean mentioned that this is a birthday paradox situation, but I don't think we need hash values that large, as our aim is not to avoid all collisions. A small number of collisions is okay as it just slightly increases false negatives. I think it can even be 16-bit if space saving is important. If we choose a 16-bit hash, the probability that an ODR violation is not detected is 1/65536, which is still quite low.</div></blockquote></span><div><br>If I'm understanding this correctly, it's the opposite though - colliding hashes will produce false positives, right?
(ODR violations will be detected where none exist in the code) Perhaps I'm misunderstanding.<br> </div><div><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 14, 2017 at 1:41 PM, Peter Collingbourne via cfe-dev <span dir="ltr"><<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="m_8536049278293863233m_-805440991100361147h5">On Wed, Jun 14, 2017 at 12:47 PM, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-HOEnZb"><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-h5"><br><br><div class="gmail_quote"><div dir="ltr">On Tue, Jun 13, 2017, 11:30 PM Peter Collingbourne <<a href="mailto:peter@pcc.me.uk" target="_blank">peter@pcc.me.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Jun 13, 2017 at 11:06 PM, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><div><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555h5"><div 
dir="ltr">On Tue, Jun 13, 2017 at 10:05 PM Peter Collingbourne <<a href="mailto:peter@pcc.me.uk" target="_blank">peter@pcc.me.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Jun 13, 2017 at 8:48 PM, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><div><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-h5"><div dir="ltr">On Tue, Jun 13, 2017 at 8:43 PM Peter Collingbourne <<a href="mailto:peter@pcc.me.uk" target="_blank">peter@pcc.me.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Jun 13, 2017 at 7:54 PM, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><br><div class="gmail_quote"><div><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-h5"><div dir="ltr">On Tue, Jun 13, 2017 at 6:34 PM Peter Collingbourne via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px 
solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Jun 7, 2017 at 11:28 PM, Peter Collingbourne <span dir="ltr"><<a href="mailto:peter@pcc.me.uk" target="_blank">peter@pcc.me.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Wed, Jun 7, 2017 at 4:31 PM, Peter Collingbourne <span dir="ltr"><<a href="mailto:peter@pcc.me.uk" target="_blank">peter@pcc.me.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Wed, Jun 7, 2017 at 12:17 AM, Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Very nice and simple implementation!<div><br></div><div>Do you have any statistics on how large these odr tables are compared to other object file data? I assume that if these tables contain full mangled symbol names, they could end up being very large and may want to share the symbol name strings with the overall string table in the .o</div></div></blockquote><div><br></div></span><div>Looking at Chromium's object files it looks like the total size of the odrtabs is about 50% of the total size of the object files, which isn't great. 
The current implementation only looks at records, so I imagine that it would be hard to share any of the strings that I'm currently creating. (I guess it's possible that some types will have a mangled vtable name in the string table, so we may be able to share a little that way.) Note, however, that this was without debug info.</div><div><br></div><div>One option for reducing size would be to:</div><div>1) store hashes of ODR names in ODR tables, per Rui's suggestion (alongside a reference to the name itself in the string table)</div><div>2) compress the string table for the ODR names with a standard compression algorithm like gzip.</div><div>This wouldn't seem to affect link time performance much because I think we should only need to look at the strings if we see an ODR name hash match together with an ODR hash mismatch, which would mean an ODR violation with a high probability (i.e. unless there was an ODR name hash collision, we have found an ODR violation). If we don't expect a lot of sharing with regular string tables (see below), it seems even more reasonable.</div></div></div></div></blockquote><div><br></div></span><div>Neat observation!</div><div><br></div><div>FWIW, it is a birthday problem type situation though, so for a 32-bit hash, we would expect a collision once there are about 2^16 distinct hashes (and 2^16 seems pretty easy to hit in a large project).
So 64-bit hashes might be preferable.</div></div></div></div></blockquote><div><br></div></span><div>Oh right, good point, using a 64-bit hash does seem like a good idea here.</div><span><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Also, do you have any numbers on the performance of your initial implementation?</div></div></blockquote><div><br></div></span><div>I measured the link time for chromium's unit_tests (the largest single binary in chromium) at 5.05s without ODR checks and 6.61s with ODR checks. So about 30% overhead, but in absolute terms it doesn't seem too bad. So I think this may be acceptable for an initial implementation, but it certainly seems worth trying to do better.</div></div></div></div></blockquote><div><br></div></span><div>I know that things aren't currently apples-to-apples, but how does that compare to gold?</div></div></div></div></blockquote><div><br></div></span><div>I will measure it.</div></div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>For that unit_tests binary I measured the overhead at about 5 seconds (average of 10 runs). 
That is with debug info, of course.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>W.r.t. LLD and having it always on by default (and hence making it as fast as possible), it seems like right now you are implementing the checking process with a hash table. That's simple and fine for a first implementation, but it's probably worth mentioning in a comment the problem of checking the tables, at least from the linker's perspective, does fit into a map-reduce pattern and could be easily parallelized if needed. E.g. a parallel sort to coalesce all entries for symbols of the same name followed by a parallel forEach to check each bucket with the same symbol name (roughly speaking).</div></div></blockquote><div><br></div></span><div>Right, that's one approach. 
I was thinking of a simpler approach where at compile time we sort ODR names by hash and partition them using (say) the upper bits of the hash, so that at link time we can have N threads each building a hash table for a specific partition.</div><div><br></div><div>And of course this work can be started right after symbol resolution finishes and parallelised with the rest of the work done by the linker.</div><span><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Even better than doing it faster is just doing less work. There's a lot of work that the linker is already doing that may be reusable for the ODR checking.</div><div>E.g.</div><div>- maybe we could get the coalescing step as a byproduct of our existing string deduping, which we are generally doing anyway.</div><div>- we are already coalescing symbol names for the symbol table. If the ODR table is keyed off of symbols in the binary that we are inserting into the symbol table, then I think we could do the entire ODR check with no extra "string" work on LLD's part.</div><div><br></div><div>I see Rui already mentioned some of this in <a href="https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4" target="_blank">https://bugs.chromium.org/<wbr>p/chromium/issues/detail?id=<wbr>726071#c4</a>.</div><div>You mentioned that not everything is necessarily directly keyed on a symbol (such as types), but I think that it would really simplify things if the check was done as such. Do you have any idea exactly how much of the things that we want to check are not keyed on symbols? If most things are keyed on symbols, for the things we are not we can just emit extra symbols prefixed by __clang_odr_check_ or whatever.</div></div></blockquote><div><br></div></span><div>Since the current implementation only works with records there is basically zero overlap right now between ODR names and symbols. 
I suppose that I could estimate the amount of function overlap in a hypothetical implementation that computes ODR hashes of functions by comparing the number of *_odr functions after clang has finished IRgen with the number after optimization finishes. This of course would be strictly more than functions + types.</div></div></div></div></blockquote><div><br></div></span><div>Wouldn't any function or symbol using the record type have the type name somewhere in it? If we used an offset+length encoding (instead of offset + NUL termination) we might be able to reuse it then (at some cost in finding the reference).</div></div></div></div></blockquote><div><br></div></span><div>That may be possible with some work in the string table builder. But at that point of course we're not dealing with regular symbols any more. I guess we could have two ODR tables per object file: an array of (ODR hash, location) tuples for ODR names that correspond to symbol table symbols (i.e. Rui's proposal on the chromium bug), and an array of (ODR name, ODR hash, location) tuples for all other ODR names. 
I guess if we wanted a "low overhead" mode we could just omit the second table or put fewer symbols in it.<br></div><span><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>With debug info surely there is some sort of string representing the record name or something like that, no?</div></div></div></div></blockquote><div><br></div></span><div>Not the record name on its own (they do appear but a bit awkwardly -- each namespace component is stored in a separate string), but if the record has at least one member function the mangled type name will appear somewhere in .debug_str, so we could in principle reuse that with the offset/length trick.</div><span><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>I guess we may have to have our "low-overhead" user-facing behavior be a bit more nuanced. E.g.:</div><div>1. does this feature bloat object files significantly</div><div>2. does this feature slow down link times significantly</div><div><br></div><div>Intuitively, it seems like we should be able to get 1. when debug info happens to be enabled (not sure about split dwarf?) and possibly in all cases at the cost of complexity. We may be able to get 2. in all cases with proper design.</div></div></div></div></blockquote><div><br></div></span><div>I think that would be my rough assessment as well. I think we have a good shot at 1 for all cases with some of the ideas that have been mentioned already. If we can avoid creating dependencies on DWARF I think that would be ideal -- I'd ideally like this to work for COFF as well, where you'd typically expect to find CodeView in object files. 
If I were to try this I think the first thing that I would try is hash/compression combined with the two ODR tables (no reuse for non-symbol ODR names to start with, as compression may be enough on its own).</div></div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>I developed a second prototype which uses hash/compression with no attempt to reuse. It is available here: <a href="https://github.com/pcc/llvm-project/tree/odr-checker2" target="_blank">https://github.com/pcc/<wbr>llvm-project/tree/odr-checker2</a></div><div><br></div><div>For Chromium the object file size overhead was 536566007 bytes, or in relative terms about 25%, or about 4% with debug info. I measured perf overhead for unit_tests at about 6%, but after I moved the checker onto another thread, the overhead disappeared into the noise.</div></div></div></div></blockquote><div><br></div></div></div><div>Still seems like quite a big increase.<br><br>Any chance of compression?</div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>That was with compression -- the implementation compresses the parts of the ODR table that aren't hashes (aside from the header and the Clang version, which is a small fixed cost), as well as the string table. The hashes were left uncompressed because they are in the critical path of the linker and because I imagine that they wouldn't really be that compressible.</div></div></div></div></blockquote></div></div><div><br>I'd be a bit surprised if they weren't especially compressible - </div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Maybe I'm wrong, but my intuition about compression is that it works best when the data contains repeated patterns. 
If we use a hash function with good dispersion, then I'd expect each hash to have little in common with other hashes.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>and how much of the size increase is the compressed data vs. the uncompressed data?</div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>The ratio was roughly 60% compressed data to 40% uncompressed data.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Is it still in the hot path when parallelized?<br></div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Not right now according to my benchmarking, but decompression could push it into the critical path if it ends up taking longer than the rest of the work done by the linker after symbol resolution. On the same machine that I used for benchmarking, gunzip'ing 200MB of /dev/urandom (which is roughly what I'd expect the hashes to look like) takes around 1.1s, i.e.
a not insignificant fraction of lld's runtime.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>So I think the remaining gains would either be through limiting the number of ODR table entries, or through reuse of data.<br></div><div><br></div><div>Limiting might be something to explore -- one possibility is that we could limit the ODR table entries to the declarations that are "used" by a particular translation unit (it appears that Clang tracks something like that in Decl::Used/Decl::Referenced, but I'm not sure if that is exactly what we need -- I think we would basically need to test for reference reachability from the functions/globals that are IRgen'd).</div></div></div></div></blockquote></span><div><br>Currently it has every type and function that is in the AST? Yeah, that's a lot - perhaps it should be more like the things that go in the DWARF? (though would need to add some cases there - since the DWARF logic already relies on the ODR to not emit duplicates in some cases)<br></div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Just every record declaration -- Clang only supports ODR hashes for record declarations right now. 
I understand that function declarations (including function bodies) are still a work in progress.</div><div><br></div><div>I think it should indeed just be roughly the things that go in the DWARF. I think that at one point I observed that every record declaration, even an unused one, was going into the DWARF, but I might have been mistaken because I can no longer reproduce that. I'll take a closer look to see if I can reuse the logic that presumably already exists for DWARF.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div></div><span class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>In terms of reuse, it seems that of the 536566007 bytes of overhead, 319309579 were the compressed part of the ODR tables. So even if we achieved 100% sharing, </div></div></div></div></blockquote></span><div><br>100% sharing? You mean if all the data were compressed, and assuming the hashes were compressible at the same ratio as the other data?<br></div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Sorry, I mean if 100% of the data in the compressed part of the ODR table could be eliminated by reusing data stored elsewhere (e.g.
in the object file string table or in the DWARF).</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>with the current scheme I think that our minimum achievable overhead would be ~15% (no debug info) or ~2% (with debug info).<br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Could this go into .dwo files with Fission and be checked by dwp instead, perhaps?</div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>I think it could also work that way, yes.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><span class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div 
class="gmail_quote"><div>I'm reasonably happy with these figures, at least for a first implementation. We may be able to do even better for file size with reuse, but I'd leave that for version 2.<br></div></div></div></div></blockquote></span><div><br>What's the story with compatibility between versions, then? Is there a version header?</div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Yes, the header contains a version number.</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Will old formats be supported by lld indefinitely? Not at all?<br></div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>I think we should drop support for old formats when we introduce a new format. My understanding is that the ODR hash can change whenever Clang changes (the implementation only performs ODR checking if all ODR tables were produced by the same revision of Clang), so there wouldn't seem to be a huge benefit in keeping support for old formats around.</div></div></div></div></blockquote></span><div><br>I imagine it's possible people aren't necessarily going to rev lld in exact lock-step with clang, but I could be wrong. (certainly binutils ld or gold aren't released/kept in lock-step with GCC, for example)<br></div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>That's certainly possible, but I'd say that the bar for dropping backwards compatibility is lower because ODR tables are not required for correctness. 
We could keep compatibility with the last version or so if it isn't too burdensome, or otherwise print a warning.<br></div></div></div></div></blockquote></div></div><div><br>They aren't required for correctness, but upgrading your compiler or linker and then silently losing ODR checking would be bad (or even not silently losing it, but having no choice but to rev both to keep the functionality &amp; hold the ODR-cleanliness bar) - it's the sort of thing where if you lost the checking, then gained it back again later, the regression cleanup would be annoying/an impediment to using the feature.<br></div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Makes sense, I guess. I'd be fine with a policy where the Nth open source release should be able to read ODR tables produced by the (N-1)th and possibly the (N-2)th release.</div></div></div></div></blockquote></div><div><br></div></div></div><div>Still strikes me as a bit awkward - I wonder how that compares to other (similar or different) linker features.</div></blockquote><div><br></div></div></div><div>I think the most similar existing feature is .gdb_index.
They have already gone through a few format revisions:</div><div><a href="https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html" target="_blank">https://sourceware.org/gdb/<wbr>onlinedocs/gdb/Index-Section-<wbr>Format.html</a></div><div>and have deprecated/removed support for older formats.</div><div><br></div><div>Because the requirements for ODR tables are simpler than those for .gdb_index, I'd expect us to converge on a final format sooner, so in practice the window of compatibility would end up being longer than a year.</div><div><br></div><div>Peter</div><div><div class="m_8536049278293863233m_-805440991100361147h5"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-"><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div>Any idea what Daniel Jasper & co have been working on WRT ODR checking & how this feature integrates or doesn't with their work? 
I imagine they might be working on something more like a Clang Tooling style approach, but I'm not sure.<br></div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>I'm not aware of any work like that, only of Richard Trieu's efforts for modules that I'm piggybacking on.</div></div></div></div></blockquote></div><div><br></div></span><div>+Djasper - perhaps you could provide some context on other odr detection efforts?</div><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-HOEnZb"><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-h5"><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>Peter </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div><br>- Dave<br> </div><div><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555h5"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div><br></div><div>Peter</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div> </div><div><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-h5"><blockquote class="gmail_quote" style="margin:0px 0px 0px 
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>Peter</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div><br>- Dave<br> </div><div><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-h5"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div><br></div><div>Peter</div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>Peter</div><div><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-m_-8900455223397322873m_-1163800682136724833m_-2840144985340265864gmail-m_-7838184213999349599m_4000392590411266320m_2161214087162447199gmail-m_3832867300096285h5"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span 
class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-m_-8900455223397322873m_-1163800682136724833m_-2840144985340265864gmail-m_-7838184213999349599m_4000392590411266320m_2161214087162447199gmail-m_3832867300096285m_2244722407452800141gmail-m_7133715810110630205HOEnZb"><font color="#888888"><div><br></div><div>-- Sean Silva</div></font></span><div><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-m_-8900455223397322873m_-1163800682136724833m_-2840144985340265864gmail-m_-7838184213999349599m_4000392590411266320m_2161214087162447199gmail-m_3832867300096285m_2244722407452800141gmail-m_7133715810110630205h5"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>The issue of retaining the ODR check for functions even if they get inlined may inherently pose an extra cost that can't be folded into existing work the linker is doing, so there might be a reason for clang to have a default mode that has practically no linking overhead and one that does more thorough checking but imposes extra linking overhead. Think something like a crazy boost library with thousands of functions that get inlined away, but have gigantic mangled names and so precisely are the ones that are going to impose extra cost on the linker. 
Simply due to the extra volume of strings that the linker would need to look at, I don't think there's a way to include checking of all inlined functions "for free" at the linker level using the symbol approach. </div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div> </div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>I guess those inlined functions would still have those symbol names in debug info (I think?), so piggybacking on the string deduplication we're already doing might make it possible to fold away the work in that case (but then again, would still impose extra cost with split DWARF...).</div><div><br></div><div>Anyway, let's wait to see what the actual performance numbers are.</div><div><br></div><div>-- Sean Silva</div></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-m_-8900455223397322873m_-1163800682136724833m_-2840144985340265864gmail-m_-7838184213999349599m_4000392590411266320m_2161214087162447199gmail-m_3832867300096285m_2244722407452800141gmail-m_7133715810110630205m_4078671591976872278m_882224115606455620gmail-m_-3409186166243165562m_3249621725715427267m_8795855697522658696m_-8012223090726560720h5">On Tue, Jun 6, 2017 at 10:40 PM, Peter Collingbourne via cfe-dev <span dir="ltr"><<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div 
class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-m_-8900455223397322873m_-1163800682136724833m_-2840144985340265864gmail-m_-7838184213999349599m_4000392590411266320m_2161214087162447199gmail-m_3832867300096285m_2244722407452800141gmail-m_7133715810110630205m_4078671591976872278m_882224115606455620gmail-m_-3409186166243165562m_3249621725715427267m_8795855697522658696m_-8012223090726560720h5"><div dir="ltr">Hi all,<div><br></div><div>I'd like to propose an ODR checker feature for Clang and LLD. The feature would be similar to gold's --detect-odr-violations feature, but better: we can rely on integration with clang to avoid relying on debug info and to perform more precise matching.</div><div><br></div><div>The basic idea is that we use clang's ability to create ODR hashes for declarations. ODR hashes are computed using all information about a declaration that is ODR-relevant. If the flag -fdetect-odr-violations is passed, Clang will store the ODR hashes in a so-called ODR table in each object file. Each ODR table will contain a mapping from mangled declaration names to ODR hashes. At link time, the linker will read the ODR table and report any mismatches.<br></div><div><div><br></div><div><div><div>To make this work:</div><div>- LLVM will be extended with the ability to represent ODR tables in the IR and emit them to object files</div><div>- Clang will be extended with the ability to emit ODR tables using ODR hashes</div></div><div>- LLD will be extended to read ODR tables from object files</div></div><div><br></div><div><div>I have implemented a prototype of this feature. 
It is available here: <a href="https://github.com/pcc/llvm-project/tree/odr-checker" target="_blank">https://github.com/pcc/<wbr>llvm-project/tree/odr-checker</a> <wbr>and some results from applying it to chromium are here: <a href="http://crbug.com/726071" target="_blank">crbug.com/726071</a></div><div>As you can see it did indeed find a number of real ODR violations in Chromium, including some that wouldn't be detectable using debug info.</div><div><br></div><div>If you're interested in what the format of the ODR table would look like, that prototype shows pretty much what I had in mind, but I expect many other aspects of the implementation to change as it is upstreamed.</div><div><br></div></div><div>Thanks,</div><span class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-m_-8900455223397322873m_-1163800682136724833m_-2840144985340265864gmail-m_-7838184213999349599m_4000392590411266320m_2161214087162447199gmail-m_3832867300096285m_2244722407452800141gmail-m_7133715810110630205m_4078671591976872278m_882224115606455620gmail-m_-3409186166243165562m_3249621725715427267m_8795855697522658696m_-8012223090726560720m_-552047760254930432HOEnZb"><font color="#888888">-- <br><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-m_-8900455223397322873m_-1163800682136724833m_-2840144985340265864gmail-m_-7838184213999349599m_4000392590411266320m_2161214087162447199gmail-m_3832867300096285m_2244722407452800141gmail-m_7133715810110630205m_4078671591976872278m_882224115606455620gmail-m_-3409186166243165562m_3249621725715427267m_8795855697522658696m_-8012223090726560720m_-552047760254930432m_770034671429544214gmail-m_-7030506920794196484gmail_signature"><div 
dir="ltr">-- <div>Peter</div></div></div>

</font></span></div></div>
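<div>A minimal sketch of the link-time check described in the proposal above, assuming each object file's ODR table has already been parsed into a mapping from mangled declaration name to ODR hash. The function name find_odr_mismatches, the in-memory table layout, and the sample names and hash values are hypothetical illustrations, not taken from the prototype.</div>
<pre>
```python
def find_odr_mismatches(object_tables):
    """Report names whose ODR hash differs between object files.

    object_tables maps an object file name to its ODR table; each
    table maps a mangled declaration name to an ODR hash (an int).
    This layout is an assumption made for the sketch.
    """
    first_seen = {}  # mangled name: (object file, hash) of the first entry seen
    mismatches = []
    for obj, table in object_tables.items():
        for name, odr_hash in table.items():
            if name not in first_seen:
                first_seen[name] = (obj, odr_hash)
                continue
            prev_obj, prev_hash = first_seen[name]
            if prev_hash != odr_hash:
                # Same mangled name, different hash: likely ODR violation.
                mismatches.append((name, prev_obj, obj))
    return mismatches


# Hypothetical tables for two object files.
tables = {
    "a.o": {"_ZN3Foo3barEv": 0x1234, "_Z3bazv": 0xBEEF},
    "b.o": {"_ZN3Foo3barEv": 0x5678, "_Z3bazv": 0xBEEF},
}
mismatches = find_odr_mismatches(tables)
for name, first_obj, second_obj in mismatches:
    print(f"ODR mismatch for {name}: {first_obj} vs {second_obj}")
```
</pre>
<div>In this sketch the hashes for _Z3bazv agree, so only _ZN3Foo3barEv is reported. Two different definitions whose hashes happen to collide would go unreported, which is the missed-detection case discussed elsewhere in the thread.</div>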

<br></div></div>______________________________<wbr>_________________<br>

cfe-dev mailing list<br>

<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/cfe-dev</a><br>

<br></blockquote></div><br></div>

</blockquote></span></div><span class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-m_-8900455223397322873m_-1163800682136724833m_-2840144985340265864gmail-m_-7838184213999349599m_4000392590411266320m_2161214087162447199gmail-m_3832867300096285m_2244722407452800141gmail-m_7133715810110630205m_4078671591976872278HOEnZb"><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-m_-8900455223397322873m_-1163800682136724833m_-2840144985340265864gmail-m_-7838184213999349599m_4000392590411266320m_2161214087162447199gmail-m_3832867300096285m_2244722407452800141gmail-m_7133715810110630205m_4078671591976872278m_882224115606455620gmail-m_-3409186166243165562m_3249621725715427267m_8795855697522658696m_-8012223090726560720gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div>

</font></span></div></div>

</blockquote></div></div></div><br></div></div>

</blockquote></div></div></div><span class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-m_-8900455223397322873m_-1163800682136724833m_-2840144985340265864gmail-m_-7838184213999349599m_4000392590411266320m_2161214087162447199gmail-m_3832867300096285HOEnZb"><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-m_-8900455223397322873m_-1163800682136724833m_-2840144985340265864gmail-m_-7838184213999349599m_4000392590411266320m_2161214087162447199gmail-m_3832867300096285m_2244722407452800141gmail-m_7133715810110630205gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div>

</font></span></div></div>

</blockquote></div></div></div><div dir="ltr"><div class="gmail_extra"><br><br clear="all"><div><br></div>-- <br><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail-m_-8900455223397322873m_-1163800682136724833m_-2840144985340265864gmail-m_-7838184213999349599m_4000392590411266320m_2161214087162447199gmail-m_3832867300096285gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div>

</div></div>


</blockquote></div></div></div></div>

</blockquote></div></div></div><div dir="ltr"><div class="gmail_extra"><br><br clear="all"><div><br></div>-- <br><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail-m_-1917046612426427630m_8070089485785554164gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div>

</div></div></blockquote></div></div></div></div>

</blockquote></div></div></div><div dir="ltr"><div class="gmail_extra"><br><br clear="all"><div><br></div>-- <br><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555m_4730558371586502940m_-6648208884759771566gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div>

</div></div></blockquote></div></div></div></div>

</blockquote></div></div></div><div dir="ltr"><div class="gmail_extra"><br><br clear="all"><div><br></div>-- <br><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail-m_2484608349038262809m_-378079782347931555gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div>

</div></div></blockquote></div>

</div></div></blockquote></div></div></div><span class="m_8536049278293863233m_-805440991100361147HOEnZb"><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div class="m_8536049278293863233m_-805440991100361147m_6788176004464083424gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div>

</font></span></div></div>


<br></blockquote></div><br></div>

</blockquote></div></div></div></div>

</blockquote></div><br></div>