<div dir="ltr">On 5 August 2013 16:55, John McCall <span dir="ltr"><<a href="mailto:rjmccall@apple.com" target="_blank">rjmccall@apple.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div style="word-wrap:break-word"><div>On Aug 5, 2013, at 4:43 PM, Nick Lewycky <<a href="mailto:nlewycky@google.com" target="_blank">nlewycky@google.com</a>> wrote:<br></div><div><blockquote type="cite">

<div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">


<div dir="ltr"><div>On 5 August 2013 15:33, John McCall<span> </span><span dir="ltr"><<a href="mailto:rjmccall@apple.com" target="_blank">rjmccall@apple.com</a>></span><span> </span>wrote:<br></div><div class="gmail_extra">


<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word">


<div><div>On Aug 5, 2013, at 3:04 PM, Nick Lewycky <<a href="mailto:nlewycky@google.com" target="_blank">nlewycky@google.com</a>> wrote:<br></div></div><div><div><blockquote type="cite"><div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">


<div dir="ltr"><div>On 15 July 2013 15:12, John McCall<span> </span><span dir="ltr"><<a href="mailto:rjmccall@apple.com" target="_blank">rjmccall@apple.com</a>></span><span> </span>wrote:</div><div class="gmail_extra">


<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="word-wrap:break-word">


<div>The same sorts of things that you were planning on hashing, but maybe not hashed.  It's up to you; having a full string would let you actually show a useful error message, but it definitely inflates binary sizes.  If you really think you can make this performant enough to do on every load, I can see how the latter would be important.</div>


</div></blockquote><div><div><br></div><div>I was thinking we could add more things to help diagnostics, next to the hash. I *think* there are two cases that matter, but there may be more. Either we have an ODR violation where the file:line are different, or if file and line are the same then the preprocessor state was different.  We could emit file and line from the starting loc of each of the hashes, and we could emit a preprocessor table with the list of initial defines and changes to those defines as the TU went along -- at each hash we could point to an index into that table to indicate where we are. Both of those give us enough information for the linker to say why the ODRs failed to match.</div>


</div></div></div></div></div></blockquote><div><br></div></div><div>Do you have any idea how much data that is?  Your .o files are going to be *huge*.</div></div></div></blockquote><div><div><br></div>

<div>The point of the design is to minimize the number of entries, so adding more data per-entry doesn't seem like a big problem.</div></div></div></div></div></div></blockquote><div><br></div>No, I mean the “preprocessor table with the list of initial defines and changes to those defines as the TU went along”.  That’s global, sure, but it’s enormous.</div>


</div></blockquote><div><br></div><div>Ah. Even more convincing to me is that this data is not normally included in .o files at all. You're right. This could get big.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div style="word-wrap:break-word"><div><div><blockquote type="cite" dir="auto"><div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">


<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div style="word-wrap:break-word"><div><blockquote type="cite" dir="auto"><div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">


<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div style="word-wrap:break-word"><div><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


This isn't going to be performant enough to do unconditionally at every load no matter how much you shrink it.<br></blockquote><div><br></div><div>Every load of a shared object? That's not a fast operation even without odr checking, but the idea is to keep the total number of entries in the odr table small. It's less than the number of symbols, closer to the number of top-level decls.</div>


</div></div></div></blockquote><div><br></div></div>Your ABI dependencies are every declaration *that you ever rely on*.  You've got to figure that that's going to be very large.  For a library of any significance, I'd be expecting this check to touch about half a megabyte of data, even with a 32-bit hash and some sort of clever prefixing scheme on the symbols.  That's a pretty major regression in library loading.</div>


</blockquote><div><br></div><div>Fair point. If we want to include less in the hash just to make it more palatable for dynamic library users, we can have a flag for that. Some sort of ODR-checking lite.<br><br></div><div>


I really don't care about .so files myself.</div></div></div></div></div></blockquote><div><br></div></div><div>Maybe you should just spec this out as a static-linker-only technology, then.</div></div></blockquote><div>


<br></div><div>That's fair criticism. I did exactly that, then was asked to please consider the dynamic linker case and lazily said "oh sure, I don't see any reason the same scheme can't verify at load time too". I'll retract any claims about handling the dynamic loading case.</div>


</div></div></div></div></blockquote><div><br></div></div>The same scheme could work, but doing it at load time forces completely different performance considerations, so it really demands a different design.</div></div>


</blockquote>

<div><br></div><div>I wonder whether the people who ask for ODR-violation detection at dynamic link time are looking for something different? A way to verify that a new release of their C++ library is free from a certain class of ABI mismatch against the previous version. This is something entirely different from what my goal was, even though it's also under the banner of enforcing C++'s ODR rule.</div>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><blockquote type="cite" dir="auto"><div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">


<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div style="word-wrap:break-word"><div><blockquote type="cite" dir="auto"><div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">


<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">


<div style="word-wrap:break-word"><div><div><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


Also, you should have something analogous to symbol visibility as a way to tell the static linker that something only needs to be ODR-checked within a linkage unit.  It would be informed by actual symbol visibility, of course.<br>


</blockquote><div><br></div><div>Great point, and that needs to flow into the .o files as well. If a class has one visibility and its method has another, we want to skip the method when hashing the class, and need to emit an additional entry for the method alone? Is that right?</div>


</div></div></div></blockquote><div><br></div></div>Class hashes should probably only include virtual methods anyway, but yes, I think this is a good starting point.</div><div><br></div><div>What do you want in the hash for a function anyway?  Almost everything is already captured by (1) the separate hashes for the nominal types mentioned and (2) the symbol mangling.  You're pretty much only missing the return type.  Oh, I guess you need the body's dependencies for inline functions.</div>


</div></blockquote><div><br></div><div>On the contrary, this is the case I care about! Two different definitions of the same function.</div></div></div></div></div></blockquote><div><br></div></div>Allow me to suggest a structural hash of the statements, then, instead of dumping the entire preprocessor history.</div>


</blockquote><div><br></div><div>But I am suggesting a structural hash of the statements!</div></div></div></div></div></blockquote><div><br></div></div>With pointers into a complete preprocessor history for some reason!</div>


<div><br></div><div>By “structural”, I mean “considering only the result of preprocessing and semantic analysis”.</div></div></blockquote><div><br></div><div>Yes. There's two things here being conflated. The ODR detection itself -- determining whether there is an ODR violation -- should be done considering only the result of preprocessing and semantic analysis. Absolutely. Violent agreement.</div>


<div><br></div><div>Once an ODR violation is found, how can the linker produce a better error message than "nope!"? That's where my suggestion of tagging each entry with the macro changes since previous top-level-decl came from. That should be small in most cases (who defines things in the middle of their TU?) but in reality we can't discard macros that come in from the system headers (and are consequently used to declare functions that are used by definitions later on). Ouch.</div>


<div><br></div><div>If I were to suppose a flag we could provide that means "help, I encountered an ODR violation and need the compiler's help to solve it" then there's no reason we couldn't just have that flag include snippets of source code in the .o instead of a condensed list of preprocessor changes. Though upon further consideration, maybe that should just be a clang-tool.</div>


<div><br></div><div>Nick</div><div><br></div>

</div></div></div>