[LLVMdev] [cfe-dev] design for an accurate ODR-checker with clang

Mon Aug 5 16:55:38 PDT 2013

On Aug 5, 2013, at 4:43 PM, Nick Lewycky <nlewycky at google.com> wrote:
> On 5 August 2013 15:33, John McCall <rjmccall at apple.com> wrote:
> On Aug 5, 2013, at 3:04 PM, Nick Lewycky <nlewycky at google.com> wrote:
>> On 15 July 2013 15:12, John McCall <rjmccall at apple.com> wrote:
>> The same sorts of things that you were planning on hashing, but maybe not hashed.  It's up to you; having a full string would let you actually show a useful error message, but it definitely inflates binary sizes.  If you really think you can make this performant enough to do on every load, I can see how the latter would be important.
>> 
>> I was thinking we could add more things to help diagnostics, next to the hash. I *think* there are two cases that matter, but there may be more. Either the two definitions come from different file:line locations, or the file and line are the same and the preprocessor state was different.  We could emit the file and line of the starting location of each hashed entity, and we could emit a preprocessor table with the list of initial defines and changes to those defines as the TU went along; at each hash we could point to an index into that table to indicate where we are. Both of those give the linker enough information to say why the ODR hashes failed to match.
> 
> Do you have any idea how much data that is?  Your .o files are going to be *huge*.
> 
> The point of the design is to minimize the number of entries, so adding more data per-entry doesn't seem like a big problem.

No, I mean the “preprocessor table with the list of initial defines and changes to those defines as the TU went along”.  That’s global, sure, but it’s enormous.
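
To make the proposal under discussion concrete, here is a minimal sketch of what a per-object ODR table with diagnostic data might look like, and how a static linker could use it to explain a mismatch. Everything in it is an assumption made for illustration: the `OdrEntry` fields, the `check_odr` function, and the representation of the preprocessor table as a list of strings do not come from the thread or from any real linker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OdrEntry:
    # All field names are hypothetical; the thread does not define a layout.
    name: str      # mangled name of the hashed entity
    hash: int      # hash of its definition
    file: str      # file of the definition's starting location
    line: int      # line of the definition's starting location
    pp_index: int  # number of preprocessor-table records in effect here


def check_odr(objects):
    """objects: list of (object_name, pp_table, entries).

    Returns a list of human-readable diagnostics, one per mismatch
    against the first definition seen for each name.
    """
    seen = {}
    diags = []
    for obj_name, pp_table, entries in objects:
        for e in entries:
            if e.name not in seen:
                seen[e.name] = (obj_name, pp_table, e)
                continue
            first_obj, first_pp, first = seen[e.name]
            if first.hash == e.hash:
                continue  # definitions agree, nothing to report
            if (first.file, first.line) != (e.file, e.line):
                # Case 1: two definitions written in different places.
                diags.append(
                    f"{e.name}: defined at {first.file}:{first.line} in "
                    f"{first_obj} and at {e.file}:{e.line} in {obj_name}"
                )
            else:
                # Case 2: same source text, different preprocessor state.
                a = first_pp[:first.pp_index]
                b = pp_table[:e.pp_index]
                diags.append(
                    f"{e.name}: same location {e.file}:{e.line}, "
                    f"preprocessor state differs: {a} vs {b}"
                )
    return diags
```

The size concern raised above applies to `pp_table`: the per-entry cost is a few integers, but the table itself records every define and every change to it across the whole TU.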

>>> This isn't going to be performant enough to do unconditionally at every load no matter how much you shrink it.
>>> 
>>> Every load of a shared object? That's not a fast operation even without odr checking, but the idea is to keep the total number of entries in the odr table small. It's less than the number of symbols, closer to the number of top-level decls.
>> 
>> Your ABI dependencies are every declaration *that you ever rely on*.  You've got to figure that that's going to be very large.  For a library of any significance, I'd be expecting this check to touch about half a megabyte of data, even with a 32-bit hash and some sort of clever prefixing scheme on the symbols.  That's a pretty major regression in library loading.
>> 
>> Fair point. If we want to include less in the hash just to make it more palatable for dynamic library users, we can have a flag for that. Some sort of ODR-checking lite.
>> 
>> I really don't care about .so files myself.
> 
> Maybe you should just spec this out as a static-linker-only technology, then.
> 
> That's fair criticism. I did exactly that, then was asked to please consider the dynamic linker case and lazily said "oh sure, I don't see any reason the same scheme can't verify at load time too". I'll retract any claims about handling the dynamic loading case.

The same scheme could work, but doing it at load time forces completely different performance considerations, so it really demands a different design.

>>> Also, you should have something analogous to symbol visibility as a way to tell the static linker that something only needs to be ODR-checked within a linkage unit.  It would be informed by actual symbol visibility, of course.
>>> 
>>> Great point, and that needs to flow into the .o files as well. If a class has one visibility and its method has another, we want to skip the method when hashing the class, and need to emit an additional entry for the method alone? Is that right?
>> 
>> Class hashes should probably only include virtual methods anyway, but yes, I think this is a good starting point.
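
The visibility rule discussed here can be sketched as follows, assuming a class hash that covers only virtual methods and a separate table entry for any method whose visibility differs from its class. The `Method` type, the `odr_entries` function, and the entry-naming scheme are hypothetical and exist only for this illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    signature: str
    virtual: bool
    visibility: str  # e.g. "default" or "hidden"


def _hash32(text):
    # Truncated SHA-256, standing in for whatever 32-bit hash is chosen.
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "little")


def odr_entries(class_name, class_visibility, methods):
    """Return {entry_name: (visibility, hash)} for one class.

    The class hash covers only virtual methods that share the class's
    visibility, in declaration order. Methods with a different
    visibility are skipped there and get their own entry.
    """
    in_class = [
        m.signature
        for m in methods
        if m.virtual and m.visibility == class_visibility
    ]
    entries = {
        class_name: (class_visibility, _hash32("|".join([class_name] + in_class)))
    }
    for m in methods:
        if m.visibility != class_visibility:
            name = f"{class_name}::{m.signature}"
            entries[name] = (m.visibility, _hash32(name))
    return entries
```

Declaration order is preserved rather than sorted in this sketch, on the assumption that reordering virtual methods should change the class hash.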
>> 
>> What do you want in the hash for a function anyway?  Almost everything is already captured by (1) the separate hashes for the nominal types mentioned and (2) the symbol mangling.  You're pretty much only missing the return type.  Oh, I guess you need the body's dependencies for inline functions.
>> 
>> On the contrary, this is the case I care about! Two different definitions of the same function.
> 
> Allow me to suggest a structural hash of the statements, then, instead of dumping the entire preprocessor history.
> 
> But I am suggesting a structural hash of the statements!

With pointers into a complete preprocessor history for some reason!

By “structural”, I mean “considering only the result of preprocessing and semantic analysis”.
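
A toy illustration of that definition, assuming the function body is available as a nested-tuple AST produced after preprocessing and semantic analysis. The tuple encoding, the node kinds, and the `structural_hash` name are made up for this sketch and are not clang's AST or any existing API.

```python
import hashlib


def structural_hash(node):
    """32-bit hash of a nested-tuple AST of the form (kind, *children)."""

    def encode(n):
        # Tuples are interior nodes; anything else is a leaf value.
        if isinstance(n, tuple):
            return "(" + " ".join(encode(c) for c in n) + ")"
        return repr(n)

    digest = hashlib.sha256(encode(node).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


# "return x < LIMIT;" with "#define LIMIT 10", and "return x < 10;"
# written literally, yield the same tree once macros are expanded.
body_with_macro = ("return", ("binop", "<", ("ref", "x"), ("int", 10)))
body_literal = ("return", ("binop", "<", ("ref", "x"), ("int", 10)))

# The same source text compiled with "#define LIMIT 20" yields a
# different tree, so the mismatch is caught from the tree alone.
body_other_tu = ("return", ("binop", "<", ("ref", "x"), ("int", 20)))

same = structural_hash(body_with_macro) == structural_hash(body_literal)
differs = structural_hash(body_with_macro) != structural_hash(body_other_tu)
```

Because the hash sees only the tree, a difference in macro definitions that never changes the resulting statements goes unreported, and no preprocessor history needs to be stored to detect the ones that do.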

John.