[LLVMdev] [cfe-dev] design for an accurate ODR-checker with clang

Mon Aug 5 17:23:56 PDT 2013

On 5 August 2013 16:55, John McCall <rjmccall at apple.com> wrote:

> On Aug 5, 2013, at 4:43 PM, Nick Lewycky <nlewycky at google.com> wrote:
>
>  On 5 August 2013 15:33, John McCall <rjmccall at apple.com> wrote:
>
>> On Aug 5, 2013, at 3:04 PM, Nick Lewycky <nlewycky at google.com> wrote:
>>
>> On 15 July 2013 15:12, John McCall <rjmccall at apple.com> wrote:
>>
>>> The same sorts of things that you were planning on hashing, but maybe
>>> not hashed.  It's up to you; having a full string would let you actually
>>> show a useful error message, but it definitely inflates binary sizes.  If
>>> you really think you can make this performant enough to do on every load, I
>>> can see how the latter would be important.
>>>
>>
>> I was thinking we could add more things to help diagnostics, next to the
>> hash. I *think* there are two cases that matter, but there may be more.
>> Either we have an ODR violation where the file:line are different, or if
>> file and line are the same then the preprocessor state was different.  We
>> could emit file and line from the starting loc of each of the hashes, and
>> we could emit a preprocessor table with the list of initial defines and
>> changes to those defines as the TU went along -- at each hash we could
>> point to an index into that table to indicate where we are. Both of those
>> give us enough information for the linker to say why the ODRs failed to
>> match.
>>
>>
>> Do you have any idea how much data that is?  Your .o files are going to
>> be *huge*.
>>
>
> The point of the design is to minimize the number of entries, so adding
> more data per-entry doesn't seem like a big problem.
>
>
> No, I mean the “preprocessor table with the list of initial defines and
> changes to those defines as the TU went along”.  That’s global, sure, but
> it’s enormous.
>

Ah. Even more convincing to me is that this data is not normally included
in .o files at all. You're right. This could get big.

>    This isn't going to be performant enough to do unconditionally at
>>>> every load no matter how much you shrink it.
>>>>
>>>
>>> Every load of a shared object? That's not a fast operation even without
>>> odr checking, but the idea is to keep the total number of entries in the
>>> odr table small. It's less than the number of symbols, closer to the number
>>> of top-level decls.
>>>
>>>
>>> Your ABI dependencies are every declaration *that you ever rely on*.
>>>  You've got to figure that that's going to be very large.  For a library of
>>> any significance, I'd be expecting this check to touch about half a
>>> megabyte of data, even with a 32-bit hash and some sort of clever prefixing
>>> scheme on the symbols.  That's a pretty major regression in library loading.
>>>
>>
>> Fair point. If we want to include less in the hash just to make it more
>> palatable for dynamic library users, we can have a flag for that. Some sort
>> of ODR-checking lite.
>>
>> I really don't care about .so files myself.
>>
>>
>> Maybe you should just spec this out as a static-linker-only technology,
>> then.
>>
>
> That's fair criticism. I did exactly that, then was asked to please
> consider the dynamic linker case and lazily said "oh sure, I don't see any
> reason the same scheme can't verify at load time too". I'll retract any
> claims about handling the dynamic loading case.
>
>
> The same scheme could work, but doing it at load time forces completely
> different performance considerations, so it really demands a different
> design.
>

I wonder whether the people who ask for ODR-violation detection at dynamic
link time are looking for something different? A way to verify that a new
release of their C++ library is free from a certain class of ABI mismatch
against the previous version. This is something entirely different from
what my goal was, even though it's also under the banner of enforcing C++'s
ODR rule.

>   Also, you should have something analogous to symbol visibility as a way
>>>> to tell the static linker that something only needs to be ODR-checked
>>>> within a linkage unit.  It would be informed by actual symbol visibility,
>>>> of course.
>>>>
>>>
>>> Great point, and that needs to flow into the .o files as well. If a
>>> class has one visibility and its method has another, we want to skip the
>>> method when hashing the class, and need to emit an additional entry for the
>>> method alone? Is that right?
>>>
>>>
>>> Class hashes should probably only include virtual methods anyway, but
>>> yes, I think this is a good starting point.
>>>
>>> What do you want in the hash for a function anyway?  Almost everything
>>> is already captured by (1) the separate hashes for the nominal types
>>> mentioned and (2) the symbol mangling.  You're pretty much only missing the
>>> return type.  Oh, I guess you need the body's dependencies for inline
>>> functions.
>>>
>>
>> On the contrary, this is the case I care about! Two different definitions
>> of the same function.
>>
>>
>> Allow me to suggest a structural hash of the statements, then, instead of
>> dumping the entire preprocessor history.
>>
>
> But I am suggesting a structural hash of the statements!
>
>
> With pointers into a complete preprocessor history for some reason!
>
> By “structural”, I mean “considering only the result of preprocessing and
> semantic analysis”.
>

Yes. There's two things here being conflated. The ODR detection itself --
determining whether there is an ODR violation -- should be done considering
only the result of preprocessing and semantic analysis. Absolutely. Violent
agreement.

Once an ODR violation is found, how can the linker produce a better error
message than "nope!"? That's where my suggestion of tagging each entry with
the macro changes since previous top-level-decl came from. That should be
small in most cases (who defines things in the middle of their TU?) but in
reality we can't discard macros that come in from the system headers (and
are consequently used to declare functions that are used by definitions
later on). Ouch.

If I were to suppose a flag we could provide that means "help, I
encountered an ODR violation and need the compiler's help to solve it" then
there's no reason we couldn't just have that flag include snippets of
source code in the .o instead of a condensed list of preprocessor changes.
Though upon further consideration, maybe that should just be a clang-tool.

Nick
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130805/1f7ff376/attachment.html>