[LLVMdev] design for an accurate ODR-checker with clang

Thu Jul 11 17:45:58 PDT 2013

Hi! A few of us over at Google think a nice feature in clang would be ODR
violation checking, and we thought for a while about how to do this and
wrote it down, but we aren't actively working on it at the moment, nor do we plan
to in the near future. I'm posting this to share our design and hopefully
save anyone else the design work if they're interested in it.

For some background, C++'s ODR rule roughly means that two definitions of
the same symbol must come from "the same tokens with the same
interpretation". Given the same token stream, the interpretation can be
different due to different name lookup results, or different types through
typedefs or using declarations, or due to a different point of
instantiation in two translation units.

Unlike existing approaches (for example, the ODR checker in the gold
linker), clang lets us do this with no false positives and very few false
negatives. The basis of the idea is that we produce a hash of all the
ODR-relevant pieces and pick the largest possible granularity.
By granularity I mean that we would hash the entire definition of a class
including all methods defined lexically inline and emit a single value for
that class.

The first step is to build a new visitor over the clang AST that calculates
a hash of the ODR-relevant pieces of the code. (StmtProfiler doesn't work
here because it includes pointer addresses, which differ between
translation units.) Hash the outermost declaration with external
linkage. For example, given a class with a method defined inline,
we start the visitor at the class, not at the method. The entirety of the
class must be ODR-equivalent across two translation units, including any
inline methods.
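To make the granularity concrete, here is a toy sketch. `ClassDef`, `MethodDef`, `OdrHasher` and `hashClass` are hypothetical stand-ins, not clang APIs; a real visitor would walk a `CXXRecordDecl` and its members. The hash is 64-bit FNV-1a over content only, so no pointer value can leak into it, and every inline method body is folded into the class's single value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for clang AST nodes.
struct MethodDef {
  std::string name;
  std::string body;  // stand-in for the method's statement tree
};

struct ClassDef {
  std::string name;
  std::vector<std::string> fieldTypes;   // stand-in for canonical (typedef-resolved) types
  std::vector<MethodDef> inlineMethods;  // methods defined lexically inside the class
};

// 64-bit FNV-1a. The result depends only on the bytes fed in, never on
// addresses, so it is identical in every translation unit.
class OdrHasher {
 public:
  void addString(const std::string &s) {
    addInt(s.size());  // length prefix keeps "ab"+"c" distinct from "a"+"bc"
    for (unsigned char c : s) addByte(c);
  }
  void addInt(uint64_t v) {
    for (int i = 0; i < 8; ++i) addByte(static_cast<unsigned char>(v >> (8 * i)));
  }
  uint64_t value() const { return h_; }

 private:
  void addByte(unsigned char b) {
    h_ ^= b;
    h_ *= 1099511628211ULL;  // FNV-1a 64-bit prime
  }
  uint64_t h_ = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
};

// Start at the outermost external-linkage declaration (the class) and fold
// all of its inline methods into the same hash: one value per class.
uint64_t hashClass(const ClassDef &c) {
  OdrHasher h;
  h.addString(c.name);
  h.addInt(c.fieldTypes.size());
  for (const auto &t : c.fieldTypes) h.addString(t);
  h.addInt(c.inlineMethods.size());
  for (const auto &m : c.inlineMethods) {
    h.addString(m.name);
    h.addString(m.body);
  }
  return h.value();
}
```

Changing any inline method body or member type changes the class's one hash value, so a mismatch anywhere in the class is reported once, under the class's name.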

Although the standard requires the tokens to be the same, we do not
actually include the tokens in the hash. The structure of the AST includes
everything about the code that is semantically relevant. Any false
positives that hashing the tokens would fix either do not affect the
behaviour of the program or could be fixed by hashing more of the
AST. References to globals should be hashed by name, but references to
locals should be hashed by an ordinal number, so that renaming a local
does not change the hash while referring to a different global does.
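A minimal sketch of that naming rule, assuming the references in a body have already been flattened into a list. `Ref`, `mix` and `hashRefs` are hypothetical. Locals are numbered in order of first use, so two definitions that differ only in local variable names hash identically. For brevity the toy keys locals by spelling; a real visitor would key by the declaration so that shadowed locals stay distinct.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical flattened reference. In clang this would come from a
// DeclRefExpr whose declaration is either a global or a local variable.
struct Ref {
  bool isLocal;
  std::string name;
};

// FNV-1a over the bytes of s, followed by a 0 separator byte.
uint64_t mix(uint64_t h, const std::string &s) {
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  h *= 1099511628211ULL;  // separator byte 0: h ^ 0 == h, so only the multiply remains
  return h;
}

uint64_t hashRefs(const std::vector<Ref> &refs) {
  std::unordered_map<std::string, uint64_t> ordinal;  // local name -> ordinal
  uint64_t h = 14695981039346656037ULL;               // FNV-1a offset basis
  for (const Ref &r : refs) {
    if (r.isLocal) {
      // First use gets the next ordinal; later uses reuse it.
      auto it = ordinal.emplace(r.name, ordinal.size()).first;
      h = mix(h, "L" + std::to_string(it->second));
    } else {
      // Globals are hashed by name, so naming a different global changes the hash.
      h = mix(h, "G" + r.name);
    }
  }
  return h;
}
```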

Instantiated templates are also visited by the hashing visitor. If we did
not, we would have false negatives where the code is not conforming due to
different points of instantiation in two translation units. We can skip
uninstantiated templates since they don’t affect the behaviour of the
program, and we need to visit the instantiations regardless.

In LLVM IR, create a new named metadata node !llvm.odr_checking which
contains a list of <mangled name, hash value> pairs. The names do not
necessarily correspond to symbols; for instance, a class has a hash
value but no corresponding symbol. For ease of implementation, names
should be mangled per the C++ Itanium ABI (demanglable with
c++filt -t). Merging modules that contain these tables must perform ODR
checking as part of that link, and the resulting module will have the
union of these tables.
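As an illustration of the module-merge step, here is a sketch that models the table as a `std::map` from mangled name to hash. `OdrTable`, `MergeResult` and `mergeOdrTables` are hypothetical and are not part of LLVM's linker API. In the test, `3Foo` and `_Z3barv` are the Itanium encodings of a class `Foo` and a function `bar()`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of the !llvm.odr_checking table.
using OdrTable = std::map<std::string, uint64_t>;  // mangled name -> hash

struct MergeResult {
  OdrTable merged;                     // union of both tables
  std::vector<std::string> conflicts;  // names whose hashes disagree
};

// Union the two tables; a name present in both with different hashes is an
// ODR violation. The first table's value is kept in the merged result.
MergeResult mergeOdrTables(const OdrTable &a, const OdrTable &b) {
  MergeResult r{a, {}};
  for (const auto &[name, hash] : b) {
    auto [it, inserted] = r.merged.emplace(name, hash);
    if (!inserted && it->second != hash) r.conflicts.push_back(name);
  }
  return r;
}
```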

In the .o file, emit a sorted table of <mangled name, hash value> in a
non-loadable section intended to be read by the linker. All entries in the
table must be checked if any symbol from this .o file is involved in the
link (note that there is no mapping from symbol to ODR table name). If two
.o files contain different hash values for the same name, we have detected
an ODR violation and should issue a diagnostic.
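Because each table is sorted, two objects can be compared with a single linear merge-walk instead of a lookup per entry. This is a sketch, assuming the section has already been decoded into sorted (name, hash) pairs; the on-disk layout, `OdrSection`, `OdrViolation` and `checkPair` are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical decoded form of one .o file's ODR section, sorted by name.
using OdrEntry = std::pair<std::string, uint64_t>;
using OdrSection = std::vector<OdrEntry>;

struct OdrViolation {
  std::string name;
  uint64_t hashA, hashB;
};

// Walk both sorted tables in lockstep; a name present in both with
// differing hashes is reported as a violation.
std::vector<OdrViolation> checkPair(const OdrSection &a, const OdrSection &b) {
  assert(std::is_sorted(a.begin(), a.end()) && std::is_sorted(b.begin(), b.end()));
  std::vector<OdrViolation> out;
  auto i = a.begin(), j = b.begin();
  while (i != a.end() && j != b.end()) {
    if (i->first < j->first) {
      ++i;
    } else if (j->first < i->first) {
      ++j;
    } else {
      if (i->second != j->second) out.push_back({i->first, i->second, j->second});
      ++i;
      ++j;
    }
  }
  return out;
}
```

With many objects in a link, the same walk could be run against an accumulated table (the union of everything seen so far) as each new object is added.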

Finally, teach the loader (RuntimeDyld) to do verification and catch ODR
violations when dlopen'ing a shared library.

Nick