[PATCH] Add a jumptable attribute and support for creating jump-instruction tables

Tom Roeder tmroeder at google.com
Thu May 22 10:04:56 PDT 2014


On Wed, May 21, 2014 at 7:57 PM, Nick Lewycky <nicholas at mxc.ca> wrote:
> Rafael Espíndola wrote:
>>>
>>> That sounds fine to me, since as you note, I can add passes either in
>>> LLVM or clang to add unnamed_addr to all or a subset of functions
>>> first.
>>>
>>> Even better for me would be if jumptable implied unnamed_addr: as part of
>>> the transformation, the jumptable pass would add unnamed_addr to every
>>> function marked with jumptable. Will that work?
>>>
>>> Then there's no need to add unnamed_addr with a clang option or a
>>> special pass, since anything marked as jumptable will get
>>> unnamed_addr.
>>
>>
>> It would work. I am not sure how I feel about having an attribute that
>> is a superset of another. Nick, what do you think?
>
>
> We already do that a bit. sret implies noalias and nonnull, readnone implies
> readonly, etc. I'd rather we kept the two attributes separate in the IR
> and set them independently (for example, refuse to mark 'jumptable' on a
> function that isn't 'unnamed_addr' if necessary to avoid miscompiles).
>
> I'm a bit surprised that the resolution appears to be that the jumptable
> transform requires an ABI break (i.e., that we can't guarantee that &func1 ==
> &func2 is preserved across all combinations of mixing .o files with and
> without the jumptable transform applied). Is that correct, thinking from the
> point of view of the generated .o files?

After thinking about it a bit, this property seems to be fundamental
to the way I'm doing jump tables and control-flow integrity, even
ignoring details of the compiler or IR. The purpose of a jump-table
entry in this scheme is to provide a new function pointer whose address
differs from the original's, and to split the uses of the function so
that every function pointer refers to the new address while all direct
calls still use the original one. All of these new function pointers
can then be forced into one convenient contiguous block, which is what
enables the control-flow integrity transformation.
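The address split can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the LLVM pass: addresses are plain integers, and `JumpTable`, `ENTRY_SIZE`, the base address `0x1000`, and `ORIGINAL_ADDR` are all hypothetical. A real entry would be a jump instruction rather than a Python call. Address-taken uses get an entry inside one block, direct calls keep using the original body, and an indirect call is allowed only if its target lands exactly on an entry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

ENTRY_SIZE = 8  # hypothetical size of one jump-table entry, in bytes


@dataclass
class JumpTable:
    """Toy model of a jump table occupying one contiguous address range."""
    base: int
    targets: List[Callable] = field(default_factory=list)
    index: Dict[Callable, int] = field(default_factory=dict)

    def add(self, fn: Callable) -> int:
        # Give fn one fixed-size entry; the entry's address is the new
        # function pointer used for every address-taken use of fn.
        if fn not in self.index:
            self.index[fn] = len(self.targets)
            self.targets.append(fn)
        return self.address_of(fn)

    def address_of(self, fn: Callable) -> int:
        return self.base + self.index[fn] * ENTRY_SIZE

    def contains(self, ptr: int) -> bool:
        # CFI-style check: ptr must land exactly on an entry inside the block.
        offset = ptr - self.base
        in_range = 0 <= offset < len(self.targets) * ENTRY_SIZE
        return in_range and offset % ENTRY_SIZE == 0

    def call_indirect(self, ptr: int, *args):
        if not self.contains(ptr):
            raise RuntimeError(f"CFI violation: {ptr:#x} is not a table entry")
        # The entry "jumps" to the original function body.
        return self.targets[(ptr - self.base) // ENTRY_SIZE](*args)


def add_one(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return 2 * x


table = JumpTable(base=0x1000)
p_add = table.add(add_one)    # address-taken use of add_one
p_double = table.add(double)  # address-taken use of double

# Hypothetical addresses of the original bodies: what an object file built
# *without* the transform would use for &add_one and &double.
ORIGINAL_ADDR = {add_one: 0x4000, double: 0x4040}

print(add_one(41))                      # direct call to the body: 42
print(table.call_indirect(p_add, 41))   # indirect call via the entry: 42
print(p_add == ORIGINAL_ADDR[add_one])  # False: &add_one differs per .o file
print(table.contains(p_double + 1))     # False: misaligned pointer rejected
```

The third print shows why the answer to the ABI question above is yes: an object file compiled without the transform would compute `&add_one` as the original body's address, so comparing it against a pointer from a transformed object file fails.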

Given that unnamed_addr already allows functions to break
function-pointer equality, I don't mind keeping the two attributes
separate and only allowing jumptable on functions that already have
unnamed_addr.
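A minimal sketch of that rule (refusing `jumptable` on a function without `unnamed_addr`, as Nick suggested), written as a validity check that runs before the pass rewrites anything. `Function`, `check_jumptable`, `jumptable_candidates`, and `InvalidAttributes` are hypothetical names, not LLVM's Verifier API. Attributes are modeled as plain strings as a simplification; in LLVM IR `unnamed_addr` is a flag on the global rather than an entry in the function's attribute list.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


class InvalidAttributes(Exception):
    """Raised when a function's attributes could lead to a miscompile."""


@dataclass(frozen=True)
class Function:
    name: str
    attrs: FrozenSet[str] = frozenset()


def check_jumptable(fn: Function) -> None:
    # jumptable is only accepted on functions that already opted out of
    # function-pointer equality with unnamed_addr; neither implies the other.
    if "jumptable" in fn.attrs and "unnamed_addr" not in fn.attrs:
        raise InvalidAttributes(f"@{fn.name}: 'jumptable' requires 'unnamed_addr'")


def jumptable_candidates(fns: Iterable[Function]) -> List[str]:
    """Validate every function, then list the ones the pass may rewrite."""
    fns = list(fns)
    for fn in fns:
        check_jumptable(fn)
    return [fn.name for fn in fns if "jumptable" in fn.attrs]


module = [
    Function("f", frozenset({"jumptable", "unnamed_addr"})),
    Function("g", frozenset({"unnamed_addr"})),
    Function("h"),
]
print(jumptable_candidates(module))  # ['f']
```

Keeping the check separate from the transformation means the two attributes stay independent in the IR, and a bad combination is rejected up front instead of being silently repaired.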

Another way to deal with the ABI problem would be to replace *all*
uses (including direct calls) of a function with its jump-table entry
and rename the original function (like the mergefunc transformation in
lib/Transforms/IPO/MergeFunctions). Then function-pointer equality is
preserved everywhere, but this is much less efficient, since direct
calls go unnecessarily through the jump table. It would be possible to
change the jumptable pass so it does the efficient transformation
(skipping direct calls) only if a function already has unnamed_addr,
since such a function is already allowed to break function-pointer
equality. Any function that is marked jumptable and doesn't have
unnamed_addr would be renamed and redirected through the jump table
everywhere. For now I'd rather stick with the simpler version, since
that's sufficient for my needs and is still correct.
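The hybrid policy in the previous paragraph can be sketched as a per-function decision. This models only the choice, not the IR rewriting; `Plan`, `plan_jumptable`, and the `.orig` suffix for the renamed body are hypothetical. The simpler version preferred above corresponds to implementing just the first branch and requiring unnamed_addr.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    name_refers_to: str  # what @name resolves to afterwards: "body" or "entry"
    body_symbol: str     # symbol that holds the original function body
    routes: List[str]    # per use: "direct" (body) or "table" (entry)


def plan_jumptable(name: str, unnamed_addr: bool, uses: List[str]) -> Plan:
    """Decide how each use ("call" or "addr") of a jumptable function is rewritten."""
    if unnamed_addr:
        # Efficient variant: pointer equality may already be broken, so only
        # address-taken uses move to the jump-table entry.
        routes = ["table" if use == "addr" else "direct" for use in uses]
        return Plan(name_refers_to="body", body_symbol=name, routes=routes)
    # Conservative, mergefunc-like variant: the entry takes over the public
    # name, the body is renamed, and every use goes through the table, so
    # &name is the same no matter which object file computes it.
    return Plan(name_refers_to="entry", body_symbol=f"{name}.orig",
                routes=["table"] * len(uses))


print(plan_jumptable("f", True, ["call", "addr", "call"]))
print(plan_jumptable("g", False, ["call", "addr"]))
```

The cost of the second branch is visible in `routes`: every direct call of `g` now takes an extra jump through the table.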

> (Relatedly, I hear there are similar
> patches in the works for GCC. Will we be ABI-compatible or incompatible with
> the GCC implementation?)

GCC's implementation is called VTV (VTable Verification) and is
already in the recently released GCC 4.9. VTV works somewhat
differently and has different goals: it protects virtual calls made
through vtables rather than all indirect calls. It also checks a
finer-grained property than my CFI implementation: it makes sure that
a virtual call goes through one of the vtable pointers that are valid
for the class used at the call site. VTV doesn't change any function
pointers, so it preserves function-pointer equality and doesn't add
any new compatibility issues. However, AFAIK, compiling a library with
VTV and linking it into something not compiled with VTV is not a
supported use of VTV, and it would almost certainly cause false
positives.
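To illustrate the difference in granularity, here is a toy comparison of the two checks at a call site whose static type is `Base`. The class names, addresses, and tables are hypothetical, and both checks are simplified models rather than the actual LLVM or GCC code: the jump-table check accepts any entry in the block, while the VTV-style check accepts only vtables valid for the call site's class.

```python
from typing import Dict, List, Set, Tuple

ENTRY_SIZE = 8    # hypothetical jump-table entry size
JT_BASE = 0x1000  # hypothetical start of the jump-table block
JT_ENTRIES: List[str] = ["Base::f", "Derived::f", "Other::g"]

# Hypothetical vtable addresses -> (class, virtual functions in slot order).
VTABLES: Dict[int, Tuple[str, List[str]]] = {
    0x2000: ("Base", ["Base::f"]),
    0x2010: ("Derived", ["Derived::f"]),
    0x2020: ("Other", ["Other::g"]),
}
# Static type at a call site -> classes whose vtables are valid there.
VALID_FOR: Dict[str, Set[str]] = {
    "Base": {"Base", "Derived"},
    "Derived": {"Derived"},
    "Other": {"Other"},
}


def entry_addr(fn: str) -> int:
    return JT_BASE + JT_ENTRIES.index(fn) * ENTRY_SIZE


def cfi_ok(fn_ptr: int) -> bool:
    # Jump-table CFI: any aligned entry in the block is an acceptable target.
    offset = fn_ptr - JT_BASE
    return 0 <= offset < len(JT_ENTRIES) * ENTRY_SIZE and offset % ENTRY_SIZE == 0


def vtv_ok(static_type: str, vptr: int) -> bool:
    # VTV-style: the vtable must be one that is valid for the call site's class.
    cls = VTABLES[vptr][0] if vptr in VTABLES else None
    return cls in VALID_FOR[static_type]


# A call site with static type Base, but the object's vtable pointer has been
# replaced with Other's, so slot 0 now yields Other::g.
forged_vptr = 0x2020
target = entry_addr(VTABLES[forged_vptr][1][0])
print(cfi_ok(target))               # True: still a valid jump-table entry
print(vtv_ok("Base", forged_vptr))  # False: Other's vtable is not valid for Base
```

The coarse check still catches pointers outside the block, but only the finer, per-call-site check rejects a swap between two legitimate targets.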




More information about the llvm-commits mailing list