[LLVMdev] [RFC] Simple control-flow integrity

Eric Christopher echristo at gmail.com
Sat Feb 15 19:08:55 PST 2014


On Tue, Feb 11, 2014 at 9:46 AM, Tom Roeder <tmroeder at google.com> wrote:
> On Tue, Feb 11, 2014 at 9:12 AM, Reid Kleckner <rnk at google.com> wrote:
>> On Tue, Feb 11, 2014 at 12:28 AM, Richard Osborne <richard at xmos.com> wrote:
>>>
>>>
>>> On 11 Feb 2014, at 08:15, Eric Christopher <echristo at gmail.com> wrote:
>>>
>>> > On Mon, Feb 10, 2014 at 11:51 PM, Reid Kleckner <rnk at google.com> wrote:
>>> >>
>>> >>
>>> >>
>>> >> IIRC this came up before, and I don't think we expose anything like a
>>> >> jump
>>> >> table at the IR level.  As an IR-to-IR transform, I think asm is the
>>> >> only
>>> >> way to do it.
>>> >
>>> > I'd have to look more at what he's doing, but wouldn't a simple switch
>>> > statement in IR suffice? Efficiency would be up to the various
>>> > lowering mechanisms, but it wouldn't require inline asm.
>
> I'm not sure I follow how this would work. Could you expand on this, please?
>

I think you've already rebutted it below, so how about another idea? :)

What about creating a pseudo-plt in the back end that will create this
jump table for you at object emission time?

Throwing out ideas in an attempt to avoid passes creating inline
assembly - especially since we're looking at an IR level pass.

-eric

>
>>> >
>>> > -eric
>>> Another option might be to create an array of function pointers in the
>>> LLVM IR, i.e generate code that looks like:
>>>
>>> void (*jumptable[])() = {
>>>   &a,
>>>   &b
>>> };
>>>
>>> void f(int index) {
>>>   *(jumptable[index])();
>>> }
>>
>>
>> This isn't ABI compatible.  Now function pointers point to data (or are
>> switch table indices) and not code.
>>
> My first attempt at a solution did it just like this, with data
> instead of pointers, but that opens several large cans of worms. As
> mentioned, the ABI is a major problem, and to maintain compatibility,
> you have to provide a complete escape analysis for all address-taken
> functions, since they have to be transformed before they are passed to
> external code. And, e.g., the IR translation of vtable calls actually
> use a more complicated representation of function pointers, IIRC.
>
> My version based on data instead of pointers worked well for simple
> code and toy examples but it became extremely complicated and fragile
> once I tried to apply it to real code. I finally realized that any
> transformation that depends for its soundness on a complete solution
> to an alias analysis problem is a non-starter for large, complicated
> programs; conservative solutions weaken the protections provided by
> the transformation.
>
> This is why I switched to using inline asm; I add a declaration of a
> fake external function for each entry in the table and use it as the
> function pointer. Since my inline asm defines these symbols, the
> linker is satisfied. There doesn't seem to be a way to create a true
> jump table in LLVM, so this is the best we can do.
>
>
>> I can imagine abusing indirectbr across separate functions, but rolling your
>> own jump table in asm seems better.
>
> I tried using indirect branches for a bit, but I didn't see how to
> make that work. Aren't they restricted to branching to labels inside
> the current function? From the LLVM Language Reference entry on the
> semantics of indirectbr:
>
>
> Control transfers to the block specified in the address argument. All
> possible destination blocks must be listed in the label list,
> otherwise this instruction has undefined behavior. This implies that
> jumps to labels defined in other functions have undefined behavior as
> well.
>
>
>>
>> On Tue, Feb 11, 2014 at 5:07 AM, Joerg Sonnenberger
>> <joerg at britannica.bec.de> wrote:
>>>
>>> On Mon, Feb 10, 2014 at 03:33:32PM -0800, Tom Roeder wrote:
>>> >     3. adds a fast check for pointer safety at each indirect call site:
>>>
>>> Why not using a bloom filter for valid target addresses instead?
>>
>>
>> Can a bloom filter be as fast as a simple bounds check?  I'm thinking lea
>> base, sub, cmp, jl, and cold call.
>
> Exactly. The jump table turns into a very small amount of code; note
> that a normal bounds check has to check both bounds (so two subs and
> cmps).  With the base and mask, and in an asm pseudo-code, it does:
>
> sub base, addr
> and mask, addr
> add base, addr
> cmp addr, orig
> je normal_call
> <load info for warning call>
> call warning
> normal_call:
> call orig
>
>
> And if you can get sufficient power-of-two alignment for the table,
> you can do even better, since then the base is a prefix of all valid
> addrs in its table. Unfortunately, Linux only gives you alignment up
> to 2^12 under PIE/ASLR, even though the ELF binary claims to have
> larger alignment.
>
> and mask addr
> add base addr
> cmp addr, orig
> <etc...same>
>
> My code supports this as an option, but the default is the subtraction version.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



More information about the llvm-dev mailing list