[LLVMdev] [RFC] Simple control-flow integrity

Thu Apr 3 18:28:57 PDT 2014

On Wed, Apr 02, 2014 at 05:28:04PM -0700, Tom Roeder wrote:
> On Fri, Mar 21, 2014 at 1:46 PM, Peter Collingbourne <peter at pcc.me.uk>wrote:
> 
> > On Fri, Mar 21, 2014 at 12:54:07PM -0700, Tom Roeder wrote:
> > > On Fri, Mar 21, 2014 at 12:15 PM, Peter Collingbourne <peter at pcc.me.uk>
> > wrote:
> > > >> The way I've implemented it (see the patch I sent to llvm-commits
> > > >> yesterday), it's not just metadata: the intrinsic lowers to the
> > > >> jumptable entry code given above. The CFI pass then generates a
> > > >> function for each jump table; the function consists solely of these
> > > >> intrinsic calls.
> > > >
> > > > Well, the intrinsic you proposed has no effect on the caller and has
> > > > non-local effects on other specified functions. I'm not aware of any
> > other
> > > > intrinsic with similar behavior.
> > >
> > > I agree that it's not very similar to other intrinsics. But I don't
> > > exactly follow these statements. There are definitely intrinsics that
> > > have no effect on the caller, like llvm.var.annotation.
> >
> > Yes but the purpose of such intrinsics is to communicate information about
> > a specific value that may have an effect on analysis, optimization or code
> > generation for that caller. On the other hand, the intrinsic you are
> > proposing
> > has nothing to do with the caller.
> >
> > > And AFAIK,
> > > there is no non-local behavior: all the intrinsic does is lower to the
> > > labeled jump instruction; the changes to address-taken functions are
> > > done separately by the CFI pass. Note that in the patch I sent, the
> > > intrinsic only takes one argument: the function to jump to. Are there
> > > other effects in this case?
> >
> > The non-local effect is that the intrinsic describes the definition of a
> > function in the global scope. Normally such definitions come from top-level
> > entities.
> >
> > > So, maybe it would be better to call it something like
> > > @llvm.unconditional.jump(i8*)? I could then make it only lower to the
> > > jump and add an intrinsic that lowered to a function label as well.
> >
> > I'd imagine that might present more problems. For example, if you used
> > either intrinsic in the middle of a regular function the behavior would not
> > necessarily be well defined. It would be necessary to carefully document
> > where and how these intrinsics may be used.
> >
> 
> Taking these considerations into account, I propose this:
> 
> Jump functions created for the jump table are actual functions, marked with
> a new jumptable attribute, as well as naked and noreturn and optnone. Each
> table is put in a special section using the section attribute, and
> alignment of these functions is done using the align attribute. The
> combination of these attributes means that the function has no preamble or
> postamble, so it consists of exactly a global function label and its
> instructions.
> 
> Then I think it would make sense to create an intrinsic like
> llvm.jumptable.instr(i8*) that would be a placeholder for an unconditional
> jump. I can add code to the verifier that insists on two conditions:
> 
> 1. functions marked as jumptable must have exactly two instructions: an
> llvm.jumptable.instr followed by a unreachable.
> 2. llvm.jumptable.instr can only occur in a jumptable function.
> 
> I think this handles the problem of using llvm.jumptable.instr in normal
> functions: that isn't allowed. And I think it deals with the problems of
> non-local behavior by making functions out of the things that really are
> functions. Each entry in the jump table really is a function, albeit a
> rather strange one, so it should look like a function at the IR level.
> 
> In other words, this change would add a new attribute that marks a very
> specialized kind of function and an intrinsic that can only occur in this
> kind of function.
> 
> What do you think?

I think you might be close.

The llvm.jumptable.instr intrinsic you have proposed is similar to the
'musttail' call marker that Reid (cc'd) is working on in [1]. That marker will
also have verifier support. I think it might be reasonable to require the
code generator to emit the body of a function containing only a 'musttail'
function call as a single branch instruction. Then you could emit such
functions in the jump table section.

Reid, what do you think?

Thanks,
-- 
Peter

[1] http://llvm-reviews.chandlerc.com/D3240