[llvm-commits] change objectsize signature

Wed May 9 08:28:51 PDT 2012

>>>> The planned usage is the following:
>>>>
>>>> a[i] = ..
>>>>
>>>> converts to (using some loose syntax, and assuming that a[i]  
>>>> writes 4 bytes):
>>>>
>>>> %ptr = GEP(%a, %i)
>>>> if (objectsize(%ptr) >= 4)
>>>> // OK
>>>> else
>>>> __builtin_trap()
>>>>
>>>> (this is exactly what the current implementation does;
>>>
>>> I'm confused here: what is generating this code?  Clang?  
>>> Dragonegg?  I assume we're talking about LLVM IR that was  
>>> generated for C code.  Is my assumption wrong?
>>
>> It can be C, yes. For example, clang can already emit this code.
>
> Does it do that by default, or do you need to specify additional  
> command-line arguments for that?

-fbounds-checking
(or -fbounds-checking=X, where X is the runtime penalty)
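Here is a minimal C sketch of the guard quoted above: %ptr = GEP(%a, %i), then trap unless objectsize(%ptr) >= 4. It is only an illustration under stated assumptions. The helper names objectsize_after_gep and checked_store are hypothetical. The object size is passed in explicitly rather than computed by the compiler, and sizeof(int) stands in for the 4 bytes written. It shows the logic of the check, not the code clang actually emits.

```c
#include <stddef.h>

/* Hypothetical model of objectsize(GEP(%a, %i)): bytes remaining in an
   object of obj_size bytes after skipping `index` elements of elem_size
   bytes (elem_size must be nonzero). Returns 0 when the offset is
   negative or past the end of the object. */
size_t objectsize_after_gep(size_t obj_size, long index, size_t elem_size)
{
    /* Comparing against obj_size / elem_size avoids overflow in index * elem_size. */
    if (index < 0 || (size_t)index > obj_size / elem_size)
        return 0;
    return obj_size - (size_t)index * elem_size;
}

/* The guard from the quoted pseudo-IR: store a[i] only if at least
   sizeof(int) bytes remain at &a[i] (a_size is the size of a in bytes);
   otherwise trap. */
void checked_store(int *a, size_t a_size, long i, int value)
{
    if (objectsize_after_gep(a_size, i, sizeof(int)) >= sizeof(int))
        a[i] = value;        /* OK */
    else
        __builtin_trap();    /* GCC/Clang builtin, as in the quoted code */
}
```

For example, with int buf[4], checked_store(buf, sizeof buf, 3, 42) performs the store. An index of 4 leaves 0 bytes, so that call traps.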

>>> o) Your check does not guarantee that %a and %ptr belong to the  
>>> same memory object.  All you're guaranteeing is that %ptr points  
>>> into a valid memory object with 4 bytes remaining in the memory  
>>> object.  It's possible that %ptr overflowed the object pointed to  
>>> by %a and is now within the bounds of another memory object (stack  
>>> objects often have no padding in between them).  That's fine, but  
>>> then it's not an array indexing check; it's a load/store check.
>>
>> Well, it's a GEP from %a, so it must point somewhere inside the  
>> buffer of %a. objectsize should not allow buffer overflows.
>
> If objectsize is implemented as I think it is, then no, your design  
> does not catch buffer overflows that move a pointer from one memory  
> object into another.

No. The objectsize intrinsic is lowered at compile time, which means it  
can "see" the GEPs. The current implementation already takes care of  
correctly computing the offset from the beginning of an allocated  
object; it does not just take an arbitrary pointer at run time.
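The following toy C model illustrates what "seeing the GEPs" buys during compile-time folding. Everything in it is hypothetical: struct value, the VAL_* kinds, fold_objectsize and OBJSIZE_UNKNOWN are far simpler than LLVM's real lowering, and only constant byte offsets are handled. The model sums the GEP offsets back to the allocation and returns the allocation size minus that offset. A pointer of unknown origin folds to SIZE_MAX, so its check always passes. That mirrors the "must always pass" behaviour the reviewer describes for pointers that may come from external code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-simplified IR values: an allocation of known size,
   a GEP adding a constant byte offset to another value, or an opaque
   pointer (e.g. a function argument) whose origin is unknown. */
enum value_kind { VAL_ALLOC, VAL_GEP, VAL_OPAQUE };

struct value {
    enum value_kind kind;
    size_t alloc_size;          /* VAL_ALLOC: size of the object in bytes */
    const struct value *base;   /* VAL_GEP: pointer being indexed */
    long byte_offset;           /* VAL_GEP: constant offset in bytes */
};

#define OBJSIZE_UNKNOWN SIZE_MAX

/* Walk the GEP chain back to its root, summing offsets. If the root is an
   allocation, fold objectsize to (allocation size - total offset); offsets
   outside the object fold to 0, so a pointer that ran past its own object
   fails the check even if it lands inside a neighbouring object. */
size_t fold_objectsize(const struct value *v)
{
    long offset = 0;
    while (v->kind == VAL_GEP) {
        offset += v->byte_offset;
        v = v->base;
    }
    if (v->kind != VAL_ALLOC)
        return OBJSIZE_UNKNOWN;
    if (offset < 0 || (size_t)offset > v->alloc_size)
        return 0;
    return v->alloc_size - (size_t)offset;
}
```

With a 16-byte allocation, a GEP of +8 followed by a GEP of +12 folds to 0. It does not fold to the size of whatever object happens to follow in memory, which is the point made above.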

> Consider the following:
>
> #include <stdint.h>
>
> union foo {
>     char * p;
>     uintptr_t q;
> } bar;
>
> int func (union foo * f) {
>     return (f->p[5] = ...);
> }
>
> int main () {
>     union foo a;
>     a.q = 5;
>     func (&a);
> }
>
> In your design, a check on f->p[5] in func() must always pass because  
> you don't know whether the value in f->p originated in external  
> code.  However, it's clear from this program that the check could be  
> allowed to fail: we know, just by looking at it, that f->p is only set  
> by code within the program.
>
> Data flow analysis could be used to determine which checks only  
> check internally allocated pointers and which checks can check  
> externally allocated pointers; the checks could then be modified to  
> contain this information.  The gepcheck code then becomes:
>
> Your design does not specify such a feature, so it can't be used to  
> catch these sorts of errors in its current form.

The current implementation is intra-procedural, so it won't catch  
these errors. But nothing prevents you from implementing  
such checks. If it's an internal function, its argument list can be  
augmented with an additional parameter that passes the size of f->p, and a  
data-flow analysis can then be used to discard some checks, as you said.
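Here is a minimal sketch of the interprocedural extension described above. It assumes a hypothetical transformation that rewrites func() from the union example into func_sized() with an extra size parameter. The name func_sized, the p_size parameter, and the convention that a caller passes SIZE_MAX when it cannot tell where the pointer came from are all illustrative assumptions. None of them are part of the current implementation.

```c
#include <stddef.h>
#include <stdint.h>

union foo {
    char *p;
    uintptr_t q;
};

/* Hypothetical rewrite of func(): the caller also passes p_size, the
   number of bytes it knows remain in the object f->p points to, or
   SIZE_MAX if it cannot tell (in which case the check always passes). */
int func_sized(union foo *f, size_t p_size, char value)
{
    /* Writing f->p[5] touches byte offset 5, so 6 bytes must remain. */
    if (p_size < 6)
        __builtin_trap();
    return (f->p[5] = value);
}
```

A caller that knows f->p points to a char[8] passes 8, and the store goes through. When every caller passes a constant of at least 6, a data-flow pass could delete the check entirely, which is the kind of check elimination mentioned above.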

>>> What you are proposing is a memory safety tool for production use;  
>>> it will be fast because it won't try to do expensive checks.  The  
>>> security policy it enforces seems reasonable.  You can build it  
>>> the way you describe, but I'd rather see it built with a more  
>>> general set of run-time checks that multiple tools can use.
>>
>> Feel free to propose an alternative specification. Of course we're  
>> open to better designs.
>
> I'll be working on that this week.

Ok, thank you!

Nuno