[llvm-commits] change objectsize signature

John Criswell criswell at illinois.edu
Wed May 9 08:55:00 PDT 2012


On 5/9/12 10:28 AM, Nuno Lopes wrote:
> [snip]
>
>
>>>> o) Your check does not guarantee that %a and %ptr belong to the 
>>>> same memory object.  All you're guaranteeing is that %ptr points 
>>>> into a valid memory object with 4 bytes remaining in the memory 
>>>> object.  It's possible that %ptr overflowed the object pointed to 
>>>> by %a and is now within the bounds of another memory object (stack 
>>>> objects often have no padding in between them).  That's fine, but 
>>>> then it's not an array indexing check; it's a load/store check.
>>>
>>> Well, it's a GEP from %a, so it must point somewhere inside the 
>>> buffer of %a. objectsize should not allow buffer overflows.
>>
>> If objectsize is implemented as I think it is, then no, your design 
>> does not catch buffer overflows that move a pointer from one memory 
>> object into another.
>
> No. The objectsize intrinsic is lowered at compile time. This means it 
> can "see" the GEPs. The current implementation already takes care of 
> correctly computing the offset from the beginning within an allocated 
> object. It's not just taking an arbitrary pointer at run-time.

I see.  So the objectsize instruction is tracing back up the def-use 
chain to find the correct object.
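
To make that concrete from the C side, here is a minimal sketch using 
__builtin_object_size, which GCC and Clang both provide (and which, as 
far as I know, Clang lowers to the objectsize intrinsic).  The function 
names are made up for illustration.  When the compiler cannot trace the 
pointer back to an object, the builtin (with type 0) yields (size_t)-1; 
whether the local case is actually folded may depend on the compiler 
and optimization level, so I make no claim about the exact values.

```c
#include <stddef.h>

/* Local object: the array and the constant offset are both visible
 * in this function, so the compiler has something to trace back to. */
static size_t remaining_local(void) {
    char a[16];
    char *p = a + 4;              /* the "GEP" from a */
    a[0] = 0;
    return __builtin_object_size(p, 0);
}

/* Pointer supplied by the caller: within this function there is no
 * allocation to trace back to, so the size is typically unknown,
 * which the builtin reports as (size_t)-1. */
static size_t remaining_param(char *p) {
    return __builtin_object_size(p + 4, 0);
}
```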

If that's the case, then your design will catch buffer overflows on 
objects allocated within the current function.  However, if you later 
want to expand the functionality to catch buffer overflows on 
non-locally allocated objects, then you'll need to expand the interface 
to objectsize so that it takes the source operand of the GEP as well as 
the resulting pointer that you want to check (at least if you want to 
implement object lookup approaches like Jones/Kelly, Ruwase/Lam, 
SAFECode, Baggy Bounds Checking, etc).
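
Here is a rough sketch of why the source operand matters for an object 
lookup approach.  Everything in it (obj_table, register_object, 
check_access_only, check_gep) is a hypothetical name of mine, not an 
existing LLVM or SAFECode interface, and a real implementation would 
use something faster than a linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical side table: one entry per registered memory object. */
struct obj_bounds {
    uintptr_t base;
    size_t size;
};

#define MAX_OBJS 16
static struct obj_bounds obj_table[MAX_OBJS];
static size_t obj_count;

/* Record an object; a real system would do this at allocation time. */
static void register_object(const void *base, size_t size) {
    assert(obj_count < MAX_OBJS);
    obj_table[obj_count].base = (uintptr_t)base;
    obj_table[obj_count].size = size;
    obj_count++;
}

/* Find the registered object containing addr, or NULL if none does. */
static const struct obj_bounds *lookup_object(uintptr_t addr) {
    for (size_t i = 0; i < obj_count; i++) {
        if (addr >= obj_table[i].base &&
            addr - obj_table[i].base < obj_table[i].size)
            return &obj_table[i];
    }
    return NULL;
}

/* Load/store check: sees only the result pointer, so it accepts any
 * pointer with enough room left in whatever object it lands in. */
static int check_access_only(uintptr_t result, size_t access_size) {
    const struct obj_bounds *o = lookup_object(result);
    if (o == NULL)
        return 0;
    return access_size <= o->size - (result - o->base);
}

/* Indexing check: looks up the object of the GEP's source operand and
 * requires the result to stay inside that same object. */
static int check_gep(uintptr_t src, uintptr_t result, size_t access_size) {
    const struct obj_bounds *o = lookup_object(src);
    if (o == NULL || result < o->base)
        return 0;
    uintptr_t off = result - o->base;
    return off <= o->size && access_size <= o->size - off;
}
```

With two 8-byte objects registered back to back, a pointer derived from 
the first that lands 2 bytes into the second passes check_access_only 
for a 4-byte access but fails check_gep, which is the distinction I was 
drawing between a load/store check and an array indexing check.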

>
>
>> Consider the following:
>>
>> union foo {
>>     char * p;
>>     uintptr_t q;
>> } bar;
>>
>> int func (union foo * f) {
>>     return (f->p[5] = ...);
>> }
>>
>> int main () {
>>     union foo a;
>>     a.q = 5;
>>     func (&a);
>> }
>>
>> In your design, a check on f.p[5] in func() must always pass because 
>> you don't know whether the value in f.p can originate from external 
>> code.  However, it's clear in this program that the check could just 
>> fail because we know, just by looking at it, that f.p is only set by 
>> code within the program.
>>
>> Data flow analysis could be used to determine which checks only check 
>> internally allocated pointers and which checks can check externally 
>> allocated pointers; the checks could then be modified to contain this 
>> information.  The gepcheck code then becomes:
>>
>> Your design does not specify such a feature, so it can't be used to 
>> catch these sorts of errors in its current form.
>
> The current implementation is intra-procedural, so it won't catch 
> these errors. But there's nothing preventing you from implementing 
> such checks.  If it's an internal function, the argument list can be 
> augmented with an additional parameter to pass the size of f.p.  And a 
> data flow analysis can then be used to discard some checks, as you said.

I think what you're saying is that you don't need relaxed versions of 
checks if you use a fat pointer approach (i.e., you pass bounds 
information along with the pointer).  That may be correct, but fat 
pointers have a number of compatibility issues and only handle certain 
classes of memory safety errors.  Object lookup approaches (which use a 
side data structure to look up bounds information on memory objects) 
offer a number of compatibility (and in some cases, performance) 
benefits.  If you want a bounds checking instruction design that can 
grow as you do more checks, you don't want a design that ties you to 
using just one approach; you want one that can support both.
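
As a sketch of what "supporting both" could look like, assume a check 
that takes the GEP source, the result, and a pluggable way of obtaining 
bounds.  All of the names here (bounds_provider, fat_provider, 
lookup_provider, gep_check, the strict flag) are hypothetical and only 
illustrate the shape of the interface; this is not a proposal for the 
actual intrinsic signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bounds {
    uintptr_t base;
    size_t size;
};

/* Hypothetical provider: fills *out with the bounds of the object that
 * src belongs to and returns 1, or returns 0 if they are unknown. */
typedef int (*bounds_provider)(uintptr_t src, const void *ctx,
                               struct bounds *out);

/* Fat-pointer style: the bounds travel with the pointer, so ctx is
 * simply the struct bounds that was passed along with it. */
static int fat_provider(uintptr_t src, const void *ctx, struct bounds *out) {
    (void)src;
    *out = *(const struct bounds *)ctx;
    return 1;
}

/* Object-lookup style: ctx is a side table searched by address. */
struct side_table {
    const struct bounds *objs;
    size_t count;
};

static int lookup_provider(uintptr_t src, const void *ctx,
                           struct bounds *out) {
    const struct side_table *t = ctx;
    for (size_t i = 0; i < t->count; i++) {
        if (src >= t->objs[i].base &&
            src - t->objs[i].base < t->objs[i].size) {
            *out = t->objs[i];
            return 1;
        }
    }
    return 0;
}

/* One check for both approaches.  If the object is unknown, a strict
 * check fails and a relaxed check passes (the pointer may have come
 * from external code). */
static int gep_check(uintptr_t src, uintptr_t result, size_t access_size,
                     bounds_provider get, const void *ctx, int strict) {
    struct bounds b;
    assert(get != NULL);
    if (!get(src, ctx, &b))
        return !strict;
    if (result < b.base)
        return 0;
    uintptr_t off = result - b.base;
    return off <= b.size && access_size <= b.size - off;
}
```

The point of the strict flag is the union example above: a data flow 
analysis that proves a pointer is only ever set by code within the 
program could switch that check from relaxed to strict.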

-- John T.



