[llvm-commits] change objectsize signature

John Criswell criswell at illinois.edu
Tue May 8 11:44:42 PDT 2012


On 5/8/12 10:46 AM, Nuno Lopes wrote:
> Hi,
>
> Thanks for your reply! It's really appreciated.
>
> Ok, so let me try to address your concerns:
>  - The idea is that objectsize takes arbitrary pointers and returns 
> the remaining size that can be read from that point on. This means 
> that objectsize accepts pointers returned by malloc/calloc/alloca/.., 
> GEPs, etc.

Right.  That's what I understood.

> The planned usage is the following:
>
> a[i] = ..
>
> converts to (using some loose syntax, and assuming that a[i] writes 4 
> bytes):
>
> %ptr = GEP(%a, %i)
> if (objectsize(%ptr) >= 4)
>   // OK
> else
>   __builtin_trap()
>
> (this is exactly what the current implementation does; 

I'm confused here: what is generating this code?  Clang? Dragonegg?  I 
assume we're talking about LLVM IR that was generated for C code.  Is my 
assumption wrong?

I think I'm missing some of the context from which your design came.

> I'm just proposing to extend objectsize to return non-constant values)

Okay.  I don't think your original proposal was very clear.  You want 
objectsize to return non-constant values so that it is more effective at 
implementing these run-time checks; objectsize will not be doing the 
check itself.  Do I understand correctly?

If so, it is not clear to me how objectsize is determining the object's 
size.  Do you propose to handle only cases in which the object size might 
be one of several known values, like so:

%p = alloca
%q = alloca
...
%r = phi %p, %q
objectsize (%r)

or do you want to do something more general that would require a lookup 
in a side data structure or fat pointers:

%p = load %q
objectsize (%p)

If you're doing lookups, how do you add/remove memory objects in the 
lookup data structure?  If the addition/removal is not done at the LLVM 
IR level, then how can it be optimized away when it is not needed?

A few other observations:

o) Your check does not guarantee that %a and %ptr belong to the same 
memory object.  All you're guaranteeing is that %ptr points into a valid 
memory object with 4 bytes remaining in the memory object.  It's 
possible that %ptr overflowed the object pointed to by %a and is now 
within the bounds of another memory object (stack objects often have no 
padding in between them).  That's fine, but then it's not an array 
indexing check; it's a load/store check.

o) The object size lookup (if you have one), object size comparison, and 
trap are not atomic, which means that other instructions may get 
executed in-between them.  What happens if the memory object is freed by 
another thread in-between the comparison and the use of %ptr?  Is that 
okay for your goals?

o) If you're doing lookups, then objectsize can fail to find the bounds 
because the %ptr is invalid or because %ptr could be pointing to memory 
objects external to the code.  How do you tell if the latter case is 
possible?

o) The fact that the if/then/else is explicit in the IR may make 
hoisting such checks out of monotonic loops more difficult (although I 
could be wrong on this; haven't given it much thought).

>
>  - False positives are not allowed. If the implementation cannot 
> determine the size, then it should return 0/-1 (like the current 
> implementation)

Okay.  That makes sense, but you can already do this without changing 
the LLVM IR.  You can use the SAFECode instrumentation pass to insert 
checks, optimize them, and then remove those that are too expensive for 
what you are doing.


>  - The idea is *not* to build memory safety/debugging tools. The idea 
> is to deploy applications with these bounds checks enabled for 
> security purposes. Therefore we are not trying to be complete 
> (otherwise the performance penalty wouldn't be acceptable).

I think there might be some confusion in the use of the term "memory 
safety tool."  To be clear, I use "memory safety tool" to describe 
anything that detects one or more memory safety errors at run-time.  
Memory safety tools can be used for production code to prevent 
exploitation (if they are sufficiently fast) or as debugging tools (if 
they provide sufficient information about errors).

What you are proposing is a memory safety tool for production use; it 
will be fast because it won't try to do expensive checks.  The security 
policy it enforces seems reasonable.  You can build it the way you 
describe, but I'd rather see it built with a more general set of 
run-time checks that multiple tools can use.

>
>
> And of course, any documentation you may have is highly appreciated!

We have existing documentation in the SAFECode Software Architecture 
manual; I'll do some work this week to add more details on the run-time 
checks it employs.

Regards,

-- John T.

>
> Nuno
>
>
> Citando John Criswell <criswell at illinois.edu>:
>
>> On 5/7/12 7:24 PM, Nuno Lopes wrote:
>>> Hi,
>>>
>>> Please find attached a patch to change the objectsize intrinsic's 
>>> signature.
>>> My proposal is to add a third parameter to control whether 
>>> objectsize is allowed to perform checks at run-time or not.
>>> This parameter is an integer, and a higher value indicates that 
>>> you're willing to accept a potentially higher run-time performance 
>>> penalty, while 0 means no work at run-time (the current behavior).
>>>
>>> The idea is to use this intrinsic, for example, for array bound 
>>> checking.
>>>
>>> Nothing is changed yet in objectsize: this patch only changes the 
>>> signature of the intrinsic and implements an auto-upgrade.
>>>
>>> Comments, ideas, etc..?
>>
>> Before I begin, I want to apologize for the lengthy reply.  However, 
>> I've been working on memory safety checks for a long time.
>> :)
>>
>> My initial impression is that I don't think this is the right 
>> approach.  The objectsize instruction, in your design, lacks 
>> information that is useful for optimizing run-time checks and making 
>> them more stringent with link-time optimization.
>>
>> First, your design does not distinguish between checks on 
>> loads/stores/atomics and checks on GEPs.  The problem with that is 
>> that some memory safety systems treat these checks differently.  Many 
>> systems (SoftBound and SAFECode being just two) will allow GEPs to 
>> generate out-of-bounds pointers so long as they are not dereferenced.  
>> This means that a GEP check and a load/store check have different 
>> behavior when they fail.  This requires both different 
>> implementations of the checks as well as different rules of when and 
>> how they can be optimized.
>>
>> Second, load/store/atomic checks need the size of the memory access 
>> as well as its starting pointer to make sure that the load/store 
>> doesn't "fall off the end" of a memory object.  Your objectsize 
>> design does not provide that information.
>>
>> Third, your design does not specify whether a check is on a pointer 
>> which is only manipulated by code internal to the program or whether 
>> the pointer can be manipulated by or returned from external code.  A 
>> usable memory safety system needs to know the difference; memory 
>> safety guarantees need to be relaxed for pointers handled by external 
>> library code.  Otherwise, the application may exhibit false positives 
>> during execution.
>>
>> Additionally, whether a check is complete (because it checks a 
>> pointer handled by only internal code) or is incomplete (because it 
>> checks a pointer that can be manipulated by external code) needs to 
>> be communicated at the LLVM IR level.  This is because other LLVM 
>> IR-level analyses and transforms can be used to change incomplete 
>> checks to complete checks.  Your design currently leaves that problem 
>> to the code generator, making it more difficult for many LLVM 
>> developers to write incomplete to complete check transformations.
>>
>> In short, if you want to build a generic infrastructure for memory 
>> safety run-time checks, I strongly recommend that you start with the 
>> work we did on SAFECode; we've dealt with these issues, and we have a 
>> solution that we believe can be reused for memory safety tools other 
>> than our own and potentially by safe language implementations as 
>> well.  As an FYI, a GSoC project that uses our lessons from SAFECode 
>> to make generic instrumentation passes for memory safety has been 
>> accepted and will be led by Kostya from Google and co-mentored by me 
>> (if I understand the arrangement correctly).
>>
>> For the GSoC project, I should probably write up a document on the 
>> generic run-time checks and what they do.  Would you find this 
>> document useful?
>>
>> -- John T.
>>
>>> Thanks,
>>> Nuno



