[llvm-commits] [llvm] r77259 - in /llvm/trunk: docs/LangRef.html include/llvm/Bitcode/LLVMBitCodes.h include/llvm/Operator.h lib/AsmParser/LLLexer.cpp lib/AsmParser/LLParser.cpp lib/AsmParser/LLToken.h lib/Bitcode/Reader/BitcodeReader.cpp lib/Bitcode/Writer/BitcodeWriter.cpp lib/VMCore/AsmWriter.cpp test/Assembler/flags-plain.ll test/Assembler/flags.ll

Dan Gohman gohman at apple.com
Wed Jul 29 14:13:59 PDT 2009


On Jul 29, 2009, at 1:18 PM, Török Edwin wrote:


> On 2009-07-29 21:55, Dan Gohman wrote:
>
>> Yes, LICM, if-conversion, and other passes will need to clear the
>>
>> flags when they move instructions, unless they can prove that they
>>
>> don't need to.

To correct myself here, LICM and most other passes don't need to worry
about this, for the reasons Eli mentioned.

>>
>>
>>
>> Moving instructions past calls is a case that I hadn't considered.
>>
>> I assume you mean that a call could resize the allocation, or could
>>
>> deallocate and reallocate at the same address.
>>
>>
>>
>
> I was thinking that it is not guaranteed that a code after a call is
> reachable,
> for example if the call is an abort, or some other no-return function,
> or if the called functions throws an exception etc.
> Thus moving the GEP across a  call is equivalent to the case of moving
> it out of the BB mentioned earlier.
>
> I don't expect this to cause too much trouble in practice though:
> - if the function reallocates the pointer, then the operand of GEP
> doesn't exist before the call (it is the result of the call)
> - if the called function modifies a global variable that can affect  
> the
> pointer in some way, then moving the GEP is not allowed anyway because
> it would produce incorrect results
> (memdep should say the GEP depends on the call in this case)
>
> We're still left with the case where the pointer is freed, but in that
> case the GEP after the free is invalid anyway, and moving it before  
> the
> call to free doesn't change that (it may even make it valid!).
>
> If the call could modify the base address in any way, then it should  
> be
> an alloca, that is stored/loaded before/after the call, so the GEP
> couldn't be moved anyway.

Yep. SSA is cool :-).

>
> That leaves us with the case where the call throws an exception, like:
> char *y = malloc(size);
> check(x, size);
> y[x];
>
> and check(x, size) does: if (x > size) throw 42;
>
> In that case moving the y+x before check will violate the inbounds  
> property.

This is effectively the same situation as moving a getelementptr
across a basic block boundary (calls that may throw are basic
block boundaries).

>
>
>> I think the way to handle this is to observe that LLVM IR allocation
>>
>> mechanisms don't permit resizing, and don't provide a way to
>>
>> force a reallocation to happen at the same address. A library
>>
>> allocator could do either of these, but LLVM doesn't know when or
>>
>> where library allocators allocate their storage, so I don't see any
>>
>> cases where this could cause an actual problem. I guess this is
>>
>> an argument in favor of defining inbounds in terms of integer
>>
>> overflow rather than in terms of allocated objects. I'll think
>>
>> about it.
>>
>>
>>
>
> If all the optimizers are careful to preserve the inbounds property  
> then
> I don't see any problem, and having the inbounds property may allow
> some interesting static analysis.
> For that I think that the inbounds property should actually be 2  
> properties:
> 1. I as a user assert that the pointer is inbounds, and its my fault  
> if
> it is not and the program crashes, I'll call this: assertinbounds
> 2. Based on the assertinbounds properties provided by the user, an
> optimizer/analysis pass can deduce that more pointers are inbounds,  
> I'll
> call these too assertinbounds,
> because if the user was wrong, then these are wrong too (buggy).
> 3. The optimizer/analysis pass can prove that a pointer is always
> inbounds, without using any assertinbouds property, and there is no
> undefined behaviour/undefined result
> that can cause the pointer to be out of bounds, I'll call this:  
> safeinbounds
> 4. Optimizers/analysis passes can transform assertinbounds into
> safeinbounds if it can prove the property
>
> Then we could have a strict verifier (or lint if you prefer) that  
> checks
> that all GEPs, if they are all safeinbounds then its good.
> If there are some GEPs with assertinbounds then it could insert  
> runtime
> checks to check that the assertion is correct.
> This could be a useful debugging tool (like valgrind, or actually more
> like -fmudflap).

This sounds like it would be a useful feature. Instead of storing
safeinbounds information in the IR, I'd suggest storing it on the
side in an Analysis pass. I think that would still allow all of
the same functionality, and it'd avoid adding what everything else
would see as a redundant flag in the IR.

One complication is that the inbounds flag does not say that an
out-of-bounds index will never happen. It effectively says that the
result value of the getelementptr doesn't matter if the index is
out of bounds. Full undefined behavior isn't incurred until the
result value is actually used for a load or a store, for example.

This is a difference from C, where undefined behavior is incurred
immediately upon overflow.

It would still be possible to write a dynamic checker though; you
could expand the getelementptr into code that returns null (or some
other invalid address) if the inbounds condition is not met.

>
> P.S.: Please don't take this as criticizing what you are doing, I  
> think
> that having better semantics for GEPs is good.
> You can create GEPs without undefined behaviour now, if you turn off
> this flag. Previously it wasn't possible to tell optimizers how to  
> treat
> GEPs, so this is a step in the right direction IMHO.

I appreciate that you're thinking through the issues and asking
questions. Thanks!

> One could even teach bugpoint to selectively remove the inbounds flag
> from GEPs until the program no longer crashes and thus pinpoint a  
> faulty
> assumption by the user
> about the corectness of the pointer arithmetic used in his/her code ;)

That's a neat idea :-).

Dan





More information about the llvm-commits mailing list