[llvm-commits] [llvm] r77259 - in /llvm/trunk: docs/LangRef.html include/llvm/Bitcode/LLVMBitCodes.h include/llvm/Operator.h lib/AsmParser/LLLexer.cpp lib/AsmParser/LLParser.cpp lib/AsmParser/LLToken.h lib/Bitcode/Reader/BitcodeReader.cpp lib/Bitcode/Writer/BitcodeWriter.cpp lib/VMCore/AsmWriter.cpp test/Assembler/flags-plain.ll test/Assembler/flags.ll

Wed Jul 29 13:18:35 PDT 2009

On 2009-07-29 21:55, Dan Gohman wrote:
> On Jul 29, 2009, at 9:44 AM, Török Edwin wrote:
>
>
>   
>> On 2009-07-29 19:26, Dan Gohman wrote:
>>
>>     
>>> On Jul 28, 2009, at 11:48 PM, Chris Lattner wrote:
>>>
>>>
>>>
>>>
>>>
>>>       
>>>> On Jul 27, 2009, at 2:53 PM, Dan Gohman wrote:
>>>>
>>>>
>>>>
>>>>         
>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=77259&view=rev
>>>>>
>>>>> Log:
>>>>>
>>>>> Add a new keyword 'inbounds' for use with getelementptr. See the
>>>>>
>>>>> LangRef.html changes for details.
>>>>>
>>>>>
>>>>>
>>>>>           
>>>> The langref changes look nice.  The choice of "inbounds" is a little
>>>>
>>>> strange to me, isn't this flag mostly about wrapping behavior when
>>>>
>>>> overflow happens?
>>>>
>>>>
>>>>
>>>>         
>>> It turns out that avoiding wrapping and staying in bounds are  
>>> basically
>>>
>>> the same thing, since LLVM doesn't specify where in the address space
>>>
>>> it will allocate an object.
>>>
>>>
>>>
>>>
>>>
>>>       
>>>> Will "inbounds" be the default generated by the llvm-gcc/clang  
>>>> front-
>>>>
>>>> ends?  If so, it would be nice to invert the sense of the bit so it
>>>>
>>>> doesn't show up in all the .ll files making them really noisy.
>>>>
>>>>
>>>>
>>>>         
>>>
>>>
>>> I'm planning for it to be the default behavior.  I'm open to  
>>> discussion;
>>>
>>> there are pros and cons either way.
>>>
>>>       
>> Hi Dan,
>>
>> Some optimization (I think LICM) moves getelementptr out of the
>> basicblock where a predicate guarantees the inbounds property, thus  
>> you
>> may get
>> a getelementptr that computes an out of bounds value, but when the  
>> value
>> is dereferenced it will always be in bounds (due to that predicate).
>>
>> Requiring a getelementptr to be inbounds is rather strong, since there
>> are optimizations that can create GEPs that are not inbounds (as shown
>> above).
>> Either those optimizations should then clear the flag (any  
>> optimization
>> that moves GEPs out of a BB, or over a call), or inbounds should only
>> mean that it is 'inbounds'
>> when dereferenced.
>>     
>
> The strong meaning of inbounds is what I intend here. It represents
> properties guaranteed by the C standard, among other things.
>   

Agreed.

> Yes, LICM, if-conversion, and other passes will need to clear the
> flags when they move instructions, unless they can prove that they
> don't need to.
>
> Moving instructions past calls is a case that I hadn't considered.
> I assume you mean that a call could resize the allocation, or could
> deallocate and reallocate at the same address.
>   

I was thinking that it is not guaranteed that a code after a call is
reachable,
for example if the call is an abort, or some other no-return function,
or if the called functions throws an exception etc.
Thus moving the GEP across a  call is equivalent to the case of moving
it out of the BB mentioned earlier.

I don't expect this to cause too much trouble in practice though:
 - if the function reallocates the pointer, then the operand of GEP
doesn't exist before the call (it is the result of the call)
 - if the called function modifies a global variable that can affect the
pointer in some way, then moving the GEP is not allowed anyway because
it would produce incorrect results
(memdep should say the GEP depends on the call in this case)

We're still left with the case where the pointer is freed, but in that
case the GEP after the free is invalid anyway, and moving it before the
call to free doesn't change that (it may even make it valid!).

If the call could modify the base address in any way, then it should be
an alloca, that is stored/loaded before/after the call, so the GEP
couldn't be moved anyway.

That leaves us with the case where the call throws an exception, like:
char *y = malloc(size);
check(x, size);
y[x];

and check(x, size) does: if (x > size) throw 42;

In that case moving the y+x before check will violate the inbounds property.

> I think the way to handle this is to observe that LLVM IR allocation
> mechanisms don't permit resizing, and don't provide a way to
> force a reallocation to happen at the same address. A library
> allocator could do either of these, but LLVM doesn't know when or
> where library allocators allocate their storage, so I don't see any
> cases where this could cause an actual problem. I guess this is
> an argument in favor of defining inbounds in terms of integer
> overflow rather than in terms of allocated objects. I'll think
> about it.
>   

If all the optimizers are careful to preserve the inbounds property then
I don't see any problem, and having the inbounds property may allow
some interesting static analysis.
For that I think that the inbounds property should actually be 2 properties:
1. I as a user assert that the pointer is inbounds, and its my fault if
it is not and the program crashes, I'll call this: assertinbounds
2. Based on the assertinbounds properties provided by the user, an
optimizer/analysis pass can deduce that more pointers are inbounds, I'll
call these too assertinbounds,
because if the user was wrong, then these are wrong too (buggy).
3. The optimizer/analysis pass can prove that a pointer is always
inbounds, without using any assertinbouds property, and there is no
undefined behaviour/undefined result
that can cause the pointer to be out of bounds, I'll call this: safeinbounds
4. Optimizers/analysis passes can transform assertinbounds into
safeinbounds if it can prove the property

Then we could have a strict verifier (or lint if you prefer) that checks
that all GEPs, if they are all safeinbounds then its good.
If there are some GEPs with assertinbounds then it could insert runtime
checks to check that the assertion is correct.
This could be a useful debugging tool (like valgrind, or actually more
like -fmudflap).

TBH I don't like if a compiler creates code that may crash without
warning, because the user didn't follow all the language rules.
It is hard to tell for a large piece of code whether it follows the C
standard, or it may break due to optimizations allowed by the C language.
Debugging such bugs in the program is often hard (valgrind and mudflap
help), but it is even more helpful if the compiler can give a warning
about it.
That is what warnings are for IMHO, but I digress.

P.S.: Please don't take this as criticizing what you are doing, I think
that having better semantics for GEPs is good.
You can create GEPs without undefined behaviour now, if you turn off
this flag. Previously it wasn't possible to tell optimizers how to treat
GEPs, so this is a step in the right direction IMHO.
One could even teach bugpoint to selectively remove the inbounds flag
from GEPs until the program no longer crashes and thus pinpoint a faulty
assumption by the user
about the corectness of the pointer arithmetic used in his/her code ;)

Best regards,
--Edwin