[llvm-commits] [llvm] r77460 - /llvm/trunk/docs/LangRef.html

Török Edwin edwintorok at gmail.com
Tue Aug 18 12:08:59 PDT 2009


On 2009-08-18 21:07, Dan Gohman wrote:
> On Aug 18, 2009, at 10:38 AM, Duncan Sands wrote:
>
>
>   
>> Hi Dan,
>>
>>
>>     
>>> Add one-past-the-end language to the inbounds keyword.
>>>
>>>       
>> is this useful for LLVM?  Also, I hear that C has a rule like this
>> for arrays, however I think in that case the address of the array
>> element one off the end is valid, but that may in fact be many
>> bytes off the end if the array element is large.  It's not clear
>> from your wording whether you mean one byte off the end is ok,
>> or something closer to C.
>>     
>
> The one-past-the-end rule is entirely motivated by the need to
> support C, and programming models that lower to C-like semantics.
>  From my reading of the standard, only one byte of address space
> is required to satisfy C's one-past-the-end rule.
>
> If I've misinterpreted the standard, or if there are programming
> models which require more than one byte and which could be
> accommodated by a reasonable change, I'm interested in hearing
> about it.
>   

There are programs which read 2 bytes at 2-byte-aligned addresses,
4 bytes at 4-byte-aligned addresses, etc., and rely on the fact
that even though such a load may read uninitialized bytes, the memory is
still valid (since memory is allocated in pages).
I think this is why valgrind has the '--partial-loads-ok=yes' flag.
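
To spell out the page argument, here is a minimal C sketch. It assumes
4096-byte pages and a power-of-two load width; the helper name
load_stays_in_page is made up for illustration. It checks whether a load
touches a single page only, which is always the case for a naturally
aligned load, so such an over-read cannot fault even when it runs past
the end of the object.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this sketch; real systems may differ. */
#define PAGE_SIZE 4096u

/* Hypothetical helper: returns 1 if a load of `width` bytes starting at
   `addr` touches only one page, 0 if it crosses a page boundary. */
static int load_stays_in_page(uintptr_t addr, unsigned width)
{
    /* width must be a nonzero power of two no larger than a page */
    assert(width != 0 && (width & (width - 1)) == 0);
    assert(width <= PAGE_SIZE);

    uintptr_t first_page = addr / PAGE_SIZE;
    uintptr_t last_page = (addr + width - 1) / PAGE_SIZE;
    return first_page == last_page;
}
```

With these assumptions, a 4-byte load at address 4092 stays inside one
page, while an unaligned 4-byte load at address 4094 would cross into
the next page.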

Such code doesn't strictly conform to the C standard; however, I remember
that some glibc version did this in strcmp & friends, at least on
x86-32 (not sure if it was in assembly code or C code, but it wouldn't
surprise me if it were C code).

I think allowing reads of at least min(AlignOf(ptr),16)-1 bytes past the
end of an array/string is reasonable.
Actually the computed pointer's start address is inbounds according to
your rules, but the load/store itself is not.
So how about allowing reads of at most 16 bytes whenever the start
address is inbounds?
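
For concreteness, the min(AlignOf(ptr),16)-1 bound can be sketched in C
as follows. The function name allowed_overread is hypothetical, and the
alignment is assumed to be passed in as a known byte count of at least 1;
this only illustrates the arithmetic, not anything LLVM implements.

```c
#include <assert.h>

/* Hypothetical helper: number of bytes past the end of an array/string
   that a load could touch under the bound min(AlignOf(ptr), 16) - 1,
   given the pointer's known alignment in bytes. */
static unsigned allowed_overread(unsigned align)
{
    assert(align >= 1); /* assumption: alignment is at least one byte */

    unsigned capped = align < 16 ? align : 16; /* min(AlignOf(ptr), 16) */
    return capped - 1;
}
```

Under this reading, a 4-byte-aligned pointer gets 3 bytes of slack, and
any alignment of 16 or more is capped at 15 bytes.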

Also I thought SimplifyLibcalls was doing some strcmp->memcmp
optimizations that relied on aligned loads being valid, but apparently
it does the transform only if the lengths of both strings are known.

P.S.: While looking at SimplifyLibcalls I also found a bug: it doesn't
check whether the GV initializer can be overridden during linking
via a non-weak def. I opened PR4738 about this.

Best regards,
--Edwin



