[PATCH] D90708: [LangRef] Clarify GEP inbounds wrapping semantics

Nuno Lopes via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 11 12:20:15 PST 2020


nlopes added inline comments.


================
Comment at: llvm/docs/LangRef.rst:9782
+   means that it points into an allocated object, or to its end (which is one
+   byte past the last byte contained in the object). The only *in bounds*
+   address for a null pointer in the default address-space is the null pointer
----------------
jrtc27 wrote:
> nikic wrote:
> > nlopes wrote:
> > > I still don't like the current writing. I would need to see some evidence from language standards that they require pointers past the end of objects.
> > What would be a better wording? "One past the end" is a term of art, and as such should be well understood: https://www.google.com/search?q=one+past+the+end
> > If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
> 
> https://port70.net/~nsz/c/c11/n1570.html#6.5.6p8
Thanks for the reference. That paragraph doesn't say that a pointer 1 byte past the end is valid, though.
It says that the following is valid:
int x[n];
int *p = x, *q;
q = p+(n-1); // points to the last element
q = p+n; // points to one element past the last

It doesn't say that `(char*)(p+n)+1` is valid, which is what it would mean for a pointer 1 byte past the end to be valid.

So AFAICT, both the C & C++ standards agree that p+n is the furthest pointer one needs to support.

My suggestion is simply to remove the part in parentheses, "(which is one byte past the last byte contained in the object)", or to replace it with wording similar to the C++ standard's (the address of a hypothetical next element, or something like that).
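
For readers following along, here is a minimal C sketch of the pointers discussed above. The helper name `bytes_to_end` is made up for illustration and is not part of the patch. It forms only the pointers that C11 6.5.6p8 explicitly allows, `p+(n-1)` and `p+n`, and returns the byte offset of `p+n` from the start of the array. It never forms `(char*)(p+n)+1`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Returns the byte distance from the start of an n-element int array
   to its one-past-the-end pointer. Forming p + n is permitted by
   C11 6.5.6p8; dereferencing it is not. */
static ptrdiff_t bytes_to_end(const int *p, size_t n)
{
    assert(p != NULL && n > 0);

    const int *last = p + (n - 1); /* points to the last element */
    const int *end  = p + n;       /* points one element past the last */
    assert(end - last == 1);

    /* (const char *)end + 1 is deliberately not computed here:
       it would step beyond the one-past-the-end position. */
    return (const char *)end - (const char *)p;
}
```

For `int x[4]`, `bytes_to_end(x, 4)` returns `4 * sizeof(int)`; the last byte of `x` sits at offset `4 * sizeof(int) - 1`.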


================
Comment at: llvm/docs/LangRef.rst:9789
+   index type in a signed sense (``nsw``).
+*  The successive addition of offsets (without adding the base address) does
+   not wrap the pointer index type in a signed sense (``nsw``).
----------------
nikic wrote:
> jrtc27 wrote:
> > nlopes wrote:
> > > It's a bit stronger than that. The addition of each offset to the preceding pointer should not overflow. You can't do e.g.:
> > > gep inbounds %p, -1, 1
> > > 
> > > because %p-1 is OOB, even though the result is in bounds (because %p must be in bounds).
> > It's more nuanced than that, no? `%p` could be a pointer part-way through (or one past the end of) an object, in which case `%p-1` would still be in bounds?
> This is specified in the next bullet point (successive addition to the base pointer must remain in bounds of the allocated object).
Ok, right. Then this is more of a corollary of the point below. Sounds correct at least. I'm happy to keep it.
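
The rule discussed in this thread (each successive addition to the base pointer must stay within the allocated object, or at its end) can be modeled with a small C sketch. This is a hypothetical model, not LLVM code: `inbounds_chain_ok` is an invented name, and it assumes the offsets are already scaled to bytes and that `base` is the byte offset of `%p` within an object of `size` bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not LLVM's implementation.
   size:    size of the allocated object in bytes
   base:    byte offset of the base pointer within the object
   offsets: byte offsets applied one after another
   Returns true only if the base and every intermediate result stay
   in [0, size], i.e. inside the object or at its end. */
static bool inbounds_chain_ok(ptrdiff_t size, ptrdiff_t base,
                              const ptrdiff_t *offsets, size_t count)
{
    assert(size >= 0);
    if (base < 0 || base > size)
        return false;

    ptrdiff_t pos = base;
    for (size_t i = 0; i < count; i++) {
        ptrdiff_t off = offsets[i];
        /* Test pos + off against [0, size] without overflowing. */
        if (off < -pos || off > size - pos)
            return false;
        pos += off;
    }
    return true;
}
```

With offsets `{-1, 1}` and an 8-byte object, the chain is rejected when `base` is 0 (the `%p-1` step is out of bounds) but accepted when `base` is 4 or 8, which matches both the `gep inbounds %p, -1, 1` example and the point about `%p` being part-way through or one past the end of an object.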


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90708/new/

https://reviews.llvm.org/D90708


