[LLVMbugs] [Bug 4270] New: getelementptr is underspecified

bugzilla-daemon at cs.uiuc.edu bugzilla-daemon at cs.uiuc.edu
Tue May 26 09:07:59 PDT 2009


           Summary: getelementptr is underspecified
           Product: Documentation
           Version: trunk
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: General docs
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: gohman at apple.com
                CC: llvmbugs at cs.uiuc.edu

This paragraph from LangRef.html:

Note that it is undefined to access an array out of bounds: array and pointer
indexes must always be within the defined bounds of the array type when
accessed with an instruction that dereferences the pointer (e.g. a load or
store instruction). The one exception for this rule is zero length arrays.
These arrays are defined to be accessible as variable length arrays, which
requires access beyond the zero'th element.

raises several questions. I'm working on adding the concept of undefined
integer arithmetic overflow to LLVM, and also GEP expansion, so I'm filing
this bug in order to work towards clarification of the rules.

The first sentence seems to suggest that it's well defined to compute
arbitrary addresses, as long as they are not dereferenced. Especially since
there is no other mention of C's "one-past-the-end" provision, this
sentence seems to take that role by saying that in LLVM IR, addresses
N-past-the-end, or even N-ahead-of-the-beginning, may be computed, for
any N.

However, the second sentence makes a special provision for
zero-length array types. If N-past-the-end addresses are permitted, this
wouldn't really be an exception, but instead just an example of the
standard rule.

There is also a rumor that GEP overflow is intended to be
undefined behavior. This isn't mentioned in LangRef.html, but it has
come up in a variety of places, and if it's true, it would seem to
rule out N-past-the-end. However, in that case, there's nothing guaranteeing
one-past-the-end, which is needed for C support.

So first, assuming %A points to an array of [10 x double], which of the 
following instructions are intended to be undefined?
  %a = getelementptr double* %A, i64 -1
  %b = getelementptr double* %A, i64 9223372036854775807
  %c = getelementptr double* %A, i64 10
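
To make the difference between the three cases concrete, here is a rough
sketch of the byte offset each instruction would add to %A. It assumes
an 8-byte double and a signed 64-bit offset computation; the helper names
are hypothetical, and the labels only describe the arithmetic, not what
LangRef intends to be defined.

```python
DOUBLE_SIZE = 8                      # assumption: a double occupies 8 bytes
ARRAY_LEN = 10                       # %A points to [10 x double]
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def gep_byte_offset(index):
    """Exact byte offset of `getelementptr double* %A, i64 index`."""
    # Python integers are unbounded, so this is the mathematically exact value.
    return index * DOUBLE_SIZE


def classify(index):
    """Describe where the computed address falls relative to the array."""
    offset = gep_byte_offset(index)
    if not (INT64_MIN <= offset <= INT64_MAX):
        return "overflows signed 64-bit offset"
    if 0 <= index < ARRAY_LEN:
        return "in bounds"
    if index == ARRAY_LEN:
        return "one past the end"
    return "out of bounds, no overflow"
```

Under these assumptions, %a is 8 bytes before the array with no overflow,
%b overflows the signed 64-bit offset, and %c is exactly one past the end.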

Second, is there anything undefined about this code?

  %p3 = getelementptr [3 x [3 x double]]* %p, i64 0, i64 0, i64 3
  store double 0.0, double* %p3

(assume %p3 points to sufficient storage)
The last index, 3, is outside the bounds of the static type implied by
the base pointer and the gep; however, the computed address is within
the bounds of the underlying allocated storage.
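
As a sanity check on that claim, the following sketch computes the byte
offset of the gep, assuming an 8-byte double and a tightly packed array
layout. The function name and its parameters are hypothetical.

```python
DOUBLE_SIZE = 8  # assumption: a double occupies 8 bytes


def gep_offset(indices, dims=(3, 3), elem_size=DOUBLE_SIZE):
    """Byte offset of `getelementptr [3 x [3 x double]]* %p, i0, i1, i2`."""
    # Build one stride per index, innermost first: 8, 24, then 72 for the
    # leading index, which steps over whole [3 x [3 x double]] objects.
    strides = []
    size = elem_size
    for dim in reversed(dims):
        strides.append(size)
        size *= dim
    strides.append(size)
    strides.reverse()                # now [72, 24, 8]
    return sum(i * s for i, s in zip(indices, strides))


TOTAL_SIZE = 3 * 3 * DOUBLE_SIZE     # 72 bytes in one [3 x [3 x double]]
```

With these assumptions, indices (0, 0, 3) and (0, 1, 0) both give a byte
offset of 24, which is less than the 72 bytes of the whole object. The
address is the same as that of element [1][0], even though the last index
exceeds the inner array's static bound.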

The following comment from BasicAliasAnalysis.cpp suggests that this
code is valid and that optimizers should handle it correctly:

  // We have to be careful here about array accesses.  In particular, consider:
  //        A[1][0] vs A[0][i]
  // In this case, we don't *know* that the array will be accessed in bounds:
  // the index could even be negative.  Because of this, we have to
  // conservatively *give up* and return may alias.  We disregard differing
  // array subscripts that are followed by a variable index without going
  // through a struct.
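
The situation that comment guards against can be sketched as follows.
The comment does not give the array's dimensions, so an inner dimension
of 3 is assumed here; the helper names are hypothetical.

```python
INNER_LEN = 3  # assumption: A has type [N x [3 x T]]


def flat_index(row, col, inner_len=INNER_LEN):
    """Element index of A[row][col] in the flattened underlying storage."""
    return row * inner_len + col


def aliasing_indices(candidates):
    """Return every i in candidates for which A[0][i] aliases A[1][0]."""
    target = flat_index(1, 0)
    return [i for i in candidates if flat_index(0, i) == target]
```

Under that assumption, aliasing_indices(range(-4, 8)) yields only i = 3,
so an analysis that assumed i stays within [0, 3) would wrongly report
no-alias for A[1][0] vs A[0][i].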

Third, if there are any cases where a getelementptr by itself (with no
load or store) is "undefined", is it Undefined Behavior, as in
"demons may fly out your nose", or is it merely that the getelementptr
may return an unspecified result?
