[cfe-commits] implicit sign/bitwidth conversions during array indexing?

Mon Nov 17 11:21:55 PST 2008

On Mon, Nov 17, 2008 at 10:42 AM, Ted Kremenek <kremenek at apple.com> wrote:
> I think too much of the original thread got lost.  You're right that there is
> no implicit cast for array subscripting.  We do insert implicit casts for
> pointer arithmetic.  For example:
> int f(int *p) {
>   short i = 0;
>   long long k = 0;
>   int x;
>   x = *(p + i);   // implicit cast for 'i' from short to int
>   x += *(p + k); // no implicit cast for 'k' from long long to int
>
>   return x;
> }
> The standard says that E1[E2] is the same as *(E1 + E2), so I was curious
> why there was no implicit cast from long long to int when doing pointer
> arithmetic.  I also think that we should probably have an implicit cast when
> performing array indexing, but that's a separate matter from my original
> question.
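A minimal sketch of the equivalence Ted cites (C99 6.5.2.1: E1[E2] is identical to (*((E1)+(E2)))). The three function names are hypothetical and exist only for illustration. Each one reads the same element, which is why whatever implicit-cast policy is chosen for pointer arithmetic should presumably apply to subscripting too:

```c
/* Hypothetical helpers: each reads element i of p in a different spelling.
   p[i] is defined as *(p + i), and because + commutes, i[p] is the same. */
int via_subscript(const int *p, long long i)   { return p[i]; }
int via_pointer_add(const int *p, long long i) { return *(p + i); }
int via_swapped(const int *p, long long i)     { return i[p]; }
```

With `int buf[4] = {10, 20, 30, 40};`, all three calls with i == 2 yield 30.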

Oh... we should make it consistent.  If we want to stick to the
wording of the standard, we shouldn't insert the cast in either case,
but it doesn't particularly matter.

> The issue with inserting an implicit cast from long long to int is
> that conceptually, it isn't there!  The fact that we truncate long
> long to i32 in CodeGen is really just an artifact of the
> implementation on 32-bit machines.
>
> Regardless of CodeGen, conceptually the types are changing.  Even for the
> following code we insert an implicit cast from 'long' to 'int' even on a
> 32-bit machine:
>   long x = 0;
>   int y = x;

C99 6.5.16.1: "In simple assignment (=), the value of the right
operand is converted to the type of the assignment expression."  We
prefer to represent all conversions explicitly in the AST, so an
ImplicitCastExpr gets inserted.
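For illustration, here is a sketch of what that rule means at the source level (the function names are hypothetical). `int y = x;` is an initialization, but C99 6.7.8p11 applies the same conversions as simple assignment, so it means the same as the explicit cast. clang records the first form's conversion as an ImplicitCastExpr. Only values that fit in `int` are used below, because converting an out-of-range value to a signed type is implementation-defined:

```c
/* Implicit form: the long -> int conversion is not written, but it happens
   (this is where clang's AST would show an ImplicitCastExpr). */
int implicit_conversion(long x) {
  int y = x;
  return y;
}

/* Explicit form: the same conversion, spelled out as a cast. */
int explicit_conversion(long x) {
  int y = (int)x;
  return y;
}
```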

> Obviously, in the case of a 64-bit machine the implicit cast becomes even
> more valuable since it implies a truncation during CodeGen.
> For the array indexing (or rather pointer arithmetic), is it not the case
> that the 'long long' is being converted (always) to an integer with the same
> width as a pointer?  Moreover, the sign of the integer type is changing.
>  This seems like something that should be semantically captured in the ASTs.

No, it's not being converted to the same width as the pointer per the
semantics in the standard.  Take the following:

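#include <stdint.h>

int arry[10];

/* Version A: x is 2^33 + 1, far past the end of arry. */
int a_version_a(void) {
  uint64_t x = (1ULL << 33) + 1;
  return arry[x];
}

/* Version B: (uint32_t)x is 1, so this reads arry[1]. */
int a_version_b(void) {
  uint64_t x = (1ULL << 33) + 1;
  return arry[(uint32_t)x];
}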

Per the semantics in the standard, A has undefined behavior and B is
well-defined.  If we add a cast to the AST for version A, it becomes
indistinguishable from B and we lose this distinction.
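A quick check of the index values behind A and B, written as two hypothetical helper functions. The key step is that converting to an unsigned type is well-defined and reduces modulo 2^32 (C99 6.3.1.3p2). 2^33 is a multiple of 2^32, so B's index is exactly 1. A's index keeps its full value, 2^33 + 1 = 8589934593, which is far beyond a 10-element array:

```c
#include <stdint.h>

/* Index used by version A: the full 64-bit value. */
uint64_t index_version_a(void) { return (1ULL << 33) + 1; }

/* Index used by version B: reduced modulo 2^32 by the uint32_t cast. */
uint64_t index_version_b(void) { return (uint32_t)((1ULL << 33) + 1); }
```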

The standard bit is rather long, so I'll just refer to C99 6.5.6p8;
the important point is that pointer+int is specified in terms of the
value of the integer, and has nothing to do with the width of
anything; any result that points outside the array object (other than
one past its last element) is explicitly undefined.  It's quite similar
to the way GEP is defined in LLVM: just because the result happens to
wrap the way you want for a naive implementation doesn't mean that
we're making any guarantees.
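To make "happens to wrap the way you want" concrete, here is a hypothetical model of a naive 32-bit code generator. It is an illustration only, not clang's actual CodeGen. The byte offset idx * sizeof(int) is truncated to 32 bits, so A's index and B's index land on the same offset. That coincidence is exactly what the standard does not promise:

```c
#include <stdint.h>

/* Hypothetical naive lowering: compute the byte offset, then keep only the
   low 32 bits, as a 32-bit address computation would. */
uint32_t naive_byte_offset(uint64_t idx) {
  return (uint32_t)(idx * sizeof(int));
}
```

Under this model, naive_byte_offset((1ULL << 33) + 1) equals naive_byte_offset(1). A conforming implementation is free to behave differently for the first call.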

In clang on a 32-bit platform using normal CodeGen, both A and B
happen to compile to the same code, but that's not really relevant
here; it would be perfectly legal per the standard, for example, to
use GEP with a 64-bit operand in clang CodeGen (which LLVM might
optimize to undef), or to add an array-bounds checker that asserts for
version A.
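A minimal sketch of what such a checker might insert, assuming a simple assert-based instrumentation. The names `index_in_bounds`, `checked_read`, and `ARRY_LEN` are hypothetical, and this is not an existing clang feature. The check compares the full value of the index against the array length, matching 6.5.6p8's value-based wording. If a cast to a narrower type had been added to the AST for version A, the checker would see the wrapped value 1 and miss the bug:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ARRY_LEN 10
int arry[ARRY_LEN];

/* True when idx names an element of an array with len elements.
   The full 64-bit value is compared; nothing is truncated first. */
bool index_in_bounds(uint64_t idx, uint64_t len) {
  return idx < len;
}

/* Hypothetical instrumented form of arry[idx]: version A's index
   (2^33 + 1) fails the assert, version B's index (1) passes. */
int checked_read(uint64_t idx) {
  assert(index_in_bounds(idx, ARRY_LEN));
  return arry[idx];
}
```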

-Eli