[llvm-commits] llvm-gcc: correct handling of arrays of var-sized elements

Duncan Sands baldrick at free.fr
Wed Mar 28 09:23:36 PDT 2007


In gcc, the length of an array may only be known at runtime, for
example "char X[n];" is legal; the same goes for array types: the
length of the array type may depend on the value of a variable.
This is a gcc C extension, so somewhat rare, but is widely used in
languages like Ada.  You can construct pointers to such types and
arrays where the element type is such a variable-sized array.  This
is one way of getting an array where the element type does not have
a fixed size; the other way is to use a variable-sized struct as the
element type.  Currently such arrays are handled incorrectly: loads and
stores to elements (via ARRAY_REF) access the wrong memory location.
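
A minimal sketch of the kind of array in question (this is not the attached
testcase; the helper name element_stride is made up for illustration).  It
declares an array of three variable-sized elements and reports how far apart
consecutive elements are in memory:

```c
#include <stddef.h>

/* Hypothetical helper: distance in bytes between element 0 and
   element 1 of an array whose element type is "char [n]".
   Assumes n > 0. */
size_t element_stride(int n)
{
    char a[3][n];              /* "(char [n])[3]": three char[n] elements */
    a[0][0] = 0;               /* touch the array so it is not unused */
    return (size_t)((char *)&a[1] - (char *)&a[0]);
}
```

Calling element_stride(5) should give 5: each element occupies n bytes, so
element 1 starts n bytes after element 0.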

The reason for this is simple enough: the gcc element type (EType) and
the corresponding LLVM type (ETypeLLVM) have different sizes.  This means
that LLVM arrays of the LLVM element type lay their components out in
memory differently to the gcc array.  For example, suppose the element
type EType is "char [n]", a length n string, and the array type AType is
an array of three ETypes, which I'll write as "(char [n])[3]".

Right now these get converted as follows:

GCC		LLVM
char[n]		i8
(char [n])[3]	[3 x i8]

GCC memory layout
element: 0 0 ...  0  1  1  ...  1   2  ...  2
   byte: 0 1 ... n-1 n n+1 ... 2n-1 2n ... 3n-1

LLVM memory layout
element: 0 1 2
   byte: 0 1 2

Referencing element 1 accesses the byte at offset 1 from the start when it
should access the byte at offset n.  The testcase shows an example
of the bogus results this can give (2007-03-27-ArrayCompatible.c).

The conversion to an LLVM array is clearly bogus.  There is an analogous
problem with pointers: using GetElementPtr to do pointer offsets is only
valid if the LLVM type is the same size as the GCC type, since otherwise
the pointer will be advanced by the wrong amount.
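
The pointer case can be seen from plain C as well.  The sketch below (the
helper name pointer_step is hypothetical) measures how far "p + 1" moves when
p points to the variable-sized type "char [n]".  A GetElementPtr on an i8
pointer would step by one byte instead:

```c
#include <stddef.h>

/* Hypothetical helper: bytes by which "p + 1" advances when p points
   to the variable-sized type "char [n]".  Assumes n > 0. */
size_t pointer_step(int n)
{
    char storage[2][n];        /* room for two char[n] objects */
    char (*p)[n] = storage;    /* pointer to the variable-sized type */
    storage[0][0] = 0;         /* touch the storage so it is not unused */
    return (size_t)((char *)(p + 1) - (char *)p);
}
```

Here pointer_step(7) should return 7, the size of the gcc type, not 1, the
size of the LLVM type i8.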

The patch introduces utility functions isSequentialCompatible and
isArrayCompatible that apply to a gcc array (or pointer type).  If
isSequentialCompatible returns true, then elements of the gcc type
are laid out in memory the same as the corresponding LLVM elements.
Thus GetElementPtr can be used to access them.  If isArrayCompatible
returns true, then the gcc array corresponds to an LLVM array,
laying out its components the same way.

isSequentialCompatible relies on the following invariant: if the size
of a gcc type is a constant, then the corresponding LLVM type has the
same size [1].  I've added an assertion in llvm-types to check that this
is true.  isSequentialCompatible simply returns whether the gcc element
type has constant size [1].  Thus isSequentialCompatible returns true for
a variable length array with a constant size element type.

isArrayCompatible returns true if the array has a constant length and the
element type has constant size [2].
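
As a rough model of the two predicates (a toy sketch only: the real functions
inspect gcc trees, and the struct and function names below are invented for
illustration), the rules in the main text amount to:

```c
#include <stdbool.h>

/* Toy stand-in for a gcc array type; not the representation used by
   llvm-gcc. */
struct toy_array_type {
    bool elem_size_is_constant;   /* element type has constant size */
    bool length_is_constant;      /* array length is a constant */
};

/* Elements are laid out like the corresponding LLVM elements, so
   GetElementPtr can be used to reach them. */
bool toy_is_sequential_compatible(const struct toy_array_type *t)
{
    return t->elem_size_is_constant;
}

/* The whole gcc array corresponds to an LLVM array. */
bool toy_is_array_compatible(const struct toy_array_type *t)
{
    return t->length_is_constant && t->elem_size_is_constant;
}
```

The corner cases described in [2] (arrays with no length, zero-sized arrays)
are deliberately left out of this model.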

The patch then fixes up a bunch of array code to use these.  It also
modifies pointer code to use isSequentialCompatible, but these modifications
are minor since the pointer code already got it right.

For example, both isSequentialCompatible and isArrayCompatible return
false for the array type example described above.  The conversions are
now:

GCC		LLVM
char[n]		i8
(char [n])[3]	i8

and the LLVM memory layout is done using pointer arithmetic.
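
In C terms, the pointer arithmetic amounts to scaling the index by the
runtime element size.  A minimal sketch, assuming a byte pointer to the start
of the array (the function name element_address is hypothetical and this is
not the code the patch emits):

```c
#include <stddef.h>

/* Address of element i of an array whose elements are n bytes each,
   computed from a byte pointer to the start of the array. */
char *element_address(char *base, size_t n, size_t i)
{
    return base + i * n;       /* scale the index by the runtime element size */
}
```

For the "(char [n])[3]" example with n = 4, element 1 is found at base + 4
and element 2 at base + 8, matching the gcc memory layout.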

The patch also introduces two generally useful methods requested by Chris:
isInt64 and getInt64.  isInt64 tells you whether a gcc constant fits into 64
bits, and getInt64 gets the constant for you (aborting if it doesn't fit).  They are
analogous to host_integerp and tree_low_cst, only using 64 bit integers rather
than HOST_WIDE_INT.  They are used for example to tell whether the size in
bits of the gcc type is small enough (< 2^64) to correspond to an LLVM type [3].
The main difference from getINTEGER_CSTVal is that getInt64 refuses to return
constants that overflowed or are simply too big for 64 bits.
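
The intended behaviour can be sketched on a toy constant (everything below is
an assumption made for illustration: the real isInt64 and getInt64 work on gcc
trees, their exact signatures are not shown in this message, and only the
unsigned case is modelled):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a gcc integer constant: a double-width value plus
   a flag saying the constant overflowed when it was computed. */
struct toy_const {
    uint64_t low;
    uint64_t high;
    bool overflowed;
};

/* True if the (unsigned) constant can be represented in 64 bits. */
bool toy_is_uint64(const struct toy_const *c)
{
    return !c->overflowed && c->high == 0;
}

/* Return the constant, aborting if it does not fit in 64 bits. */
uint64_t toy_get_uint64(const struct toy_const *c)
{
    if (!toy_is_uint64(c))
        abort();
    return c->low;
}
```

A caller would check toy_is_uint64 first and only then call toy_get_uint64,
which mirrors how a size in bits can be rejected when it is too large.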

I corrected a number of other small problems while I was there:

- I unified the pointer and array cases in TreeToLLVM::EmitLV_ARRAY_REF.  The
pointer code did a better job than the array code and the array code benefits
from this: indexing into a variable length array with a constant size element
type (i.e. when isSequentialCompatible is true) now uses a GetElementPtr rather
than 'orrible pointer arithmetic.  This produces vastly better code when, for
example, accessing elements of an array like "int X[n]".  There's a testcase
for this (2007-03-27-VarLengthArray.c).

- The size passed to an AllocaInst might not be an Int32.  Probably impossible
to hit in practice.

- If the index type in TreeConstantToLLVM::EmitLV_ARRAY_REF was an unsigned
32 bit integer, you could get wrong code on a 64 bit machine.  Wildly unlikely
it could ever be hit.
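
For the first item above, the kind of code that benefits looks like the
following (a sketch written for this note, not the attached
2007-03-27-VarLengthArray.c; the function name sum_first is made up).  The
array length is only known at runtime, but the element type int has constant
size, so element i is always at byte offset i * sizeof(int):

```c
/* Fills a variable length array of int and sums it.  Assumes n > 0. */
int sum_first(int n)
{
    int X[n];                  /* length known only at runtime */
    int total = 0;
    for (int i = 0; i < n; i++)
        X[i] = i;              /* constant-size element: plain indexing */
    for (int i = 0; i < n; i++)
        total += X[i];
    return total;
}
```

With n = 4 the array holds 0, 1, 2, 3 and the function returns 6.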

Bootstraps and causes no testsuite failures (including multisource).  Indeed,
on my system it causes 2003-05-22-VarSizeArray.c to pass rather than crash the
gcc 4.1 compiler (CBE), because the improved array indexing causes the testcase
to be simplified down to one line: ret i32 0.

Enjoy!

Duncan.

[1] To be exact, the invariant is: if the size of the gcc type in bits
is a non-negative constant smaller than 2^64 then the LLVM type has the
same size in bits.

[2] It also returns true if the array has no length and the element type
has constant size, like "int X[];".  In order to preserve the invariant
described in [1], it has to return true if the gcc array type has constant
size, which means that variable length arrays with an element type of zero
size and zero length arrays with a variable size element type are both
accepted; they have size zero and so does the corresponding LLVM array.

[3] It is easy to create huge types, for example an array of 2^64 arrays
of length 2^64.  This will not map to an LLVM type.  It is huge objects
that are hard to create.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: array_of_var.diff
Type: text/x-diff
Size: 19303 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20070328/47e15ee2/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2007-03-27-ArrayCompatible.c
Type: text/x-csrc
Size: 162 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20070328/47e15ee2/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2007-03-27-VarLengthArray.c
Type: text/x-csrc
Size: 136 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20070328/47e15ee2/attachment-0001.c>

