[LLVMdev] How to handle size_t in front ends?
Gordon Henriksen
gordonhenriksen at mac.com
Thu May 8 10:17:15 PDT 2008
On 2008-05-07, at 16:24, Chris Lattner wrote:
> On Wed, 7 May 2008, Gordon Henriksen wrote:
>
>>> What would this be used for? How is it defined? How does
>>> arithmetic work on it?
>>
>> Looking up the intptr type via TargetData is not a significant
>> issue for me, but I can see the appeal, and how its absence could
>> constitute a significant barrier to generating portable IR
>> (provided, of course, a portable language). Regardless, it would
>> allow me to hardcode a good deal more codegen if the LLVM IR had an
>> intptr type. The semantics I would imagine for an intptr type are:
>
> Querying TargetData only works if you know the size of the pointer. :)
Exactly. :) I'm going to play devil's advocate here for a moment.
intptr would tidy up my own output a smidgen, but I do have other
target dependencies, so it's of no great concern to me.
But I could see how someone wanting LLVM bitcode to play the role of
Java bytecode or MSIL might find it important or even essential. And
the question has come up many times.
I can also see how this is entirely useless in C and thus less than
interesting. :)
>> • Treated as an ordinary integer for all operations except casts.
>
> Ok. What does this mean for add?
Sure. %x = add intptr %a, %b is semantically identical to:
%tmp1 = bitcast intptr %a to i8*
%tmp2 = getelementptr i8* %tmp1, intptr %b
%x = bitcast i8* %tmp2 to intptr
Or, put another way, it's an i32 add on a 32-bit host and an i64 add
on a 64-bit host.
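As a rough C analogue of that equivalence (a sketch, not LLVM IR): uintptr_t stands in for the proposed intptr type, and the function names are made up for illustration. It assumes a flat address space where converting between pointers and uintptr_t preserves byte offsets, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Plain integer add on the pointer-sized integer type. */
uintptr_t add_as_integer(uintptr_t a, uintptr_t b) {
    return a + b;
}

/* The same add spelled as cast, byte offset, cast back, mirroring
   the bitcast / getelementptr / bitcast sequence (hypothetical name). */
uintptr_t add_as_byte_offset(uintptr_t a, uintptr_t b) {
    unsigned char *base = (unsigned char *)a;
    unsigned char *moved = base + b;
    return (uintptr_t)moved;
}

/* Compare both spellings on an address that is valid to offset. */
void check_adds_agree(void) {
    static unsigned char buffer[16];
    uintptr_t base = (uintptr_t)buffer;
    assert(add_as_integer(base, 5) == add_as_byte_offset(base, 5));
}
```

Whether the add is 32 or 64 bits wide is decided by the width of uintptr_t on the host, which is the target dependence under discussion.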
> This basically means that an intptr add cannot have usefully defined
> semantics.
How do you figure? I consider getelementptr to have usefully defined
semantics, even though they are target-dependent. :)
> Can you give an example of when it is useful?
Sure, grep for getIntPtrType.
But seriously, any situation where a front-end language would use
size_t, ptrdiff_t, System.IntPtr, a value in a tagged object model,
etc… it could use this type instead of conditionally selecting i32 or
i64. This is not applicable to Java or C, which either have no such
pointer-sized integer type, or have no portable representation. But it
would be applicable to many other languages that do.
The advantage provided is improved portability of bitcode and (very
slightly) reduced complexity in front-end compilers. I don't consider
these overwhelming advantages, given that bitcode is pretty
non-portable as-is.
>> • Can be the operand to ptrtoint, but not the result.
>> • Can be the result of inttoptr, but not the operand.
>
> I assume these are backwards. intptr_t is an integer, not a pointer.
They are not.
>> • Can be bitcast to an actual pointer type.
>
> No. int <-> ptr is done with inttoptr and ptrtoint.
No. These cast behaviors are unique semantics.
Let me be more explicit. To be useful, an intptr type would need
conversions to and from both fixed-width integer types and pointers.
It's not necessary to overload existing casts. If we chose to, the
casts applicable to pointers are closer matches than the casts
applicable to integers, semantically. This is because they correctly
reflect the potential data loss between the fixed-width integer type
and the target-dependent type.
== Pointer conversions ==
For pointer conversions, bitcast has the correct semantics.
void *p;
(void *) (ptrdiff_t) p; // This is a no-op on every platform.
(void *) (int32_t) p; // This is target-dependent and could truncate.
Pointer-to-intptr-to-pointer conversions can be condensed or
eliminated in the same way that bitcasts between pointer types can. By
contrast, inttoptr(ptrtoint) cannot be converted to a bitcast or noop
because if the integer type is smaller than the pointer type, the
conversion is lossy.
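A small C sketch of the two round trips, for illustration only. uintptr_t plays the role of the pointer-sized integer, the helper names are hypothetical, and the 32-bit case assumes the usual implementations where the integer value of a pointer is its address.

```c
#include <assert.h>
#include <stdint.h>

/* Pointer -> pointer-sized integer -> pointer: no bits can be lost. */
void *roundtrip_intptr(void *p) {
    return (void *)(uintptr_t)p;
}

/* Pointer -> 32-bit integer -> pointer: drops the high bits when
   pointers are wider than 32 bits. */
void *roundtrip_int32(void *p) {
    uint32_t narrow = (uint32_t)(uintptr_t)p;
    return (void *)(uintptr_t)narrow;
}

/* The narrow round trip is lossless exactly when the address fits
   in 32 bits; the pointer-sized one always is. */
void check_roundtrips(void *p) {
    int fits = (uintptr_t)p <= UINT32_MAX;
    assert(roundtrip_intptr(p) == p);
    assert((roundtrip_int32(p) == p) == fits);
}
```

Only the first round trip can be folded away without knowing the target, which is why it behaves like a bitcast rather than like inttoptr(ptrtoint).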
== Conversions to fixed-width integer types ==
size_t ip;
(uint16_t) ip;
(uint32_t) ip;
(uint64_t) ip;
This has the same semantics as ptrtoint: Depending on the target, it
could be an extend or a truncate or a noop.
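To make the three outcomes concrete, here is a sketch of the decision a front end or lowering pass would make once the pointer width is known. The enum and function are hypothetical and are not part of any LLVM API.

```c
#include <assert.h>

enum cast_kind { CAST_TRUNCATE, CAST_NOOP, CAST_EXTEND };

/* Which operation an intptr -> iN conversion turns into once the
   pointer width of the target has been fixed. */
enum cast_kind classify_intptr_cast(unsigned pointer_bits,
                                    unsigned fixed_bits) {
    assert(pointer_bits > 0 && fixed_bits > 0);
    if (fixed_bits < pointer_bits) return CAST_TRUNCATE;
    if (fixed_bits > pointer_bits) return CAST_EXTEND;
    return CAST_NOOP;
}
```

For example, a conversion to a 32-bit integer classifies as a no-op with 32-bit pointers and as a truncate with 64-bit pointers.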
size_t ip;
ssize_t sip;
(uint64_t) ip;
(int64_t) sip;
However, signed intptr types do exist, so it's quite arguable that
the extension behavior should not be hard-wired the way it is for
ptrtoint and for gep's sign extension of indices.
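The difference is easy to see in C when a 32-bit target is modeled explicitly. In this sketch uint32_t and int32_t stand in for size_t and ssize_t on such a target; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned pointer-sized integer: widening fills with zeros. */
uint64_t widen_unsigned(uint32_t ip) {
    return (uint64_t)ip;
}

/* Signed pointer-sized integer: widening copies the sign bit. */
int64_t widen_signed(int32_t sip) {
    return (int64_t)sip;
}

/* The same 32 bits (all ones) give two different 64-bit results. */
void check_widening(void) {
    assert(widen_unsigned(UINT32_MAX) == UINT64_C(0x00000000FFFFFFFF));
    assert(widen_signed(-1) == -1);
    assert((uint64_t)widen_signed(-1) == UINT64_MAX);
}
```

A single cast with fixed extension behavior can express only one of these, so the front end would need a way to pick.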
For OCaml, this might be beneficial, actually; a great many ptrtoint
and inttoptr operations occur due to the tagged object model. Since
these are lossy casts, it might be beneficial if they could be
recognized target-independently as no-ops.
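For readers unfamiliar with the tagged model: OCaml stores an integer n in a machine word as 2n+1 and keeps heap pointers word-aligned, so the low bit tells the two apart. A minimal C sketch of that encoding follows. The helper names are hypothetical, and it assumes two's complement with an arithmetic right shift on signed values, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t value;  /* one machine word: tagged int or pointer */

/* Low bit set means "immediate integer". */
int is_int(value v) {
    return (int)(v & 1);
}

/* Encode n as 2n+1. */
value tag_int(intptr_t n) {
    return (value)(((uintptr_t)n << 1) | 1u);
}

/* Decode by shifting the tag bit out (arithmetic shift assumed). */
intptr_t untag_int(value v) {
    return v >> 1;
}

/* Aligned pointers already have a clear low bit. */
value of_pointer(void *p) {
    assert(((uintptr_t)p & 1u) == 0);
    return (value)(uintptr_t)p;
}

void *to_pointer(value v) {
    assert(!is_int(v));
    return (void *)(uintptr_t)v;
}
```

Every of_pointer and to_pointer here corresponds to a ptrtoint or inttoptr in the generated IR, even though neither changes any bits when the integer is pointer-sized.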
== Conversions from fixed-width integer types ==
int16_t s; int32_t i; int64_t l;
(size_t) s;
(size_t) i;
(size_t) l;
Same issues as with conversions to fixed-width integer types:
• inttoptr is a better match for the semantics.
• But sign extension behavior should be controllable.
On 2008-05-07, at 19:09, Chris Lattner wrote:
> On Wed, 7 May 2008, Jonathan S. Shapiro wrote:
>>
>
>> On a 32-bit platform, doesn't one want to use i32?
>
> Why? What is wrong with i64?
Lots of things, actually.
It doesn't have the proper semantics for arithmetic. As a concrete
example, System.IntPtr.operator/ in .NET is quite distinct from either
System.Int32.operator/ or System.Int64.operator/.
Nor does it have the correct size in memory or as an argument,
although converting to a pointer is a usable workaround in both cases.
Likewise, alignment.
Finally, computing 64-bit intermediate results on 32-bit platforms in
order to preserve unwanted i64 semantics is quite undesirable.
Consider this, a reasonable sort of thing to compute with an intptr:
int f(void *p, void *q, int i, int j) {
    size_t ip = (size_t) p, iq = (size_t) q;
    return (iq - (ip + i)) / j;
}
If size_t is defined as int64_t and sizeof(void*) = 4, the divide must
be computed in 64 bits (even though the high portion will be
discarded) in order to preserve semantics in the uninteresting case
that iq - (ip + i) > 0xFFFFFFFFU. Now imagine a target without a
64-bit divider. :) I guess each intermediate result could be cast back to
a pointer and then back to an integer, but that seems unlovely.
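A sketch of why the narrowing is not free: the same expression evaluated at 32 and at 64 bits can disagree in its low 32 bits once the subtraction wraps. For simplicity this version takes all operands as unsigned 32-bit values (the function above uses int for i and j), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The computation done at pointer width on a 32-bit target. */
uint32_t quotient_in_32(uint32_t ip, uint32_t iq, uint32_t i, uint32_t j) {
    assert(j != 0);
    return (iq - (ip + i)) / j;
}

/* The same computation with the integers widened to 64 bits first
   and the quotient truncated afterwards. */
uint32_t quotient_in_64(uint32_t ip, uint32_t iq, uint32_t i, uint32_t j) {
    assert(j != 0);
    uint64_t wide_ip = ip, wide_iq = iq;
    return (uint32_t)((wide_iq - (wide_ip + i)) / j);
}
```

With ip = 4, iq = 20, i = 2, j = 7 both return 2. With ip = 8, iq = 4, i = 0, j = 7 the subtraction wraps and the two results differ, so a compiler given i64 semantics cannot substitute the cheaper 32-bit divide.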
On 2008-05-07, at 08:25, Jonathan S. Shapiro wrote:
> Meaning: on a machine having 32-bit registers, iWord is a type
> treated by the IR as indistinguishable from i32. On a machine having
> 64-bit registers, iWord is a type treated by the IR as
> indistinguishable from i64. Arithmetic works in the usual way. If
> "iWord" is "i32" on your target, then it is acceptable in any
> position and condition where "i32" would be acceptable in the IR
> specification. In short, iWord can be substituted for the
> appropriate integral type the instant you commit to a particular
> target.
This doesn't work, because for instance trunc i32 to i32 is an illegal
instruction. I say this under the assumption that entirely preventing
interoperation with other forms of integers at the IR level is too
strict.
— Gordon