[LLVMdev] How to handle size_t in front ends?

Thu May 8 10:17:15 PDT 2008

On 2008-05-07, at 16:24, Chris Lattner wrote:

> On Wed, 7 May 2008, Gordon Henriksen wrote:
>
>>> What would this be used for?  How is it defined?  How does  
>>> arithmetic work on it?
>>
>> Looking up the intptr type via TargetData is not a significant  
>> issue for me, but I can see the appeal, and how its absence could  
>> constitute a significant barrier to generating portable IR  
>> (provided, of course, a portable language). Regardless, it would  
>> allow me to hardcode a good deal more codegen if the LLVM IR had an  
>> intptr type. The semantics I would imagine for an intptr type are:
>
> Querying TargetData only works if you know the size of the pointer. :)

Exactly. :) I'm going to play devil's advocate here for a moment.  
intptr would tidy up my own output a smidgen, but I do have other  
target dependencies, so it's of no great concern to me.

But I could see how someone wanting LLVM bitcode to play the role of  
Java bytecode or MSIL might find it important or even essential. And  
the question has come up many times.

I can also see how this is entirely useless in C and thus less than  
interesting. :)

>> • Treated as an ordinary integer for all operations except casts.
>
> Ok.  What does this mean for add?

Sure. %x = add intptr %a, %b is semantically identical to:

%tmp1 = bitcast intptr %a to i8*
%tmp2 = getelementptr i8* %tmp1, intptr %b
%x = bitcast i8* %tmp2 to intptr

Or, put another way, it's an i32 add on a 32-bit host and an i64 add  
on a 64-bit host.
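
A minimal C sketch of that equivalence, with uintptr_t standing in for the proposed intptr and char * arithmetic standing in for getelementptr on i8*. The function names are hypothetical, and the sketch assumes a flat address space in which pointer-to-integer conversion preserves byte offsets:

```c
#include <stdint.h>

/* Direct add at pointer width: wraps at 32 or 64 bits depending on the
   target. */
uintptr_t intptr_add(uintptr_t a, uintptr_t b) {
    return a + b;
}

/* The same add spelled as byte-pointer arithmetic, mirroring the
   bitcast / getelementptr / bitcast sequence. In C this is only
   meaningful when a is the address of an object and a + b stays inside
   (or one past the end of) that object. */
uintptr_t intptr_add_via_gep(uintptr_t a, uintptr_t b) {
    char *base = (char *)a;
    return (uintptr_t)(base + b);
}
```

For an address taken from a real buffer and an in-bounds offset, both functions should return the same value on such a target.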

> This basically means that an intptr add cannot have usefully defined  
> semantics.

How do you figure? I consider getelementptr to have usefully defined  
semantics, even though they are target-dependent. :)

> Can you give an example of when it is useful?

Sure, grep for getIntPtrType.

But seriously, any situation where a front-end language would use  
size_t, ptrdiff_t, System.IntPtr, a value in a tagged object model,  
etc… it could use this type instead of conditionally selecting i32 or  
i64. This is not applicable to Java or C, which either have no such  
pointer-sized integer type, or have no portable representation. But it  
would be applicable to many other languages that do.

The advantage provided is improved portability of bitcode and (very  
slightly) reduced complexity in front-end compilers. I don't consider  
these overwhelming advantages, given that bitcode is pretty
non-portable as-is.

>> • Can be the operand to ptrtoint, but not the result.
>> • Can be the result of inttoptr, but not the operand.
>
> I assume these are backwards.  intptr_t is an integer, not a pointer.

They are not.

>> • Can be bitcast to an actual pointer type.
>
> No. int <-> ptr is done with inttoptr and ptrtoint.

No. These cast behaviors are unique semantics.

Let me be more explicit. To be useful, an intptr type would need  
conversions to and from both fixed-width integer types and pointers.  
It's not necessary to overload existing casts. If we chose to, the  
casts applicable to pointers are closer matches than the casts  
applicable to integers, semantically. This is because they correctly  
reflect the potential data loss between the fixed-width integer type  
and the target-dependent type.

== Pointer conversions ==
For pointer conversions, bitcast has the correct semantics.

     void *p;
     (void *) (ptrdiff_t) p; // This is a no-op on every platform.
     (void *) (int32_t) p; // This is target-dependent and could truncate.

Pointer-to-intptr-to-pointer conversions can be condensed or  
eliminated in the same way that bitcasts between pointer types can. By  
contrast, inttoptr(ptrtoint) cannot be converted to a bitcast or noop  
because if the integer type is smaller than the pointer type, the  
conversion is lossy.
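
The difference can be sketched in C, assuming uintptr_t as a stand-in for intptr and uint16_t as an integer type narrower than a pointer. The function names are hypothetical; the narrow round trip is done on the address value rather than on a pointer so that no invalid pointer is ever formed:

```c
#include <stdint.h>

/* Pointer -> pointer-sized integer -> pointer: the width never changes,
   so the round trip gives back the original pointer. */
void *roundtrip_intptr(void *p) {
    return (void *)(uintptr_t)p;
}

/* Address value -> narrower fixed-width integer -> address value: the
   high bits are dropped whenever the address does not fit in 16 bits. */
uintptr_t roundtrip_u16(uintptr_t addr) {
    return (uintptr_t)(uint16_t)addr;
}
```

The first function can always be folded away; the second cannot, since an address such as 0x12345 comes back as 0x2345.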

== Conversions to fixed-width integer types ==

     size_t ip;
     (uint16_t) ip;
     (uint32_t) ip;
     (uint32_t) ip;

This has the same semantics as ptrtoint: Depending on the target, it  
could be an extend or a truncate or a noop.
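
As a rough illustration of the choice a front end makes today, here is a small C sketch that picks the operation from the two bit widths. The enum and function names are hypothetical and are not part of any LLVM API:

```c
/* The three possible lowerings of an integer-to-integer width change. */
typedef enum { CAST_NOOP, CAST_TRUNC, CAST_EXTEND } cast_kind;

/* Decide how to convert an integer of src_bits to one of dst_bits.
   For (uint32_t) ip, src_bits is the pointer width of the target. */
cast_kind pick_cast(unsigned src_bits, unsigned dst_bits) {
    if (dst_bits < src_bits)
        return CAST_TRUNC;   /* e.g. 64-bit intptr to uint32_t */
    if (dst_bits > src_bits)
        return CAST_EXTEND;  /* e.g. 32-bit intptr to uint64_t */
    return CAST_NOOP;        /* widths already match */
}
```

With a single intptr-aware cast, this decision would move out of the front end and into the point where the target is fixed.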

     size_t ip;
     ssize_t sip;
     (uint64_t) ip;
     (int64_t) sip;

However, signed intptr types do exist, so it's quite arguable that
the extension behavior should be selectable rather than fixed, as it
is for ptrtoint and for gep indices.
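
A short C sketch of why the choice matters, assuming a 32-bit target so that uint32_t and int32_t model size_t and ssize_t. The function names are hypothetical:

```c
#include <stdint.h>

/* Unsigned widening, as a size_t-like value wants: the new high bits
   are zero. */
uint64_t widen_unsigned(uint32_t ip) {
    return (uint64_t)ip;
}

/* Signed widening, as an ssize_t-like value wants: the new high bits
   copy the sign bit. */
int64_t widen_signed(int32_t sip) {
    return (int64_t)sip;
}
```

The same 32-bit pattern 0xFFFFFFFF widens to 0x00000000FFFFFFFF in the first case and to -1 (all ones) in the second, so a single fixed rule cannot serve both.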

For OCaml, this might be beneficial, actually; a great many ptrtoint
and inttoptr operations occur due to the tagged object model. Since
these are potentially lossy casts, they cannot simply be folded away;
it would help if they could be recognized target-independently as
no-ops.

== Conversions from fixed-width integer types ==

     int16_t s; int32_t i; int64_t l;
     (size_t) s;
     (size_t) i;
     (size_t) l;

Same issues as with conversions to fixed-width integer types:
• inttoptr is a better match for the semantics.
• But sign extension behavior should be controllable.

On 2008-05-07, at 19:09, Chris Lattner wrote:

> On Wed, 7 May 2008, Jonathan S. Shapiro wrote:
>>
>
>> On a 32-bit platform, doesn't one want to use i32?
>
> Why?  What is wrong with i64?

Lots of things, actually.

It doesn't have the proper semantics for arithmetic. As a concrete  
example, System.IntPtr.operator/ in .NET is quite distinct from either  
System.Int32.operator/ or System.Int64.operator/.

Nor does it have the correct size in memory or as an argument,  
although converting to a pointer is a usable workaround in both cases.  
Likewise, alignment.

Finally, computing 64-bit intermediate results on 32-bit platforms in  
order to preserve unwanted i64 semantics is quite undesirable.  
Consider this, a reasonable sort of thing to compute with an intptr:

     int f(void *p, void *q, int i, int j) {
       size_t ip = (size_t) p, iq = (size_t) q;
       return (iq - (ip + i)) / j;
     }

If size_t is defined as int64_t and sizeof(void*) = 4, the divide must
be computed in 64 bits (even though the high portion will be
discarded) in order to preserve semantics in the uninteresting case
that iq - (ip + i) > 0xFFFFFFFFU. Now imagine a target without a
64-bit divider. :) I guess each intermediate result could be cast back
to a pointer and then back to an integer, but that seems unlovely.
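
To make the problem concrete, here is a simplified C sketch (dropping the i term and using hypothetical function names) that performs the subtract-and-divide once at 32 bits and once with 64-bit intermediates. It assumes 32-bit addresses modeled as uint32_t:

```c
#include <stdint.h>

/* Subtract and divide at pointer width (32 bits in this sketch). */
uint32_t diff_div_32(uint32_t ip, uint32_t iq, uint32_t j) {
    uint32_t diff = (uint32_t)(iq - ip);   /* wraps modulo 2^32 */
    return (uint32_t)(diff / j);
}

/* The same computation with 64-bit intermediates, truncated to 32 bits
   only at the very end. */
uint32_t diff_div_64(uint32_t ip, uint32_t iq, uint32_t j) {
    uint64_t wide = ((uint64_t)iq - (uint64_t)ip) / (uint64_t)j;
    return (uint32_t)wide;
}
```

With ip = 10, iq = 4, j = 2 the 32-bit version yields 0x7FFFFFFD while the 64-bit version yields 0xFFFFFFFD: the high bits of the dividend leak into the low bits of the quotient, so the 64-bit divide cannot be narrowed even though its upper half is thrown away.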

On 2008-05-07, at 08:25, Jonathan S. Shapiro wrote:

> Meaning: on a machine having 32-bit registers, iWord is a type  
> treated by the IR as indistinguishable from i32. On a machine having  
> 64 bit registers, iWord is a type treated by the IR as  
> indistinguishable from i64. Arithmetic works in the usual way. If  
> "iWord" is "i32" on your target, then it is acceptable in any  
> position and condition where "i32" would be acceptable in the IR  
> specification. In short, iWord can be substituted for the  
> appropriate integral type the instant you commit to a particular  
> target.

This doesn't work, because for instance trunc i32 to i32 is an illegal  
instruction. I say this under the assumption that entirely preventing  
interoperation with other forms of integers at the IR level is too  
strict.

— Gordon