[LLVMdev] Suggestion: Support union types in IR
Jon Harrop
jon at ffconsultancy.com
Thu Jan 1 06:25:43 PST 2009
On Thursday 01 January 2009 00:57:50 Talin wrote:
> Jon Harrop wrote:
> > I think very few people are still using C-style unions so there is
> > probably little desire to put them into LLVM itself. Class hierarchies
> > and algebraic datatypes are much more common and use a different
> > representation (that LLVM can already handle with bitcasts).
> > Specifically, as a pointer to one of several struct types that can vary
> > in size.
> >
> > I am considering a design along the following lines. All algebraic
> > datatypes are stored as a pointer to an int tag:
> >
> > i32 *
>
> Exactly. I'm not especially interested in C-style unions, I'm interested
> in discriminated unions. But the actual discriminator field is easily
> represented in LLVM IR already, so there's no need to extend the IR to
> support them. That's why I am only asking for C-style union support -
> it's a lower-level primitive that you can use to build discriminated
> unions on top of.
I don't think you would want to build discriminated unions on top of C-style
unions though.
> The problem with varying-size structs is that it would be nice to be
> able to pass them as function parameters and function return values
> using LLVM's support of structs as first-class data types.
As I understand it, LLVM does not provide first-class structs and attempting
to return arbitrary structs from functions will randomly break. I believe you
need to implement that yourself in whatever way your target language
requires.
> That requires that the code that handles argument marshalling know the size
> of the largest possible type that can be passed.
Pass a pointer and that size will always be 1 word. You can still stack
allocate the space if you must and LLVM's optimization passes will do their
best to completely remove that overhead when possible.
> > The tag conveys the true type of the value and, indeed, it may even be a
> > pointer to comprehensive run-time type information. So you load the tag
> > and examine it in order to determine what type to cast to. Then you
> > bitcast to another type if you need to load any arguments (that appear
> > after the tag). For example, an int option type might cast as follows:
> >
> > None (tag=0) -> {i32} *
> > Some n (tag=1) -> {i32, i32} *
>
> Since the language I am working on is statically typed, I was going to
> select a type discrimination strategy based on the actual types
> involved. For example, in the case of a union of reference types,
> there's no need for a discriminator field,
Only if they are all different reference types. Otherwise you might have the
situation:
type t = Negate of t | Factorial of t
Where both constructors carry an argument of the same type (t) and you do
still need a tag to distinguish between the cases (Negate and Factorial) in
the union.
> since reference types already
> carry around their own type information. On the other hand, if you have
> a union of two enum types of limited range, the high bits of the enum
> would be available. In cases such as int or float where all the bits
> were used, then a separate discriminator field would be needed.
There is certainly some scope here and I have been thinking about similar
things. I would like both run-time type information and efficient pattern
matching over algebraic datatypes.
The run-time type information can be a field from every reference type that
points to its run-time type. When pattern matching, you usually want to
bisect the number of cases covered in the union to perform a balanced search
and achieve the best performance. That usually assumes the presence of
consecutive integers as tags so the use of effectively-random pointers would
throw that off a bit, especially if the run-time type information is
subjected to a moving garbage collector. Alternatively, I could put a
redundant tag in each value on the heap...
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e
More information about the llvm-dev
mailing list