[LLVMdev] dynamic typing system

Mon Aug 16 13:53:02 PDT 2010

Alec Benzer wrote:
> This isn't a strictly llvm-related problem, but I thought I'd ask 
> anyway to see if anyone can help.
>
> I'm trying to write a dynamically typed language on top of llvm. My 
> initial idea was to have a general object type for all objects in my 
> language. I came up with:
>
> { i8, i8* }
>
> the first element of the structure would hold the type of the object, 
> and the second is a pointer to the actual data.
>
> Now, I'm not exactly sure how to get my data allocated somewhere in 
> order to be able to get a pointer to it. My initial thought was heap 
> allocations, but there don't seem to be any llvm instructions that 
> perform heap allocations, so I imagine you just use C's malloc and 
> free? If you do use those, however, is there a way of getting the 
> byte-size of a type, to know what to pass to malloc? There's also the 
> issue of knowing when it is safe to free() the pointers.

For heap allocations, you need to insert a call to a function that 
performs them; LLVM itself has no heap-allocation instruction.  Creating 
a call to malloc is one possibility.  However, depending on your 
front-end language, you may wish to use a garbage-collecting allocator 
(e.g., the Boehm conservative collector) or a region-based allocator 
(e.g., the allocators used in the Automatic Pool Allocation and SAFECode 
projects, or the kmem_cache_alloc() allocator used in Linux).  Which 
heap allocator to use is a design choice for your language.

To get the allocation size, use the TargetData analysis pass.  This pass 
has methods (for example, getTypeAllocSize()) that report the allocation 
size in bytes of a given LLVM type on the target.
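
As a rough illustration of the two points above, here is a minimal C 
sketch of the { i8, i8* } object from the question, with the payload 
allocated by malloc.  The names Object, box_int, box_double, unbox_int, 
unbox_double, and free_object, as well as the tag values, are 
hypothetical.  sizeof stands in here for the size query that a front end 
would make through TargetData when emitting the call to malloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type tags; each fits in the i8 field. */
enum { TAG_INT = 0, TAG_DOUBLE = 1 };

/* C analogue of the LLVM struct type { i8, i8* }. */
typedef struct {
    uint8_t tag;   /* which type the payload has */
    void   *data;  /* pointer to the heap-allocated payload */
} Object;

/* Box an integer.  The payload lives on the heap, so it remains valid
 * after this function returns.  data is NULL if malloc fails. */
Object box_int(int64_t value) {
    Object obj = { TAG_INT, NULL };
    int64_t *payload = malloc(sizeof *payload);
    if (payload != NULL) {
        *payload = value;
    }
    obj.data = payload;
    return obj;
}

/* Box a double in the same way. */
Object box_double(double value) {
    Object obj = { TAG_DOUBLE, NULL };
    double *payload = malloc(sizeof *payload);
    if (payload != NULL) {
        *payload = value;
    }
    obj.data = payload;
    return obj;
}

/* Read the payload back; the tag is checked before the cast. */
int64_t unbox_int(Object obj) {
    assert(obj.tag == TAG_INT && obj.data != NULL);
    return *(int64_t *)obj.data;
}

double unbox_double(Object obj) {
    assert(obj.tag == TAG_DOUBLE && obj.data != NULL);
    return *(double *)obj.data;
}

/* Release the payload.  With plain malloc, deciding when to call this
 * is the language runtime's job. */
void free_object(Object *obj) {
    free(obj->data);
    obj->data = NULL;
}
```

Because the payload is on the heap, an Object returned by box_int can be 
stored in a global and read after the function that created it has 
returned, which is the case that alloca cannot handle.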

Note that if you go with a garbage-collected heap allocator, you will 
probably need to do some additional work so that the GC knows where to 
find the roots from which to start scanning the heap.  I believe LLVM 
provides intrinsics (e.g., llvm.gcroot) to aid with this.
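
One simple way to make roots discoverable is a shadow stack: each 
function registers the addresses of its local object pointers on entry 
and unregisters them on exit, and the collector walks that list.  The C 
sketch below illustrates the idea only.  push_root, pop_roots, and 
visit_roots are invented names, not LLVM APIs, and the fixed MAX_ROOTS 
limit is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROOTS 256

/* Addresses of the stack slots that currently hold object pointers. */
static void **root_stack[MAX_ROOTS];
static size_t root_count = 0;

/* Register a stack slot as a root.  Returns 0 on success, -1 if full. */
int push_root(void **slot) {
    if (root_count == MAX_ROOTS) {
        return -1;
    }
    root_stack[root_count++] = slot;
    return 0;
}

/* Unregister the n most recently pushed roots (called on function exit). */
void pop_roots(size_t n) {
    assert(n <= root_count);
    root_count -= n;
}

/* Call visit on every root that currently points at an object and
 * return how many were visited.  A collector would start marking from
 * each visited object.  visit may be NULL to just count live roots. */
size_t visit_roots(void (*visit)(void *object)) {
    size_t visited = 0;
    for (size_t i = 0; i < root_count; ++i) {
        void *object = *root_stack[i];
        if (object != NULL) {
            if (visit != NULL) {
                visit(object);
            }
            ++visited;
        }
    }
    return visited;
}
```

The stack stores the address of each slot rather than its current 
value, so the collector sees whatever object the variable points to at 
collection time.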

-- John T.

>
> The other option, I guess, would be stack allocations with alloca 
> instructions? I don't need to worry about the sizes of types or about 
> calling free, but now my objects can't live on past the scope of a 
> function, which may complicate things. For instance, if at my JITing 
> repl (set up like the Kaleidoscope tutorial, where top-level 
> expressions are wrapped in lambdas and then executed), I type in "5", 
> the repl should spit 5 back to me. If I use allocas here there isn't a 
> problem. But if I define a global variable and assign 5 to it, the 
> data I alloca'd is going to be gone after the anonymous function 
> returns. This makes it seem like heap allocations would be a better 
> choice.
>
> So basically, I'm sort of stuck not knowing the best way to implement 
> this (or which way will even be possible). I'd appreciate any 
> input/guidance on how to proceed.