[LLVMdev] getting started with IR needing GC

Mon Apr 21 16:12:15 PDT 2008

On Apr 20, 2008, at 6:52 PM, Gordon Henriksen wrote:

> On 2008-04-20, at 21:05, Terence Parr wrote:
>
>> On Apr 20, 2008, at 5:36 PM, Gordon Henriksen wrote:
>>
>>> Since the semispace heap doesn't actually work (it's an example,  
>>> at best), I suggest you simply copy the stack visitor into your  
>>> project; it's only a dozen lines of code or so.
>>
>>
>> Ok, copying; can't find ShadowStackEntry though. Even make in that  
>> dir doesn't work:
>
> Please use the version from subversion; this is broken in 2.2  
> release, unfortunately.

ah! ok, looks better now. :)

>> How does the gc "shadow-stack" gcroot intrinsic work exactly?  I
>> couldn't read the assembly very well.  It seems my example above
>> wouldn't work unless I create/fill in a shadow stack record, would it?
>>
>
> 'gc "shadow-stack"' in the LLVM IR instructs the code generator to  
> automatically maintain the linked list of stack frames. You don't  
> have to do anything to maintain these shadow stack frames except to  
> keep your variables in the llvm.gcroot'd allocas. Essentially, it  
> does this:
>
>     struct ShadowStackEntry {
>         ShadowStackLink *next;
>         const ShadowStackMetadata *metadata;
>         void *roots[0];
>     };

Ok, bear with me here...

What's the difference between ShadowStackLink and ShadowStackEntry?

>     template <size_t count>
>     struct Roots {
>         ShadowStackLink *next;
>         const ShadowStackMetadata *metadata;
>         void *roots[count];
>     };
>
>     ShadowStackEntry *shadowStackHead;
>
>     // Defined by the code generator.
>     const ShadowStackMetadata f_metadata = ...;

Do you mean generated by my front end that emits IR or do you mean the  
backend? It seems that, since I read the source code and build the  
symbol table, I would need to build this stack frame type information  
for LLVM.

>     void f() {
>         Roots<3> roots;
>         roots.next = shadowStackHead;
>         roots.metadata = &f_metadata;
>         roots.roots[0] = NULL;
>         roots.roots[1] = NULL;
>         roots.roots[2] = NULL;

What are the three roots here? I'm not sure where anything but next
and metadata comes from.  So the gc "shadow-stack" generates that
preamble code? That would make sense.

>         shadowStackHead = (ShadowStackEntry *) &roots;
>
>         ... user code ...

Here is where my gcroots go then, I guess.

>         shadowStackHead = roots.next; // before any exit
>         return;
>     }
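
For anyone following along, here is a minimal compilable sketch of the prologue/epilogue pattern quoted above. It is only my reading of the pseudocode, not what the code generator actually emits. The layout of ShadowStackMetadata is an assumption (the post never shows it, so a bare root count stands in), and I have folded next/metadata into a header member so the types line up.

```cpp
#include <cassert>
#include <cstddef>

// ASSUMPTION: the real metadata layout is not shown in the thread;
// a root count is the least it would need to carry.
struct ShadowStackMetadata {
    std::size_t numRoots;
};

// Header shared by every frame on the shadow stack.
struct ShadowStackEntry {
    ShadowStackEntry *next;
    const ShadowStackMetadata *metadata;
};

// A concrete frame: the header followed by `count` root slots.
template <std::size_t count>
struct Roots {
    ShadowStackEntry header;
    void *roots[count];
};

// Head of the linked list of frames (newest frame first).
ShadowStackEntry *shadowStackHead = nullptr;

// One metadata record per function, shared by all its activations.
const ShadowStackMetadata f_metadata = {3};

// Pushes a frame, runs "user code", pops the frame.
// Returns the shadow stack depth observed while the frame was live.
std::size_t f() {
    // Prologue: link a new frame in and null out its roots.
    Roots<3> frame;
    frame.header.next = shadowStackHead;
    frame.header.metadata = &f_metadata;
    for (auto &slot : frame.roots) slot = nullptr;
    shadowStackHead = &frame.header;

    // User code: a root slot is just a pointer the collector can find.
    static int object = 42;          // stand-in for a heap object
    frame.roots[0] = &object;
    std::size_t depth = 0;
    for (ShadowStackEntry *e = shadowStackHead; e != nullptr; e = e->next)
        ++depth;

    // Epilogue: unlink the frame before any exit.
    shadowStackHead = frame.header.next;
    return depth;
}
```

Calling f() from an empty shadow stack should report a depth of 1 and leave shadowStackHead null again afterwards, which is the whole contract: every function pushes exactly one frame and pops it on the way out.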

Can you tell me where to find ShadowStackMetadata?  A search does not
reveal it:

/usr/local/llvm-2.2 $ find . -name 'ShadowStackMetadata*'

>> Taking a giant step back, I can build something similar to  
>> semispace.c myself so I'm in control of my world, right?  i would  
>> set up the shadow stack using IR instructions and could avoid  
>> gcroot by notifying my collector as I see fit...
>
> That's true; the shadow stack design is explicitly for uncooperative  
> environments, after all.

The compiler plug-in for a GC is like a sophisticated macro that knows
how to emit preambles and postambles for each function that says it
uses that particular GC, right?  Does it do more than an include would,
such as figuring out which of my allocas are pointers? If so, then
why do I need to use gcroot instructions to identify roots? It seems
like it would be much easier to understand if my output templates just
emitted the preamble and so on.  Oh, maybe the optimizer removes some
stuff in there, so that what I think is a root is actually not around
anymore.

> When you want to eliminate the shadow stack overhead, you will need  
> to (a.) use a conservative GC or (b.) emit stack frame metadata  
> using the LLVM GC support.

Unfortunately, I'm thoroughly confused about who generates what.  Who
is supposed to generate the metadata types?  If I am, that is fine,
but I really can't find anything in the documentation that is a simple
end-to-end C code -> IR example. Once I get one together, I'll put it
in the book I'm writing. I've spent many hours reading and playing as
much as I can, but it is still not clear; 'course I ain't always that
bright. ;)  Note that the paper by Henderson was extremely clear to
me, so it's not the concepts, it is the details of using LLVM to do GC.

>> Sorry I'm so lost...just trying to figure out what llvm does for me  
>> and what I have to do.
>
> No problem!
>
> Generally speaking, LLVM is going to help you find roots on the  
> stack, which is the part that the compiler backend must help with;  
> the rest is your playground.

Is that because only code generation knows what roots exist after  
processing the IR?

> The infrastructure is more suited toward interfacing with an  
> existing GC rather than necessarily making writing a new runtime  
> trivial. (See exception handling for precedent…)

Well, writing a new garbage collector seems really straightforward
(like mark and sweep).  LLVM will give me the roots and I am free
to walk them. The part that I don't understand is who defines what
metadata types and how exactly I make use of gcroot and LLVM's
support. The concepts are clear, the details seem miles away ;)
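
To make "LLVM gives me the roots and I walk them" concrete, here is a rough sketch of the kind of stack visitor and mark phase I have in mind. Everything in it is hypothetical: the metadata is again just a root count, the frames are built by hand instead of by the code generator, Object is a toy type, and the entry holds a pointer to its root slots (a simplification; the quoted layout puts the slots right after the header).

```cpp
#include <cassert>
#include <cstddef>

// ASSUMPTION: metadata is only a root count; the real type may differ.
struct ShadowStackMetadata {
    std::size_t numRoots;
};

// SIMPLIFICATION: `roots` points at the frame's slots instead of the
// slots trailing the header in memory.
struct ShadowStackEntry {
    ShadowStackEntry *next;
    const ShadowStackMetadata *metadata;
    void **roots;
};

ShadowStackEntry *shadowStackHead = nullptr;

// Calls visitor(slot) for every non-null root slot, newest frame first.
template <typename Visitor>
void visitRoots(Visitor visitor) {
    for (ShadowStackEntry *e = shadowStackHead; e != nullptr; e = e->next)
        for (std::size_t i = 0; i < e->metadata->numRoots; ++i)
            if (e->roots[i] != nullptr)
                visitor(&e->roots[i]);
}

// Toy heap object carrying only a mark bit.
struct Object {
    bool marked = false;
};

// Mark phase of a mark-and-sweep collector, roots only (no tracing of
// object fields, since Object has none). Returns how many objects it marked.
std::size_t markFromRoots() {
    std::size_t marked = 0;
    visitRoots([&](void **slot) {
        Object *obj = static_cast<Object *>(*slot);
        if (!obj->marked) {
            obj->marked = true;
            ++marked;
        }
    });
    return marked;
}

// Builds two frames by hand, marks, pops both frames.
// Returns the number of objects marked, or 0 if the marks are wrong.
std::size_t demo() {
    static const ShadowStackMetadata outerMeta = {2};
    static const ShadowStackMetadata innerMeta = {1};
    Object a, b, unreachable;

    void *outerRoots[2] = {&a, nullptr};   // second slot unused so far
    void *innerRoots[1] = {&b};
    ShadowStackEntry outer = {shadowStackHead, &outerMeta, outerRoots};
    shadowStackHead = &outer;
    ShadowStackEntry inner = {shadowStackHead, &innerMeta, innerRoots};
    shadowStackHead = &inner;

    std::size_t n = markFromRoots();
    bool ok = a.marked && b.marked && !unreachable.marked;

    shadowStackHead = outer.next;          // pop inner and outer
    return ok ? n : 0;
}
```

With this setup demo() should mark a and b and leave unreachable unmarked. The visitor takes the address of each slot rather than its value so that a copying collector could update the root in place.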

Thanks for all the help...

Has anybody else on the list gotten a trivial GC'd language working I
could look at? I'll go back to the Scheme translator again to see what
I can learn.

Thanks,
Ter