[LLVMdev] getting started with IR needing GC
Terence Parr
parrt at cs.usfca.edu
Mon Apr 21 16:12:15 PDT 2008
On Apr 20, 2008, at 6:52 PM, Gordon Henriksen wrote:
> On 2008-04-20, at 21:05, Terence Parr wrote:
>
>> On Apr 20, 2008, at 5:36 PM, Gordon Henriksen wrote:
>>
>>> Since the semispace heap doesn't actually work (it's an example,
>>> at best), I suggest you simply copy the stack visitor into your
>>> project; it's only a dozen lines of code or so.
>>
>>
>> Ok, copying; can't find ShadowStackEntry though. Even make in that
>> dir doesn't work:
>
> Please use the version from subversion; this is broken in 2.2
> release, unfortunately.
ah! ok, looks better now. :)
>> how does the gc "shadow-stack" gcroot intrinsic work exactly? I
>> couldn't read the assembly very well. Seems my example above
>> wouldn't work would it unless i create/fill in a shadow stack record?
>>
>
> 'gc "shadow-stack"' in the LLVM IR instructs the code generator to
> automatically maintain the linked list of stack frames. You don't
> have to do anything to maintain these shadow stack frames except to
> keep your variables in the llvm.gcroot'd allocas. Essentially, it
> does this:
>
> struct ShadowStackEntry {
>   ShadowStackLink *next;
>   const ShadowStackMetadata *metadata;
>   void *roots[0];
> };
Ok, bear with me here...
What's the difference between ShadowStackLink and ShadowStackEntry?
> template <size_t count>
> struct Roots {
>   ShadowStackLink *next;
>   const ShadowStackMetadata *metadata;
>   void *roots[count];
> };
>
> ShadowStackEntry *shadowStackHead;
>
> // Defined by the code generator.
> const ShadowStackMetadata f_metadata = ...;
Do you mean generated by my front end that emits IR or do you mean the
backend? It seems that, since I read the source code and build the
symbol table, I would need to build this stack frame type information
for LLVM.
> void f() {
>   Roots<3> roots;
>   roots.next = shadowStackHead;
>   roots.metadata = &f_metadata;
>   roots.roots[0] = NULL;
>   roots.roots[1] = NULL;
>   roots.roots[2] = NULL;
What are the three roots here? I'm not sure where anything other than
next and metadata comes from. So the gc "shadow-stack" generates that
preamble code? That would make sense.
>   shadowStackHead = (ShadowStackEntry *) &roots;
>
>   ... user code ...
Here is where my gcroots go then, I guess.
>   shadowStackHead = roots.next; // before any exit
>   return;
> }
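The pseudocode above can be turned into a small compilable sketch. This
is a hand-written C++ illustration of what the prologue and epilogue
do, not what the LLVM code generator actually emits. It assumes a
single entry type is used for both the list head and the next link.
The ShadowStackMetadata layout (just a root count), frameDepth() and
f_demo() are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout for illustration; the real metadata record is defined by LLVM.
struct ShadowStackMetadata {
  std::size_t numRoots;
};

// Header shared by every frame on the shadow stack.
struct ShadowStackEntry {
  ShadowStackEntry *next;
  const ShadowStackMetadata *metadata;
};

// A concrete frame: the header followed by room for `count` roots.
template <std::size_t count>
struct Roots {
  ShadowStackEntry header;
  void *roots[count];
};

// Head of the linked list of live frames.
ShadowStackEntry *shadowStackHead = nullptr;

// Hypothetical helper: counts the frames currently on the shadow stack.
std::size_t frameDepth() {
  std::size_t depth = 0;
  for (ShadowStackEntry *e = shadowStackHead; e != nullptr; e = e->next)
    ++depth;
  return depth;
}

// One metadata record per function; here it says "this frame has 3 roots".
const ShadowStackMetadata f_metadata = {3};

// Returns the shadow stack depth observed while the frame is linked in.
std::size_t f_demo() {
  // Prologue: build the frame, null the roots, push it on the list.
  Roots<3> frame;
  frame.header.next = shadowStackHead;
  frame.header.metadata = &f_metadata;
  for (std::size_t i = 0; i < 3; ++i)
    frame.roots[i] = nullptr;
  shadowStackHead = &frame.header;

  // User code: root slots hold the function's GC-visible pointers.
  int local = 42;
  frame.roots[0] = &local;  // stands in for a pointer to a heap object
  std::size_t depth = frameDepth();

  // Epilogue: pop the frame before any exit.
  shadowStackHead = frame.header.next;
  return depth;
}
```

In this reading, the three roots are simply three pointer-sized slots,
one per variable the function wants the collector to see.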
Can you tell me where to find ShadowStackMetadata? A search does not
reveal it:
/usr/local/llvm-2.2 $ find . -name 'ShadowStackMetadata*'
>> Taking a giant step back, I can build something similar to
>> semispace.c myself so I'm in control of my world, right? i would
>> set up the shadow stack using IR instructions and could avoid
>> gcroot by notifying my collector as I see fit...
>
> That's true; the shadow stack design is explicitly for uncooperative
> environments, after all.
The compiler plug-in for a GC is like a sophisticated macro that knows
how to emit preambles and postambles for each function that says it
uses that particular GC, right? Does it do more than an include, such
as figuring out which of my allocas are pointers? If so, then
why do I need to use gcroot instructions to identify roots? It seems
like it would be much easier to understand if my output templates just
emitted the preamble and so on. Oh, maybe the optimizer removes some
stuff in there, so that what I think is a root is actually not around
anymore.
> When you want to eliminate the shadow stack overhead, you will need
> to (a.) use a conservative GC or (b.) emit stack frame metadata
> using the LLVM GC support.
Unfortunately, I'm thoroughly confused about who generates what. Who
is supposed to generate the metadata types? If I am, that is fine,
but I really can't find anything in the documentation that is a simple
end-to-end C code -> IR example. Once I get one together, I'll put it
in the book I'm writing. I've spent many hours reading and playing as
much as I can, but it is still not clear; 'course I ain't always that
bright. ;) Note that the paper by Henderson was extremely clear to
me, so it's not the concepts, it is the details of using LLVM to do GC.
>> Sorry I'm so lost...just trying to figure out what llvm does for me
>> and what I have to do.
>
> No problem!
>
> Generally speaking, LLVM is going to help you find roots on the
> stack, which is the part that the compiler backend must help with;
> the rest is your playground.
Is that because only code generation knows what roots exist after
processing the IR?
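To make "finding roots on the stack" concrete, here is a rough C++
sketch of a root visitor over a shadow-stack style chain. It is an
illustration under assumptions, not LLVM's actual visitor: the Frame
and FrameMetadata layouts, the rootChain global, and the helper
liveRoots() are all hypothetical names, and each frame points at its
root slots instead of embedding them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed per-function metadata: only the number of root slots.
struct FrameMetadata {
  std::size_t numRoots;
};

// Assumed frame record. Simplification: `roots` points at the slots
// rather than being a trailing array inside the frame.
struct Frame {
  Frame *next;
  const FrameMetadata *metadata;
  void **roots;
};

// Head of the chain; the innermost (most recent) frame comes first.
Frame *rootChain = nullptr;

// Calls visit(slot) for every root slot in every frame, innermost first.
template <typename Visitor>
void visitRoots(Visitor visit) {
  for (Frame *f = rootChain; f != nullptr; f = f->next)
    for (std::size_t i = 0; i < f->metadata->numRoots; ++i)
      visit(&f->roots[i]);
}

// Hypothetical helper: gathers the non-null root values into a vector.
std::vector<void *> liveRoots() {
  std::vector<void *> out;
  visitRoots([&out](void **slot) {
    if (*slot != nullptr)
      out.push_back(*slot);
  });
  return out;
}
```

The visitor receives the address of each slot rather than its value, so
a moving collector could update the slot in place.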
> The infrastructure is more suited toward interfacing with an
> existing GC rather than necessarily making writing a new runtime
> trivial. (See exception handling for precedent…)
Well, writing a new garbage collector seems really straightforward
(like mark and sweep). LLVM will give me the roots and I am free
to walk them. The part that I don't understand is who defines what
metadata types and how exactly I make use of gcroot and LLVM's
support. The concepts are clear, the details seem miles away ;)
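For reference, the collector side really can be small. Below is a toy
mark-and-sweep sketch in C++ that takes the roots as a plain list, on
the assumption that they were gathered by walking whatever root
structure the runtime maintains. Object, Heap, allocate() and collect()
are hypothetical names for this example and have nothing to do with
LLVM's own API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap object: a mark bit plus outgoing references.
struct Object {
  bool marked = false;
  std::vector<Object *> children;
};

// Toy heap that tracks every allocation so the sweep can find garbage.
struct Heap {
  std::vector<Object *> objects;

  Object *allocate() {
    Object *o = new Object();
    objects.push_back(o);
    return o;
  }

  // Mark phase: iterative traversal from one root, safe on cycles.
  static void mark(Object *root) {
    std::vector<Object *> work;
    if (root != nullptr)
      work.push_back(root);
    while (!work.empty()) {
      Object *cur = work.back();
      work.pop_back();
      if (cur->marked)
        continue;
      cur->marked = true;
      for (Object *child : cur->children)
        if (child != nullptr)
          work.push_back(child);
    }
  }

  // Marks from the given roots, frees everything unmarked, and
  // returns the number of objects freed.
  std::size_t collect(const std::vector<Object *> &roots) {
    for (Object *r : roots)
      mark(r);
    std::vector<Object *> survivors;
    std::size_t freed = 0;
    for (Object *o : objects) {
      if (o->marked) {
        o->marked = false;  // reset for the next collection
        survivors.push_back(o);
      } else {
        delete o;
        ++freed;
      }
    }
    objects.swap(survivors);
    return freed;
  }
};
```

The hard part is producing an accurate roots list for collect(), which
is exactly the piece the code generator has to help with.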
Thanks for all the help...
Has anybody else on the list gotten a trivial GC'd language working that
I could look at? I'll go back to the Scheme translator again to see what
I can learn.
Thanks,
Ter