<div dir="ltr">Sanjoy: This document, which I wrote several years ago, may be of some use to you:<div><br></div><div><a href="https://docs.google.com/document/d/1-ws0KYo47S0CgqpwkjfWDBJ8wFhW_0UYKxPIJ0TyKrQ/edit?usp=sharing&authkey=COD8_LcL" target="_blank">Building a Stack Crawler in LLVM</a><br>


</div><div><br></div><div>I have successfully implemented a copying collector using LLVM. I did not implement support for interior pointers; however, I have a number of ideas on how to approach it. The language that I was implementing was similar to C# in that classes were divided into "reference" types and "value" types. A pointer to a reference type on the heap always pointed to the start of an allocation, whereas a pointer to a value type was always an interior pointer. (In other words, a value type could never exist by itself in the heap; it had to be embedded within some reference type.) This means that the compiler could always know whether a pointer was an interior pointer or not. Thus, pointers to reference types were just machine-level pointers, whereas pointers to value types consisted of a pointer+offset, with the pointer part pointing to the start of a heap allocation. Combined with the fact that pointers to value types were relatively rare, this allowed interior pointers with a minimum of overhead.</div>
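<div><br></div><div>As a rough illustration of the pointer+offset idea above, here is a toy model in Python. It is only a sketch, not the actual implementation: the names Heap, InteriorRef and collect are hypothetical, fields are plain integers rather than traced pointers, and every root is assumed to be an interior reference. The point it shows is that a relocating collector only has to forward the base, which always points at the start of an allocation, while the offset is left alone.</div>

```python
# Toy model of base+offset interior pointers under a copying collector.
# All names here (Heap, InteriorRef, collect) are hypothetical illustrations.
from dataclasses import dataclass


class Heap:
    """A trivially simple heap: each allocation is a list of fields."""

    def __init__(self, start_addr=1000):
        self.space = {}              # maps allocation address to its fields
        self.next_addr = start_addr  # bump-pointer allocation cursor

    def alloc(self, fields):
        addr = self.next_addr
        self.space[addr] = list(fields)
        self.next_addr += 16 * max(1, len(fields))  # arbitrary object size
        return addr


@dataclass
class InteriorRef:
    """Pointer to an embedded value: (start of allocation, offset)."""
    base: int    # always the start of a heap allocation
    offset: int  # index of the embedded field inside that allocation

    def load(self, heap):
        return heap.space[self.base][self.offset]


def collect(heap, roots):
    """Copy every rooted allocation into a fresh heap and fix up the roots.

    Only the base of each root is forwarded; offsets are never touched.
    """
    to_space = Heap(start_addr=heap.next_addr + 100000)
    forwarding = {}  # old base address mapped to new base address
    for ref in roots:
        if ref.base not in forwarding:
            forwarding[ref.base] = to_space.alloc(heap.space[ref.base])
        ref.base = forwarding[ref.base]
    return to_space


if __name__ == "__main__":
    heap = Heap()
    obj = heap.alloc([10, 20, 30])
    ref = InteriorRef(base=obj, offset=2)
    heap = collect(heap, [ref])
    print(ref.load(heap))
```

<div>In this model, an InteriorRef created before a call to collect still loads the same field afterwards, even though its base has changed. A plain reference would be forwarded the same way, just without the offset.</div>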


<div><br></div><div>Now, having said all that, I feel compelled to give a few warnings. Part of the reason I abandoned this project was the limitations in LLVM's garbage collection intrinsics, which I have written about extensively on this list. The current llvm.gcroot strategy requires the frontend to be very complex and to generate highly inefficient code, and that code is mostly unoptimizable, since LLVM's optimizers generally won't touch a value that has been marked as a GC root.</div>


<div><br></div><div>Worse, the support for GC in the LLVM community is fairly low: the garbage collection intrinsics in LLVM have not been updated or improved in the 7 years I have been following the project, and there has been very little discussion of GC on the mailing list. (I search the LLVM archives for the word "collect" about once a month, which is how this thread came to my attention.) Most of the people working on or with LLVM are working with non-GC languages, or with languages that have simple enough memory models (e.g. "everything is an atom") that the existing intrinsics are sufficient. There are also a few people who have gotten around the problems by defining their own stack frames instead of using the LLVM intrinsics.</div>


<div><br></div><div>There have been numerous proposals over the years for better GC intrinsics, but nothing has come out of these discussions so far. There's a good reason for this: improving the GC support would require a major commitment, since all of the backend code generators and optimizers would potentially be affected.</div>


<div><br></div><div>My current favorite GC proposal involves annotating types, that is, defining a new kind of derived type that is essentially a 2-tuple consisting of a base type + GC metadata (the second argument to llvm.gcroot). This means that "root-ness" would be a property of a type rather than a value, so the rootness could automatically be propagated to intermediate values or SSA values through optimization without the frontend having to do a lot of spilling and reloading of values. (Plus, having the ability to associate metadata with types might be useful for other things besides GC.)<br>


</div><div><br></div>

</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Oct 24, 2013 at 2:32 PM, Sanjoy Das <span dir="ltr"><<a href="mailto:sanjoy@azulsystems.com" target="_blank">sanjoy@azulsystems.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello llvm-dev!<br>

<br>

My colleagues and I are currently evaluating llvm's suitability as a<br>

JIT compiler interfacing with a precise, relocating garbage collector.<br>

While we couldn't find code or writeups that deal with the issues<br>

specific to this design goal, it is entirely possible that we may have<br>

missed something; we would appreciate references to relevant code or<br>

writeups that people on this list may be aware of.<br>

<br>

As an example, one issue that makes this non-trivial is that llvm (as<br>

far as we know) is free to manufacture pointers to locations _inside_<br>

objects, something referred to as a "derived pointer" in some places.<br>

Since these pointers need to be updated in sync with the objects they<br>

point into, a relocating GC needs to be aware of them; and the runtime<br>

needs to be able to read off which registers and stack slots hold such<br>

pointers at every safepoint.  We've looked into llvm's existing GC<br>

support, and the mechanism it provides does not seem to help in this<br>

use case.<br>

<br>

This email is deliberately terse, but we are more than happy to get<br>

into details about the approaches we've considered as the discussion<br>

progresses.  Pointers to existing work related to this or similar<br>

issues are especially welcome.<br>

<br>

Thanks!<br>

<br>

-- Sanjoy<br>

<br>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

</blockquote></div><br><br clear="all"><div><br></div>-- <br>-- Talin

</div>