[LLVMdev] Assorted notes on garbage collection with LLVM
Michael Lewis
don.apoch at gmail.com
Fri Aug 2 20:31:10 PDT 2013
On Fri, Aug 2, 2013 at 1:28 PM, Sean Silva <chisophugis at gmail.com> wrote:
> Thanks for sharing. If there is anything which you feel would be relevant to
> add to our our official documentation, please feel free to send patches (or
> new documents <http://llvm.org/docs/SphinxQuickstartTemplate.html> if you
> have a lot to say).
Thanks - I'll set aside some time over the weekend to put together some docs.
On Fri, Aug 2, 2013 at 3:18 PM, Jon Harrop
<jonathandeanharrop at googlemail.com> wrote:
> I believe your approach is potentially faster but substantially more
> complicated so I would regard it as an optimization over the very simple
> approach that I used. Therefore, it would be very interesting and valuable
> see the performance advantages of your approach. Have you done any
> benchmarks?
I've done some *very* quick and dirty tests, but I'm still honing in
on getting my frontend to generate good IR, so that the optimizer
passes can really work their magic. There's still also a lot of
low-hanging fruit to optimize in the GC itself.
I have a basic raytracer (funny that you chose a similar test!) which
renders a shaded sphere floating over a plane, with simple shadowing
and lighting. On my current hardware, the tracer runs at 600x600
pixels and roughly 19 frames per second (~52 ms per frame) with
garbage collection fully enabled. A comparable implementation in C++
runs at about 23 FPS, so something like a 16% difference in speed.
Keep in mind that the Epoch language implementation has a lot of work
yet to go - I'm fully confident I can narrow that margin appreciably
by introducing key features to the language like greater control over
memory layout and locality of reference.
I also want to build in some heuristics to the Epoch runtime to avoid
triggering GC stack/heap traversals as often as they are currently
done. That should gain a reasonable amount of performance by itself.
Implementing a moving/compacting collector and some other goodies
should go even further (since I can do cheap allocations instead of
basically relying on malloc under the hood), and once the
optimizations are really cranking, I think I can get to within a
couple percent of my Visual C++ build of the raytracer.
On Fri, Aug 2, 2013 at 3:55 PM, Manuel Jacob <me at manueljacob.de> wrote:
> I'm glad to hear that you solved the issues you described in your last
> mail. How did you solve them? Was that because of the stack-coloring bug?
As it turns out almost all of my headaches were caused by the
stack-coloring issue. The basic approach described on the Epoch wiki
page seems to work fine as long as stack slots are not merged.
> Where do the negative stack offsets come from? The stack root walking code
> in my project doesn't need to check for negative stack offsets.
This is interesting to hear... all I know is that LLVM hands me
negative offsets for some emitted functions. I have not yet
ascertained the cause for this.
> What are bookmarks?
Bookmarks are a hack to get around FPO optimizations. Suppose we have
a call stack that looks like this:
Epoch function A [FPO disabled]
External function 1 [FPO enabled]
External function 2 [FPO enabled]
Epoch function B [FPO disabled]
Epoch function C [FPO disabled]
If we trigger GC in function A, we have to walk the stack, but FPO can
potentially cause the two stack frames for the external code to
"vanish" - depending on how the optimization is performed, we might
end up getting a frame pointer that points into C instead of B. The
bookmark system just records that we "exited" the Epoch-controlled
stack frame in function B, during the invocation of External Function
2. When A goes to run GC, it verifies that B's stack frame is walked
regardless of where the FPO-mangled stack leads us. Since B has FPO
disabled, we know it will walk correctly into C's stack frame.
> What's the purpose of the TriggerGarbageCollection function?
For verification, the Epoch GC doesn't behave like a standard
production GC, that is, it will collect aggressively instead of only
when under memory pressure, or generationally, etc.
TriggerGarbageCollection is a simple shim invoked from JITted Epoch
code which causes a collection cycle to occur.
I also use this shim to capture the current stack pointer and hand it
off to the garbage worker routines. This is the best way I could think
of to capture the stack pointer accurately without relying on weird
hacks like taking the address of a local variable, etc.
> Have you considered using a hash map in GarbageWorker() that maps safe
> points to lists of GCRoots? I think this would simplify the code and reduce
> the overhead of stack root walking.
Yep! This is in fact one of the next optimizations I plan on tackling.
It should be a solid win both in terms of code cleanliness and
performance, as you observed.
> Are you interested in improving the code generator's support for garbage
> collection? I posted some thoughts in a mail with the Subject "New ideas
> about how to improve garbage collection support". That would also solve the
> stack-coloring bug.
I'd love to work on getting GC faster and more accessible, but I'm
afraid the internals of the code generator are still a bit out of
scope of my personal project (which is the development of a novel
language). Since I only have so much free time to spare, it's a tough
call whether to invest that time in LLVM or in my own project.
That said, I would be more than happy to contribute back to the LLVM
project as soon as I am confident that I have something of value to
offer. For now the best I can provide is some guidance on how to use
the existing facilities; I'd need to spend a lot more time in the LLVM
core to feel comfortable suggesting changes there.
- Mike
More information about the llvm-dev
mailing list