[LLVMdev] Adding safe-point code generation
Gordon Henriksen
gordonhenriksen at me.com
Tue Jun 23 04:18:43 PDT 2009
Hi Jeffrey,
On 2009-06-22, at 01:49, Jeffrey Yasskin wrote:
> I need to add thread-switching support to Unladen Swallow's JIT, and
> LLVM's safe point support looks like a good way to get code into all
> the right places. However, http://llvm.org/docs/GarbageCollection.html#collector-algos
> points out that there's no way to emit code at safe points yet, and
> there are no loop safe points at all. So I'll be trying to implement
> them.
>
> Is there anything I should know before starting?
Sounds like you've got the right idea.
> One way to do this might be to add a FunctionPass to
> LLVMTargetMachine::addCommonCodeGenPasses() alongside
> createGCLoweringPass(), which would insert user-defined code for
> safe points. Unfortunately, code inserted there would stick around
> in the IR after the machine code was emitted, and if the function
> were JITted again, we'd get duplicate safe points.
Unfortunately, I don't believe this is workable. It would make this
work much easier if it were.
> Another way to do it might be to add a MachineFunction pass next to
> createGCMachineCodeAnalysisPass() (or instead of it), which could
> emit appropriate MachineInstructions to implement the safe point.
> This, of course, forces safe points to be written in
> MachineInstructions instead of IR instructions, which isn't ideal.
I think this is the way to go, though it's also the most involved. My
primary rationale is that code generation can hack on the CFG, even
introducing loops where there were none expressed in the IR. It could
be that I'm being unnecessarily pessimistic on this point, though.
As a specific example of the code generator hacking on the CFG, take
atomic operations which expand to loops on architectures which use
load-reserved/store-conditional to implement these primitives. It may
not be necessary or desirable to add safe points to these loops, but
doing so should perhaps be preferred on the basis of correctness.
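To make the atomic case concrete, here is a minimal source-level sketch
of the retry loop that such an expansion amounts to. It is only an
analogy: the helper name atomicAddViaRetryLoop is hypothetical, and
std::atomic's compare_exchange_weak stands in for the
load-reserved/store-conditional pair that the code generator would
actually emit.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not an LLVM API): an atomic add written as the
// retry loop that a load-reserved/store-conditional target effectively
// runs. The caller sees one operation; the loop exists only after
// expansion.
inline std::uint32_t atomicAddViaRetryLoop(std::atomic<std::uint32_t> &cell,
                                           std::uint32_t delta) {
  std::uint32_t observed = cell.load(std::memory_order_relaxed);
  // compare_exchange_weak may fail spuriously, much like a lost
  // reservation. On failure it reloads `observed`, so we just retry.
  while (!cell.compare_exchange_weak(observed, observed + delta,
                                     std::memory_order_seq_cst,
                                     std::memory_order_relaxed)) {
  }
  return observed; // value held before the add
}
```

A pass that only inspects the IR sees a single atomic instruction here,
so it has no back edge at which to place a loop safe point.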
As another example, consider a 64-bit integer divide on a 32-bit
architecture expanding to a libcall. Some, but perhaps not all,
collection algorithms would want to emit safe point code for this
call, but it simply does not exist in the IR to instrument.
Also, code injection of the form 'give me 8 bytes of noops at each
safe point' or 'insert a cold instruction sequence at the end of the
function' is best expressed at the machine code level. Safe points
are hot code, and unusual, target-specific techniques are regularly
used with them if you survey the literature, so a design which
accommodates that reality is preferred, even though hacking on the
MachineFunction representation is less pleasant than hacking on the IR.
One desirable element of this design is that it preserves the original
IR. Chris has said that it's a long-term goal of LLVM to not mangle
the Function during code generation, and this moves in that direction
instead of regressing.
> Another way might be to run a pass over the IR inserting
> llvm.safepoint() calls, which could be implemented as a function in
> the module. Then we'd want a MachineFunction pass to inline this for
> us during codegen. The llvm.safepoint() calls could be easily
> identified and removed if the IR needs to be re-used.
I see this as fairly equivalent to the first option.
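A toy model may help show why both IR-level options share the same
hazard. Everything below is an assumption for illustration: the IR is
modelled as a list of strings, llvm.safepoint is the hypothetical marker
from the quoted proposal, and insertSafepoints/stripSafepoints are
made-up helpers, not LLVM passes.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy stand-in for a function body: one string per instruction.
using ToyIR = std::vector<std::string>;

// Hypothetical marker, following the llvm.safepoint() idea quoted above.
const std::string kMarker = "call @llvm.safepoint()";

bool isCall(const std::string &inst) { return inst.rfind("call ", 0) == 0; }

// Insert a marker before every call that is not itself a marker.
// Not idempotent: a second run adds a second marker per call.
ToyIR insertSafepoints(const ToyIR &ir) {
  ToyIR out;
  for (const std::string &inst : ir) {
    if (isCall(inst) && inst != kMarker)
      out.push_back(kMarker);
    out.push_back(inst);
  }
  return out;
}

// Remove every marker so the IR can be reused for another compile.
ToyIR stripSafepoints(ToyIR ir) {
  ir.erase(std::remove(ir.begin(), ir.end(), kMarker), ir.end());
  return ir;
}
```

Running insertSafepoints twice on the same body doubles the markers,
which is the duplicate-safe-point problem on re-JIT; stripSafepoints
restores the original body, but only if something remembers to run it.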
Also, regardless, safe point markers (a label is actually generated)
need to survive as such into the MachineFunction representation, or
else we'll never be able to generate a register map.
Hope that helps,
Gordon
P.S. There's an interesting circularity which I may not have accounted
for in the original design: If code is injected at each safe point,
and a call instruction is injected, do we need to generate another
safe point for that call? Clearly, the expansion of a safe point
cannot be recursive with itself; but I think that we should allow
generating a register map at the return address of that call, just as
some collectors may want to instrument the libcall case discussed above.
Actually, this distinction between safe points for inserting code and
safe points for frame maps is probably a critical design issue for
your use case. Our current definition of a safe point is at the return
address of a call instruction, which is precisely what's required to
walk the stack. This is NOT the location where you want to add a call
to your runtime.
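The two kinds of location can be sketched with a toy instruction list.
This is an illustration under stated assumptions, not LLVM code: Inst,
Kind and instrument are hypothetical names, and placing the runtime
poll directly before each call is just one possible policy.

```cpp
#include <string>
#include <vector>

// Toy instruction kinds. Poll is injected runtime code; MapLabel marks
// a return address at which a frame/register map would be recorded.
enum class Kind { Plain, Call, Poll, MapLabel };

struct Inst {
  Kind kind;
  std::string text;
};

// For each call in the original body, emit a poll BEFORE the call and a
// map label AFTER it (at its return address). Only original calls are
// expanded, so the expansion never recurses on code it injected.
std::vector<Inst> instrument(const std::vector<Inst> &body) {
  std::vector<Inst> out;
  for (const Inst &inst : body) {
    if (inst.kind == Kind::Call) {
      out.push_back({Kind::Poll, "poll before " + inst.text});
      out.push_back(inst);
      out.push_back({Kind::MapLabel, "map after " + inst.text});
    } else {
      out.push_back(inst);
    }
  }
  return out;
}
```

The sketch keeps the two positions separate: the label after the call
is what a stack walker needs, while the poll before it is where a
thread-switch check could run.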