[LLVMdev] Adding safe-point code generation
Gordon Henriksen
gordonhenriksen at me.com
Wed Jun 24 11:06:07 PDT 2009
On 2009-06-23, at 15:18, Jeffrey Yasskin wrote:
> On Tue, Jun 23, 2009 at 4:18 AM, Gordon Henriksen <gordonhenriksen at me.com
> > wrote:
> Hi Jeffrey,
>
> On 2009-06-22, at 01:49, Jeffrey Yasskin wrote:
>
>> I need to add thread-switching support to Unladen Swallow's JIT,
>> and LLVM's safe point support looks like a good way to get code
>> into all the right places. However, http://llvm.org/docs/GarbageCollection.html#collector-algos
>> points out that there's no way to emit code at safe points yet,
>> and there are no loop safe points at all. So I'll be trying to
>> implement them.
>>
>> Is there anything I should know before starting?
>
> Sounds like you've got the right idea.
>
>> One way to do this might be to add a FunctionPass to
>> LLVMTargetMachine::addCommonCodeGenPasses() alongside
>> createGCLoweringPass(), which would insert user-defined code for
>> safe points. Unfortunately, code inserted there would stick around
>> in the IR after the machine code was emitted, and if the function
>> were JITted again, we'd get duplicate safe points.
>
> Unfortunately, I don't believe this is workable. It would make this
> work much easier if it were.
>
>> Another way to do it might be to add a MachineFunction pass next to
>> createGCMachineCodeAnalysisPass() (or instead of it), which could
>> emit appropriate MachineInstructions to implement the safe point.
>> This, of course, forces safe points to be written in
>> MachineInstructions instead of IR instructions, which isn't ideal.
>
> I think this is the way to go, though it's also the most involved.
> My primary rationale is that code generation can hack on the CFG,
> even introducing loops where there were none expressed in the IR. It
> could be that I'm being unnecessarily pessimistic on this point,
> though.
>
> As a specific example of the code generator hacking on the CFG, take
> atomic operations which expand to loops on architectures which use
> load-reserved/store-conditional to implement these primitives. It
> may not be necessary or desirable to add safe points to these loops,
> but perhaps should be preferred on the basis o correctness.
>
> As another example, consider a 64-bit integer divide on a 32-bit
> architecture expanding to a libcall. Some, but perhaps not all,
> collection algorithms would want to emit safe point code for this
> call, but it simply does not exist in the IR to instrument.
>
> I'm worried that it'll be necessary for some garbage collectors and
> hurt performance too much for others. Should it be configurable? I
> don't need to insert safe points in such loops and calls for my use,
> so I'd try to make it extensible, but then leave support for these
> loops and calls to someone who actually needs them.
That sounds best.
> Also, code injection of the form 'give me 8 bytes of noops at each
> safe point' and 'insert a cold instruction sequence at the end of
> the function' are best expressed at the machine code level. Safe
> points are hot code and unusual, target-specific techniques are
> regularly used with them if you survey the literature, so a design
> which accommodates that reality is preferred, even though hacking on
> the MachineFunction representation is less pleasant than the IR.
>
> I'd like to allow target-specific techniques, but also allow target-
> independent techniques, and the target-independent techniques are
> more important for what I'm actually doing. Now that I've looked
> into MachineInstructions more, it looks like it's impossible to use
> them in a target-independent way.
I agree. I think the lowering would actually occur in the
SelectionDAG, rather than at the MachineFunction level.
> One element of this design that is desirable from a design
> perspective is that it preserves the original IR. Chris has said
> that it's a long-term goal of LLVM to not mangle the Function during
> code generation, and this moves in that direction instead of
> regressing.
>
> I definitely like that feature.
Yep.
>> Another way might be to run a pass over the IR inserting
>> llvm.safepoint() calls, which could be implemented as a function in
>> the module. Then we'd want a MachineFunction pass to inline this
>> for us during codegen. The llvm.safepoint() calls could be easily
>> identified and removed if the IR needs to be re-used.
>
> I see this as fairly equivalent to the first option.
>
> I don't, because the user can control how they lower
> llvm.safepoint(). For uses like mine, where I'd rather write the
> safe point in IR, I can implement it as a function (if I have a
> selectiondag-level inliner). For more advanced garbage collectors,
> they can implement it as a custom lowering in their target. The
> biggest downside of an llvm.safepoint intrinsic would be that it
> couldn't capture loops and calls introduced by lowering, unless
> those lowerings explicitly introduced new safepoints.
Right.
> Also, regardless, stop point markers (a label is actually generated)
> need to survive as such into the MachineFunction representation else
> we'll never be able to generate a register map.
>
> Hope that helps,
> Gordon
>
>
> P.S. There's an interesting circularity which I may not have
> accounted for in the original design: If code is injected at each
> safe point, and a call instruction is injected, do we need to
> generate another safe point for that call? Clearly, the expansion of
> a safe point cannot be recursive with itself; but I think that we
> should allow generating a register map at the return address of that
> call, just as some collectors may want to instrument the libcall
> case discussed above.
>
> Actually, this distinction between safe points for inserting code
> and safe points for frame maps is probably is a critical design
> issue for your use case. Our current definition of a safe point is
> at the return address of a call instruction, which is precisely
> what's required to walk the stack. This is NOT the location where
> you want to add a call to your runtime.
>
> I think you're saying that we need two kinds of safe points: points
> at which to insert code, and points that the
> GCFunctionInfo::iterator traverses. The code-inserting safe points
> could insert traversed safe points.
That's a good summary of the problem, yes.
— Gordon
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090624/f5383227/attachment.html>
More information about the llvm-dev
mailing list