[LLVMdev] [RFC] Stackmap and Patchpoint Intrinsic Proposal

Filip Pizlo fpizlo at apple.com
Tue Oct 22 18:23:28 PDT 2013


On Oct 22, 2013, at 4:18 PM, Andrew Trick <atrick at apple.com> wrote:

> On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> 
>> On Oct 22, 2013, at 1:48 PM, Philip R <listmail at philipreames.com> wrote:
>> 
>>> On 10/22/13 10:34 AM, Filip Pizlo wrote:
>>>> On Oct 22, 2013, at 9:53 AM, Philip R <listmail at philipreames.com> wrote:
>>>> 
>>>>> On 10/17/13 10:39 PM, Andrew Trick wrote:
>>>>>> This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
>>>>>> first client of these features is the JavaScript compiler within the
>>>>>> open source WebKit project.
>>>>>> 
>>>>> I have a couple of comments on your proposal.  None of these are major enough to prevent submission.
>>>>> 
>>>>> - As others have said, I'd prefer an experimental namespace rather than a webkit namespace.  (minor)
>>>>> - Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates functionality that already exists in LLVM.  In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp).  Have you examined these mechanisms to see if you can share implementations?
>>>>> - To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure.  (e.g. an interior pointer to an array when blocking a loop)  Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries?  If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list.  Is there something I'm missing here?
>>>> These stackmaps have nothing to do with GC.  Interior pointers are a problem unique to precise copying collectors.
>>> I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.
>> 
>> It's not at all similar.  These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp.  You won't use these for a GC.
>> 
>>> In general, if the expected semantics are the same, a shared implementation would be desirable.  This is more a suggestion for future refactoring than anything else.
>> 
>> I think that these stackmaps and GC stackmaps are fairly different beasts.  While it's possible to unify the two, this isn't the intent here.  In particular, you can use these stackmaps for deoptimization without having to unwind the stack.
> 
> I think Philip R is asking a good question. To paraphrase: If we introduce a generically named feature, shouldn’t it be generically useful? Stack maps are used in other ways, and there are other kinds of patching. I agree and I think these are intended to be generically useful features, but not necessarily sufficient for every use.
> 
> The proposed stack maps are very different from LLVM’s gcroot because gcroot does not provide stack maps! llvm.gcroot effectively designates a stack location for each root for the duration of the current function, and forces the root to be spilled to the stack at all call sites (the client needs to disable StackColoring). This is really the opposite of a stack map and I’m not aware of any functionality that can be shared. It also requires a C++ plugin to process the roots. llvm.stackmap generates data in a section that MCJIT clients can parse.
> 
> If someone wanted to use stack maps for GC, I don’t know why they wouldn’t leverage llvm.stackmap. Maybe Filip can see a problem with this that I can't.

You're right, it could work.

If you were happy with spilling all of your GC roots, then you could put them into allocas and then pass the allocas' addresses to a stackmap.  This will give you a FP offset of the roots.
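
A minimal sketch of how a runtime might consume such a stackmap, assuming the record has already been decoded into a list of FP-relative byte offsets. The StackMapRecord type, the scan_roots helper, and the toy memory model are hypothetical and only illustrate the idea; they are not the proposed stackmap section format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackMapRecord:
    """Hypothetical decoded record: one FP-relative offset per spilled root."""
    stackmap_id: int
    root_offsets: tuple  # byte offsets from the frame pointer


def scan_roots(memory, frame_pointer, record):
    """Return (slot_address, pointer_value) for every root slot in one frame."""
    roots = []
    for offset in record.root_offsets:
        slot = frame_pointer + offset
        roots.append((slot, memory[slot]))
    return roots


# Toy frame: the allocas at FP-8 and FP-16 hold object pointers,
# while FP-24 holds a non-pointer value that the stackmap does not list.
FP = 0x7FF0
memory = {FP - 8: 0x1000, FP - 16: 0x2000, FP - 24: 42}
record = StackMapRecord(stackmap_id=1, root_offsets=(-8, -16))
roots = scan_roots(memory, FP, record)
```

Because every root lives in a known stack slot, the collector never has to look at registers, which is exactly the spilling cost described above.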

If you were happy with an accurate GC that couldn't move objects referenced from the stack then you could have each safepoint call use patchpoint, and then if you also implemented stack unwinding, you could use the patchpoints' implicit stackmaps to figure out which registers (or stack slots) contained pointers.
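
A rough sketch of that second scheme, assuming the runtime already has an unwinder that yields one entry per frame and a table of decoded stackmaps keyed by return address. The frame layout, the ("reg", name) and ("stack", offset) location tuples, and find_roots are all hypothetical names chosen for illustration. The roots are only read, never updated, which matches a collector that does not move objects referenced from the stack.

```python
def find_roots(frames, stackmaps):
    """Collect pointer values from every frame, using the stackmap found at
    that frame's return address to decide which locations hold pointers."""
    roots = []
    for frame in frames:
        # Frames without a stackmap entry contribute no roots.
        for kind, where in stackmaps.get(frame["return_address"], ()):
            if kind == "reg":
                roots.append(frame["saved_registers"][where])
            else:  # "stack": offset relative to this frame's frame pointer
                roots.append(frame["stack"][where])
    return roots


# Toy unwinder output: two frames, innermost first.
frames = [
    {"return_address": 0x400120,
     "saved_registers": {"rbx": 0x1000},
     "stack": {-8: 0x2000}},
    {"return_address": 0x400200,
     "saved_registers": {},
     "stack": {-16: 0x3000}},
]
stackmaps = {
    0x400120: [("reg", "rbx"), ("stack", -8)],
    0x400200: [("stack", -16)],
}
all_roots = find_roots(frames, stackmaps)
```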

These would be niche uses, I think.  If you care about performance then you're not going to use an accurate GC that requires spilling roots; you'll go for some GC algorithm that can handle conservative stack roots.  If you're using accurate GC support for moving objects then it's usually because you need to move *all* objects (after all you can move *most* objects without any GC roots or stackmaps by using Bartlett's algorithm or similar) so the calls-as-patchpoints approach won't work.

I could kind of see some real-time GCs using the alloca+stackmap approach, but it's a bit of a stretch.

So, I don't see stackmaps as being particularly practical for accurate GC, but I do concede that you *could* implement some kind of accurate GC that uses stackmaps for some part of its stack scanning.

> The runtime can add GC roots to the stack map just like any other live value, and it should know how to interpret the records. The intrinsic doesn’t bake in any particular interpretation of the mapped values. That said, my proposal deliberately does not cover GC. I think that stack maps are the easy part of the problem. The hard problem is tracking interior pointers, or for that matter exterior/out-of-bounds or swizzled pointers. LLVM’s machine IR simply doesn’t have the necessary facilities for doing this. But if you don’t need a moving collector, then you don’t need to track derived pointers as long as the roots are kept live. In that case, llvm.stackmap might be a nice optimization over llvm.gcroot.
> 
> Now with regard to patching. I think llvm.patchpoint is generally useful for any type of patching I can imagine. It does look like a call site in IR, and it’s nice to be able to leverage calling conventions to inform the location of arguments. But the patchpoint does not have to be a call after patching, and you can specify zero arguments to avoid using a calling convention. In fact, we currently emit a call only out of convenience. We could splat nops in place and assume the runtime will immediately find and patch all occurrences before the code executes. In the future we may want to handle a NULL call target, bypass call emission, and allow the reserved bytes to be fewer than those required to emit a call.
> 
> -Andy
