[LLVMdev] GC StackMaps (was Stackmap and Patchpoint Intrinsic Proposal)

Andrew Trick atrick at apple.com
Wed Oct 23 16:37:45 PDT 2013


On Oct 23, 2013, at 1:17 AM, Gaël Thomas <gael.thomas at lip6.fr> wrote:

> Hi all,
> 
> I don't know if I understand everything, but it seems really
> interesting for a runtime developer, stackmap and patchpoint looks
> perfect for a lot of optimizations :) I just have few question to
> verify if I understand what are these stackmaps and patchpoints, and I
> discuss the GC after.
> 
> * I have a first very simple scenario (useful in vmkit). Let's imagine
> that we want to lazily build the layout of an object at runtime, i.e.,
> we don't know the layout of the object when we are emitting the code.
> And, we want to access to a field of this object identified by a
> symbol. If I understand correctly, we can use your stackmap to define
> the offset of this field and then patch the code that use this offset?
> The machine code will like  mov %(rax)offset, .., and the stackmap
> will generate a map that contains the location of "offset" in the
> code? If it's the case, it's perfect.

I agree with Filip's response. I'm not sure I exactly what you envision though.

Ignoring calling conventions for the moment, with the current implementation you'll get something like this for llvm.patchpoint(ID, 12, %resolveGetField, %obj):

mov (%rax), %rcx  ; load object pointer
--- patchpoint ---
movabsq $resolveGetField, $r11
callq *$r11
--- end patchpoint ---
ret             ; field value in %rax

The stack map will indicate that %rcx holds the object pointer.

You can then patch it with:

mov (%rax), %rcx  ; load object pointer
--- patchpoint ---
mov offset(%rcx), $rax ; load field at offset
nop
nop...
--- end patchpoint ---
ret             ; field value in %rax

> * Now, let's imagine that I want to lazily call a virtual method (aka,
> a single dispatch call in Java). I have two problems. First, I have to
> know the offset of the method in the virtual table (just like for
> virtual calls in c++). Second, the entry in the table should contain a
> stub able to link/compile the target method at runtime. And the stub
> has to know which object has received the call (which drives the
> method resolution and the update of the virtual table). With stackmaps
> and patchpoints, I can imagine something like that (in pseudo-llvm
> without typing)
> 
> %r0 = load %obj, 0 ; the virtual table is at offset 0
> %r1 = 0
> stackmap %r1, ID_OFFSET ; contains the offset of the target method in
> the virtual table
> %r2 = add %r1, %r0
> %r3 = load %r2
> patchpoint ID_CALL %r3, %obj, other parameters ; to find %obj in the stub
> 
> I should be able to:
> - patch ID_OFFSET when I load the description of obj (before the call,
> when the object is allocated)
> - use ID_CALL to know which object is the target of the call in order
> to find the appropriate method. If it could be the case, your
> patchpoint and safepoint are very interesting for vmkit. We just need
> a function to retreive, in the caller, from which patchpoint we are
> coming from (probably by inspecting the stack, we can find the program
> pointer of the call site and then find the patchpoint descriptor?)

I think you're invisioning patching instructions that are emitted by LLVM outside of a patchpoint's reserved space. While that is possible to do using llvm.stackmap, it is not a good idea to assume anything about LLVM instruction selection or scheduling.

As Filip said, just use a single patchpoint for the vtable call sequence and only patch within the reserved bytes.

llvm.patchpoint(ID, nbytes, %resolveVCall, %obj):

Say the stackmap locates %obj in $r1 and a return value in $r3, you would simply patch 'nbytes' of code as follows:

--- patchpoint ---
mov (%r1), $r2 ; load vtable ptr
mov $vtableofs($r2), $r3
--- end patchpoint ---

You could potentially expose the vtable ptr load to LLVM IR.

Note that patching at llvm.stackmap (not llvm.patchpoint) would only be done if you don't want to resume execuction within the same compiled function after reaching stack map.

> * Now, for the GC, if I understand correctly, instead of declaring a
> variable as a root, you can declare explicitly the safepoints by using
> patchpoints with something like
> 
> patchpoint ID_safepoint_17, suspendTheThreadForCollection, list of the
> alloca (or registers) that contains objects
> 
> Then in the suspendTheThreadForCollection, we can see that we are
> coming for the safepoint_17 and then find the locations of the
> objects? If a patchpoint can work like this, it's probably a good
> building block for the gc.
> 
> Currently, we have to declare the root objects with the root
> intrinsic, then add the appropriate safepoints (it's just a call to
> GCFunctionInfo.addSafePoint). As root objects are marked as root,
> modifying GCFunctionInfo.addSafepoint to generate a patchpoint with
> all the gc roots as argument (instead of using the current
> infrastructure) should not be difficult. And it probably means that
> the current gc infrastructure could use patchpoint as a backend. The

Yes, but I think it's more robust to call addSafepoint during MC lowering when we know that register assignments can't change. I see two options

- Call addSafepoint() within the AsmPrinter where we currently call StackMaps::recordStackMap.

- Run addGCPasses() after addPreEmitPass() in Passes.cpp. I'm not sure if there's a good reason we currently run it before block layout and target-specific passes.

> only problem that I see is that all the objects will be transmitted as
> arguments to suspendTheThreadForCollection, it's maybe not the best
> way to do that. Probably, something like:
> safepoint ID_safepoint_17, list of alloca that contains objects
> patchpoint ID_safepoint_17, suspendTheThreadForCollection
> should be better to avoid useless arguments?

No, just use patchpoint, passing '0' as the number of calling convention arguments. All patchpoint operands after the arguments are identical to stackmap operands:

 patchpoint(ID_safepoint_17, suspendThread, 0, list of alloca)

> 
> See you,
> Gaël
> 
> PS: just, tell me if the code is already in the trunk, because I would
> like to see if these intrinsics can work for vmkit :)

The implementation is not final, but a working patch set is up for review. See 
http://llvm-reviews.chandlerc.com/D1996 and its dependent patches.

-Andy

> 
> 
> 2013/10/23 Andrew Trick <atrick at apple.com>:
>> I'm moving this to a different thread. I think the newly proposed
>> intrinsic definitions and their current implementation are valuable
>> regardless of how it gets tied into GC...
>> 
>> On Oct 22, 2013, at 6:24 PM, Philip R <listmail at philipreames.com> wrote:
>> 
>> Adding Gael as someone who has previously discussed vmkit topics on the
>> list.  Since I'm assuming this is where the GC support came from, I wanted
>> to draw this conversation to the attention of someone more familiar with the
>> LLVM implementation than myself.
>> 
>> On 10/22/13 4:18 PM, Andrew Trick wrote:
>> 
>> On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>> 
>> On Oct 22, 2013, at 1:48 PM, Philip R <listmail at philipreames.com> wrote:
>> 
>> On 10/22/13 10:34 AM, Filip Pizlo wrote:
>> 
>> On Oct 22, 2013, at 9:53 AM, Philip R <listmail at philipreames.com> wrote:
>> 
>> On 10/17/13 10:39 PM, Andrew Trick wrote:
>> 
>> This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
>> first client of these features is the JavaScript compiler within the
>> open source WebKit project.
>> 
>> I have a couple of comments on your proposal.  None of these are major
>> enough to prevent submission.
>> 
>> - As others have said, I'd prefer an experimental namespace rather than a
>> webkit namespace.  (minor)
>> - Unless I am misreading your proposal, your proposed StackMap intrinsic
>> duplicates existing functionality already in llvm.  In particular, much of
>> the StackMap construction seems similar to the Safepoint mechanism used by
>> the in-tree GC support. (See CodeGen/GCStrategy.cpp and
>> CodeGen/GCMetadata.cpp).  Have you examined these mechanisms to see if you
>> can share implementations?
>> - To my knowledge, there is nothing that prevents an LLVM optimization pass
>> from manufacturing new pointers which point inside an existing data
>> structure.  (e.g. an interior pointer to an array when blocking a loop)
>> Does your StackMap mechanism need to be able to inspect/modify these
>> manufactured temporaries?  If so, I don't see how you could generate an
>> intrinsic which would include this manufactured pointer in the live variable
>> list.  Is there something I'm missing here?
>> 
>> These stackmaps have nothing to do with GC.  Interior pointers are a problem
>> unique to precise copying collectors.
>> 
>> I would argue that while the use of the stack maps might be different, the
>> mechanism is fairly similar.
>> 
>> 
>> It's not at all similar.  These stackmaps are only useful for
>> deoptimization, since the only way to make use of the live state information
>> is to patch the stackmap with a jump to a deoptimization off-ramp.  You
>> won't use these for a GC.
>> 
>> In general, if the expected semantics are the same, a shared implementation
>> would be desirable.  This is more a suggestion for future refactoring than
>> anything else.
>> 
>> 
>> I think that these stackmaps and GC stackmaps are fairly different beasts.
>> While it's possible to unify the two, this isn't the intent here.  In
>> particular, you can use these stackmaps for deoptimization without having to
>> unwind the stack.
>> 
>> 
>> I think Philip R is asking a good question. To paraphrase: If we introduce a
>> generically named feature, shouldn’t it be generically useful? Stack maps
>> are used in other ways, and there are other kinds of patching. I agree and I
>> think these are intended to be generically useful features, but not
>> necessarily sufficient for every use.
>> 
>> Thank you for the restatement.  You summarized my view well.
>> 
>> 
>> The proposed stack maps are very different from LLVM’s gcroot because gcroot
>> does not provide stack maps! llvm.gcroot effectively designates a stack
>> location for each root for the duration of the current function, and forces
>> the root to be spilled to the stack at all call sites (the client needs to
>> disable StackColoring). This is really the opposite of a stack map and I’m
>> not aware of any functionality that can be shared. It also requires a C++
>> plugin to process the roots. llvm.stackmap generates data in a section that
>> MCJIT clients can parse.
>> 
>> Er, I think we're talking past each other again.  Let me lay out my current
>> understanding of the terminology and existing infrastructure in LLVM.
>> Please correct me where I go wrong.
>> 
>> stack map - A mapping from "values" to storage locations.  Storage locations
>> primarily take the form of register, or stack offsets, but could in
>> principal refer to other well known locations (i.e. offsets into thread
>> local state).  A stack map is specific to a particular PC and describes the
>> state at that instruction only.
>> 
>> In a precise garbage collector, stack maps are used to ensure that the stack
>> can be understood by the collector.  When a stop-the-world safepoint is
>> reached, the collector needs to be able to identify any pointers to heap
>> objects which may exist on the stack.  This explicitly includes both the
>> frame which actually contains the safepoint and any caller frames back to
>> the root of thread.  To accomplish this, a stack map is generated at any
>> call site and a stack map is generated for the safepoint itself.
>> 
>> In LLVM currently, the GCStrategy records "safepoints" which are really
>> points at which stack maps need to be remembered.  (i.e. calls and actual
>> stop-the-world safepoints)  The GCMetadata mechanism gives a generic way to
>> emit the binary encoding of a stack map in a collector specific way.  The
>> current stack maps supported by this mechanism only allow abstract locations
>> on the stack which force all registers to be spilled around "safepoints"
>> (i.e. calls and stop-the-world safepoints).  Also, the set of roots (which
>> are recorded in the stack map) must be provided separately using the gcroot
>> intrinsic.
>> 
>> In code:
>> - GCPoint in llvm/include/llvm/CodeGen/GCMetadata.h describes a request for
>> a location with a stack map.  The SafePoints structure in GCFunctionInfo
>> contains a list of these locations.
>> - The Ocaml GC is probably the best example of usage.  See
>> llvm/lib/CodeGen/AsmPrinter/OcamlGCPrinter.cpp
>> 
>> Note: The summary of existing LLVM details above is based on reading the
>> code.  I haven't actually implemented anything which used this mechanism
>> yet.  As such, take it with a grain of salt.
>> 
>> 
>> That's an excellent description of stack maps, GCStrategy, and
>> safepoints. Now let me explain how I see it.
>> 
>> GCStrategy provides layers of abstraction that allow plugins to
>> specialize GC metadata. Conceptually, a plugin can generate what looks
>> like stack map data to the collector. But there isn't any direct
>> support in LLVM IR for the kind of stack maps that we need.
>> 
>> When I talk about adding stack map support, I'm really talking about
>> support for mapping values to registers, where the set of values and
>> their locations are specific to the "safepoint".
>> 
>> We're adding an underlying implementation of per-safepoint live
>> values. There isn't a lot of abstraction built up around it. Just a
>> couple of intrinsics that directly expose the functionality.
>> 
>> We're also approaching the interface very differently. We're enabling
>> an MCJIT client. The interface to the client is the stack map format.
>> 
>> 
>> In your change, you are adding a mechanism which is intended to enable
>> runtime calls and inline cache patching.  (Right?)  Your stack maps seem to
>> match the definition of a stack map I gave above and (I believe) the
>> implementation currently in LLVM.  The only difference might be that your
>> stack maps are partial (i.e. might not contain all "values" which are live
>> at a particular PC) and your implementation includes Register locations
>> which the current implementation in LLVM does not.  One other possible
>> difference, are you intending to include "values" which aren't of pointer
>> type?
>> 
>> 
>> Yes, the values will be of various types (although only 32/64 bit
>> types are currently allowed because of DWARF register number
>> weirdness). More importantly, our stack maps record locations of a
>> specific set of values, which may be in registers, at a specific
>> location. In fact, that, along with reserving space for code patching,
>> is *all* we're doing. GCRoot doesn't do this at all. So there is
>> effectively no overlap in implementation.
>> 
>> 
>> Before moving on, am I interpreting your proposal and changes correctly?
>> 
>> 
>> Yes, except I don’t see a direct connection between the functionality we’re
>> adding and “the implementation currently in LLVM”.
>> 
>> Assuming I'm still correct so far, how might we combine these
>> implementations?  It looks like your implementation is much more mature than
>> what exists in tree at the moment.  One possibility would be to express the
>> needed GC stack maps in terms of your new infrastructure.  (i.e. convert a
>> GCStrategy request for a safepoint into a StackMap (as you've implemented
>> it) with the list of explicit GC roots as it's arguments).  What would you
>> think of this?
>> 
>> 
>> I can imagine someone wanting to leverage some of the new
>> implementation without using it end-to-end as-is. Although I'm not
>> entirely sure what the motivation would be. For example:
>> 
>> - A CodeGenPrepare pass could insert llvm.safepoint or llvm.patchpoint
>>  calls at custom safepoints after determining GC root liveness at
>>  those points.
>> 
>> - Something like a GCStrategy could intercept our implementation of
>>  stack map generation and emit a custom format. Keep in mind though
>>  that the format that LLVM emits does not need to be the format read
>>  by the collector. The JIT/runtime can parse LLVM's stack map data
>>  and encode it using it's own data structures. That way, the
>>  JIT/runtime can change without customizing LLVM.
>> 
>> As far as hooking the new stack map support into the GCMetaData
>> abstraction, I'm not sure how that would work. GCMachineCodeAnalysis
>> is currently a standalone MI pass. We can't generate our stack maps
>> here. Technically, a preEmitPass can come along later and reassign
>> registers invalidating the stack map. That's why we generate the maps
>> during MC lowering.
>> 
>> So, currently, the new intrinsics are serving a different purpose than
>> GCMetaData. I think someone working on GC support needs to be
>> convinced that they really need the new stack map features. Then we
>> can build something on top of the underlying functionality that works
>> for them.
>> 
>> -Andy
> 
> 
> 
> -- 
> -------------------------------------------------------------------
> Gaël Thomas, Associate Professor, UPMC
> http://pagesperso-systeme.lip6.fr/Gael.Thomas/
> -------------------------------------------------------------------

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131023/b132dd68/attachment.html>


More information about the llvm-dev mailing list