[LLVMdev] GC StackMaps (was Stackmap and Patchpoint Intrinsic Proposal)

Wed Oct 23 08:38:14 PDT 2013

> On Oct 23, 2013, at 1:17 AM, Gaël Thomas <gael.thomas at lip6.fr> wrote:
> 
> Hi all,
> 
> I don't know if I understand everything, but it seems really
> interesting for a runtime developer, stackmap and patchpoint looks
> perfect for a lot of optimizations :) I just have few question to
> verify if I understand what are these stackmaps and patchpoints, and I
> discuss the GC after.
> 
> * I have a first very simple scenario (useful in vmkit). Let's imagine
> that we want to lazily build the layout of an object at runtime, i.e.,
> we don't know the layout of the object when we are emitting the code.
> And, we want to access to a field of this object identified by a
> symbol. If I understand correctly, we can use your stackmap to define
> the offset of this field and then patch the code that use this offset?
> The machine code will like  mov %(rax)offset, .., and the stackmap
> will generate a map that contains the location of "offset" in the
> code? If it's the case, it's perfect.

This is one of the use cases of patchpoint. Stackmap doesn't quite work because in IR it doesn't return anything - though you could probably use stackmap for putfield. But patchpoint is more convenient for this, I think. 

You'll probably want a wider range of return types of patchpoint. Currently it's just i64. 

> 
> * Now, let's imagine that I want to lazily call a virtual method (aka,
> a single dispatch call in Java). I have two problems. First, I have to
> know the offset of the method in the virtual table (just like for
> virtual calls in c++). Second, the entry in the table should contain a
> stub able to link/compile the target method at runtime. And the stub
> has to know which object has received the call (which drives the
> method resolution and the update of the virtual table). With stackmaps
> and patchpoints, I can imagine something like that (in pseudo-llvm
> without typing)
> 
> %r0 = load %obj, 0 ; the virtual table is at offset 0
> %r1 = 0
> stackmap %r1, ID_OFFSET ; contains the offset of the target method in
> the virtual table
> %r2 = add %r1, %r0
> %r3 = load %r2
> patchpoint ID_CALL %r3, %obj, other parameters ; to find %obj in the stub
> 
> I should be able to:
> - patch ID_OFFSET when I load the description of obj (before the call,
> when the object is allocated)
> - use ID_CALL to know which object is the target of the call in order
> to find the appropriate method. If it could be the case, your
> patchpoint and safepoint are very interesting for vmkit. We just need
> a function to retreive, in the caller, from which patchpoint we are
> coming from (probably by inspecting the stack, we can find the program
> pointer of the call site and then find the patchpoint descriptor?)

I think that you'd instead want the whole call and the vtable resolution to be machine code that you generate inside the patchpoint. 

> 
> * Now, for the GC, if I understand correctly, instead of declaring a
> variable as a root, you can declare explicitly the safepoints by using
> patchpoints with something like
> 
> patchpoint ID_safepoint_17, suspendTheThreadForCollection, list of the
> alloca (or registers) that contains objects
> 
> Then in the suspendTheThreadForCollection, we can see that we are
> coming for the safepoint_17 and then find the locations of the
> objects? If a patchpoint can work like this, it's probably a good
> building block for the gc.
> 
> Currently, we have to declare the root objects with the root
> intrinsic, then add the appropriate safepoints (it's just a call to
> GCFunctionInfo.addSafePoint). As root objects are marked as root,
> modifying GCFunctionInfo.addSafepoint to generate a patchpoint with
> all the gc roots as argument (instead of using the current
> infrastructure) should not be difficult. And it probably means that
> the current gc infrastructure could use patchpoint as a backend. The
> only problem that I see is that all the objects will be transmitted as
> arguments to suspendTheThreadForCollection, it's maybe not the best
> way to do that. Probably, something like:
> safepoint ID_safepoint_17, list of alloca that contains objects
> patchpoint ID_safepoint_17, suspendTheThreadForCollection
> should be better to avoid useless arguments?
> 
> See you,
> Gaël
> 
> PS: just, tell me if the code is already in the trunk, because I would
> like to see if these intrinsics can work for vmkit :)
> 
> 
> 2013/10/23 Andrew Trick <atrick at apple.com>:
>> I'm moving this to a different thread. I think the newly proposed
>> intrinsic definitions and their current implementation are valuable
>> regardless of how it gets tied into GC...
>> 
>> On Oct 22, 2013, at 6:24 PM, Philip R <listmail at philipreames.com> wrote:
>> 
>> Adding Gael as someone who has previously discussed vmkit topics on the
>> list.  Since I'm assuming this is where the GC support came from, I wanted
>> to draw this conversation to the attention of someone more familiar with the
>> LLVM implementation than myself.
>> 
>> On 10/22/13 4:18 PM, Andrew Trick wrote:
>> 
>> On Oct 22, 2013, at 3:08 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>> 
>> On Oct 22, 2013, at 1:48 PM, Philip R <listmail at philipreames.com> wrote:
>> 
>> On 10/22/13 10:34 AM, Filip Pizlo wrote:
>> 
>> On Oct 22, 2013, at 9:53 AM, Philip R <listmail at philipreames.com> wrote:
>> 
>> On 10/17/13 10:39 PM, Andrew Trick wrote:
>> 
>> This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
>> first client of these features is the JavaScript compiler within the
>> open source WebKit project.
>> 
>> I have a couple of comments on your proposal.  None of these are major
>> enough to prevent submission.
>> 
>> - As others have said, I'd prefer an experimental namespace rather than a
>> webkit namespace.  (minor)
>> - Unless I am misreading your proposal, your proposed StackMap intrinsic
>> duplicates existing functionality already in llvm.  In particular, much of
>> the StackMap construction seems similar to the Safepoint mechanism used by
>> the in-tree GC support. (See CodeGen/GCStrategy.cpp and
>> CodeGen/GCMetadata.cpp).  Have you examined these mechanisms to see if you
>> can share implementations?
>> - To my knowledge, there is nothing that prevents an LLVM optimization pass
>> from manufacturing new pointers which point inside an existing data
>> structure.  (e.g. an interior pointer to an array when blocking a loop)
>> Does your StackMap mechanism need to be able to inspect/modify these
>> manufactured temporaries?  If so, I don't see how you could generate an
>> intrinsic which would include this manufactured pointer in the live variable
>> list.  Is there something I'm missing here?
>> 
>> These stackmaps have nothing to do with GC.  Interior pointers are a problem
>> unique to precise copying collectors.
>> 
>> I would argue that while the use of the stack maps might be different, the
>> mechanism is fairly similar.
>> 
>> 
>> It's not at all similar.  These stackmaps are only useful for
>> deoptimization, since the only way to make use of the live state information
>> is to patch the stackmap with a jump to a deoptimization off-ramp.  You
>> won't use these for a GC.
>> 
>> In general, if the expected semantics are the same, a shared implementation
>> would be desirable.  This is more a suggestion for future refactoring than
>> anything else.
>> 
>> 
>> I think that these stackmaps and GC stackmaps are fairly different beasts.
>> While it's possible to unify the two, this isn't the intent here.  In
>> particular, you can use these stackmaps for deoptimization without having to
>> unwind the stack.
>> 
>> 
>> I think Philip R is asking a good question. To paraphrase: If we introduce a
>> generically named feature, shouldn’t it be generically useful? Stack maps
>> are used in other ways, and there are other kinds of patching. I agree and I
>> think these are intended to be generically useful features, but not
>> necessarily sufficient for every use.
>> 
>> Thank you for the restatement.  You summarized my view well.
>> 
>> 
>> The proposed stack maps are very different from LLVM’s gcroot because gcroot
>> does not provide stack maps! llvm.gcroot effectively designates a stack
>> location for each root for the duration of the current function, and forces
>> the root to be spilled to the stack at all call sites (the client needs to
>> disable StackColoring). This is really the opposite of a stack map and I’m
>> not aware of any functionality that can be shared. It also requires a C++
>> plugin to process the roots. llvm.stackmap generates data in a section that
>> MCJIT clients can parse.
>> 
>> Er, I think we're talking past each other again.  Let me lay out my current
>> understanding of the terminology and existing infrastructure in LLVM.
>> Please correct me where I go wrong.
>> 
>> stack map - A mapping from "values" to storage locations.  Storage locations
>> primarily take the form of register, or stack offsets, but could in
>> principal refer to other well known locations (i.e. offsets into thread
>> local state).  A stack map is specific to a particular PC and describes the
>> state at that instruction only.
>> 
>> In a precise garbage collector, stack maps are used to ensure that the stack
>> can be understood by the collector.  When a stop-the-world safepoint is
>> reached, the collector needs to be able to identify any pointers to heap
>> objects which may exist on the stack.  This explicitly includes both the
>> frame which actually contains the safepoint and any caller frames back to
>> the root of thread.  To accomplish this, a stack map is generated at any
>> call site and a stack map is generated for the safepoint itself.
>> 
>> In LLVM currently, the GCStrategy records "safepoints" which are really
>> points at which stack maps need to be remembered.  (i.e. calls and actual
>> stop-the-world safepoints)  The GCMetadata mechanism gives a generic way to
>> emit the binary encoding of a stack map in a collector specific way.  The
>> current stack maps supported by this mechanism only allow abstract locations
>> on the stack which force all registers to be spilled around "safepoints"
>> (i.e. calls and stop-the-world safepoints).  Also, the set of roots (which
>> are recorded in the stack map) must be provided separately using the gcroot
>> intrinsic.
>> 
>> In code:
>> - GCPoint in llvm/include/llvm/CodeGen/GCMetadata.h describes a request for
>> a location with a stack map.  The SafePoints structure in GCFunctionInfo
>> contains a list of these locations.
>> - The Ocaml GC is probably the best example of usage.  See
>> llvm/lib/CodeGen/AsmPrinter/OcamlGCPrinter.cpp
>> 
>> Note: The summary of existing LLVM details above is based on reading the
>> code.  I haven't actually implemented anything which used this mechanism
>> yet.  As such, take it with a grain of salt.
>> 
>> 
>> That's an excellent description of stack maps, GCStrategy, and
>> safepoints. Now let me explain how I see it.
>> 
>> GCStrategy provides layers of abstraction that allow plugins to
>> specialize GC metadata. Conceptually, a plugin can generate what looks
>> like stack map data to the collector. But there isn't any direct
>> support in LLVM IR for the kind of stack maps that we need.
>> 
>> When I talk about adding stack map support, I'm really talking about
>> support for mapping values to registers, where the set of values and
>> their locations are specific to the "safepoint".
>> 
>> We're adding an underlying implementation of per-safepoint live
>> values. There isn't a lot of abstraction built up around it. Just a
>> couple of intrinsics that directly expose the functionality.
>> 
>> We're also approaching the interface very differently. We're enabling
>> an MCJIT client. The interface to the client is the stack map format.
>> 
>> 
>> In your change, you are adding a mechanism which is intended to enable
>> runtime calls and inline cache patching.  (Right?)  Your stack maps seem to
>> match the definition of a stack map I gave above and (I believe) the
>> implementation currently in LLVM.  The only difference might be that your
>> stack maps are partial (i.e. might not contain all "values" which are live
>> at a particular PC) and your implementation includes Register locations
>> which the current implementation in LLVM does not.  One other possible
>> difference, are you intending to include "values" which aren't of pointer
>> type?
>> 
>> 
>> Yes, the values will be of various types (although only 32/64 bit
>> types are currently allowed because of DWARF register number
>> weirdness). More importantly, our stack maps record locations of a
>> specific set of values, which may be in registers, at a specific
>> location. In fact, that, along with reserving space for code patching,
>> is *all* we're doing. GCRoot doesn't do this at all. So there is
>> effectively no overlap in implementation.
>> 
>> 
>> Before moving on, am I interpreting your proposal and changes correctly?
>> 
>> 
>> Yes, except I don’t see a direct connection between the functionality we’re
>> adding and “the implementation currently in LLVM”.
>> 
>> Assuming I'm still correct so far, how might we combine these
>> implementations?  It looks like your implementation is much more mature than
>> what exists in tree at the moment.  One possibility would be to express the
>> needed GC stack maps in terms of your new infrastructure.  (i.e. convert a
>> GCStrategy request for a safepoint into a StackMap (as you've implemented
>> it) with the list of explicit GC roots as it's arguments).  What would you
>> think of this?
>> 
>> 
>> I can imagine someone wanting to leverage some of the new
>> implementation without using it end-to-end as-is. Although I'm not
>> entirely sure what the motivation would be. For example:
>> 
>> - A CodeGenPrepare pass could insert llvm.safepoint or llvm.patchpoint
>>  calls at custom safepoints after determining GC root liveness at
>>  those points.
>> 
>> - Something like a GCStrategy could intercept our implementation of
>>  stack map generation and emit a custom format. Keep in mind though
>>  that the format that LLVM emits does not need to be the format read
>>  by the collector. The JIT/runtime can parse LLVM's stack map data
>>  and encode it using it's own data structures. That way, the
>>  JIT/runtime can change without customizing LLVM.
>> 
>> As far as hooking the new stack map support into the GCMetaData
>> abstraction, I'm not sure how that would work. GCMachineCodeAnalysis
>> is currently a standalone MI pass. We can't generate our stack maps
>> here. Technically, a preEmitPass can come along later and reassign
>> registers invalidating the stack map. That's why we generate the maps
>> during MC lowering.
>> 
>> So, currently, the new intrinsics are serving a different purpose than
>> GCMetaData. I think someone working on GC support needs to be
>> convinced that they really need the new stack map features. Then we
>> can build something on top of the underlying functionality that works
>> for them.
>> 
>> -Andy
> 
> 
> 
> -- 
> -------------------------------------------------------------------
> Gaël Thomas, Associate Professor, UPMC
> http://pagesperso-systeme.lip6.fr/Gael.Thomas/
> -------------------------------------------------------------------