[LLVMdev] [RFC] Stackmap and Patchpoint Intrinsic Proposal

Thu Oct 17 22:39:36 PDT 2013

This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.

A Stackmap is a record of variable locations (registers and stack
offsets) at a particular instruction address.

A Patchpoint is an instruction address at which space is reserved for
patching a new instruction sequence at runtime.

These two features are close friends because it wouldn't be possible
for a runtime to patch LLVM generated code without a stack map telling
it where relevant variables live. However, the proposed stackmaps are
useful without patchpoints. In fact, the typical use-case for
stackmaps is implementing a simple trap to the runtime.

Stackmaps are required for runtime traps because without them the
optimized code would be dominated by instructions for marshaling
values into fixed locations. Even if most of the extra code can be
sunk into cold paths, experiments have shown that the impact on
compile time and code size is enormous.

Explicitly branching to runtime traps handles many situations. But
there are also cases in which the runtime needs to patch the runtime
call or the surrounding code. There are two kinds of patching we need
to support. The first case involves trampling LLVM-generated code to
safely invalidate it. This case needs to have zero impact on
optimization and codegen aside from keeping some values live. A second
case involves dynamically optimizing a short code sequence, for
example, to implement a dynamic inline cache. In this case, the
commonly executed code must be a call-free fast path. Some runtime
events may require rewriting the check guarding the fast path (e.g. to
change a type ID) or even rewriting the code the accesses a field to
change the offset. Completely invalidating the code at these events is
undesirable.

Two proposed intrinsics, llvm.stackmap and llvm.patchpoint, solve all
of the problems outlined above. The LangRef doc section is included at
the end of this proposal. The LLVM implementation of the intrinsics is
quite straightforward as you can see from the patches that I'll be
sending to llvm-commits.

Both intrinsics can generate a stack map. The difference is that a
llvm.stackmap is just a stack map. It doesn't generate any
code. llvm.patchpoint always generates a call. The runtime may
overwrite that call into a dynamically optimized inline cache.

llvm.stackmap is simple. It takes an integer ID for easy reference by
the runtime and a list of live values. It can optionally be given a
number of "shadow" bytes. The shadow bytes may be set to nonzero to
ensure that the runtime can safely patch the code following the
stackmap. This is useful for invalidating compiled code by trapping at
arbitrary points.

The LLVM backend emits stackmaps in a special data section. This
design works for JITs that are confined to the LLVM C API. Each
intrinsic results in a stackmap record with the ID and offset from
function entry. Each record contains an entry for each live value with
its location encoded as a register or stack offset.

llvm.patchpoint is the fusion of a normal call and an
llvm.stackmap. It additionally takes a call target and specifies a
number of call arguments. The call target is an opaque value to LLVM
so the runtime is not required to provide a symbol. The calling
convention can be specified via the normal "cconv" marker on the call
instruction. Instead of casting a "shadow" where code can be patched
it reserves a block of encoding space where the call-to-target will be
initially emitted followed by nop padding.

Everything about the design and implementation of these intrinsic is
as generic as we can conceive at the time. I expect the next client
who wants to optimize their managed runtime to be able to do most if
not all of what they want with the existing design. In the meantime,
the open source WebKit project has already added optional support for
llvm.stackmaps and llvm.patchpoint will be in shortly.

The initial documentation and patches name these intrinsics in a
"webkit" namespace. This clarifies their current purpose and conveys
that they haven't been standardized for other JITs yet. If someone on
the on the dev list says "yes we want to use these too, just the way
they are", then we can just drop the "webkit" name. More likely, we
will continue improving their functionality for WebKit until some
point in the future when another JIT customer tells us they would like
to use the intrinsics but really want to change the interface. At that
point, we can review this again with the goal of standardization and
backward compatibility, then promote the name. WebKit is maintained
against LLVM trunk so can be quickly adjusted to a new interface. The
same may not be true of other JITs.

These are the proposed changes to LangRef, written by Juergen and me.

WebKit Intrinsics
-----------------

This class of intrinsics is used by the WebKit JavaScript compiler to obtain
additional information about the live state of certain variables and/or to
enable the runtime system / JIT to patch the code afterwards.

The use of the following intrinsics always generates a stack map. The purpose
of a stack map is to record the location of function arguments and live
variables at the point of the intrinsic function in the instruction steam.
Furthermore it records a unique callsite id and the offset from the beginning
of the enclosing function.

'``llvm.webkit.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void (i32, i32, ...)* @llvm.webkit.stackmap(i32 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.webkit.stackmap``' intrinsic records the location of live variables in the stack map without generating any code.

Arguments:
""""""""""

The first argument is a unique id and the second argument is the number of
shadow bytes following the intrinsic. The variable number of arguments after
that are the live variables.

Semantics:
""""""""""

The stackmap intrinsic generates no code in place, but its offset from function
entry is stored in the stack map. Furthermore, it guarantees a shadow of
instructions following its instruction offset during which neither the end of
the function nor another stackmap or patchpoint intrinsic may occur. This allows the runtime to patch the code at this point in response to an event triggered from outside the code.

'``llvm.webkit.patchpoint.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void (i32, i32, i8*, i32, ...)* @llvm.webkit.patchpoint.void(i32 <id>, i32 <numBytes>, i8* <target>, i32 <numArgs>, ...)
      declare i64 (i32, i32, i8*, i32, ...)* @llvm.webkit.patchpoint.i64(i32 <id>, i32 <numBytes>, i8* <target>, i32 <numArgs>, ...)

Overview:
"""""""""

The '``llvm.webkit.patchpoint.*``' intrinsics creates a function call to the
specified <target> and records the location of the live variables in the stack
map.

Arguments:
""""""""""

The first argument is a unique id, the second is the number of bytes
reserved for encoding the intrinsic, the third is the target address
of a function, and the fourth specifies how many of the following
variable arguments participate in the function call. The remaining
variable number of arguments are the live variables.

Semantics:
""""""""""

The patchpoint intrinsic generates the stack map and emits a function call to the address specified by <target>. The function call and its arguments are lowered according to the calling convention specified at the callsite of the intrinsic. The location of the arguments are not normally recorded in the stack map. However, special stack-map specific calling conventions can force the argument locations to be recorded. The patchpoint also emits nops to cover at least <numBytes> of instruction encoding space. Hence, the client must ensure that <numBytes> is enough to encode a call to the target address on the supported targets. The runtime may patch the code emitted for the patchpoint, including the call instruction and nops.