<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 08/19/2015 01:52 AM, Hal Finkel
wrote:<br>
</div>
<blockquote
cite="mid:10242275.194.1439974361155.JavaMail.javamailuser@localhost"
type="cite">
<style type="text/css">p { margin: 0; }</style>
<div style="font-family: arial,helvetica,sans-serif; font-size:
10pt; color: #000000"><br>
<hr id="zwchr">
<blockquote style="border-left: 2px solid rgb(16, 16, 255);
margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0);
font-weight: normal; font-style: normal; text-decoration:
none; font-family: Helvetica,Arial,sans-serif; font-size:
12pt;"><b>From: </b>"David Majnemer"
<a class="moz-txt-link-rfc2396E" href="mailto:david.majnemer@gmail.com"><david.majnemer@gmail.com></a><br>
<b>To: </b>"Sanjoy Das"
<a class="moz-txt-link-rfc2396E" href="mailto:sanjoy@playingwithpointers.com"><sanjoy@playingwithpointers.com></a><br>
<b>Cc: </b>"llvm-dev" <a class="moz-txt-link-rfc2396E" href="mailto:llvm-dev@lists.llvm.org"><llvm-dev@lists.llvm.org></a>,
"Philip Reames" <a class="moz-txt-link-rfc2396E" href="mailto:listmail@philipreames.com"><listmail@philipreames.com></a>, "Chandler
Carruth" <a class="moz-txt-link-rfc2396E" href="mailto:chandlerc@gmail.com"><chandlerc@gmail.com></a>, "Nick Lewycky"
<a class="moz-txt-link-rfc2396E" href="mailto:nlewycky@google.com"><nlewycky@google.com></a>, "Hal Finkel"
<a class="moz-txt-link-rfc2396E" href="mailto:hfinkel@anl.gov"><hfinkel@anl.gov></a>, "Chen Li" <a class="moz-txt-link-rfc2396E" href="mailto:meloli87@gmail.com"><meloli87@gmail.com></a>,
"Russell Hadley" <a class="moz-txt-link-rfc2396E" href="mailto:rhadley@microsoft.com"><rhadley@microsoft.com></a>, "Kevin
Modzelewski" <a class="moz-txt-link-rfc2396E" href="mailto:kmod@dropbox.com"><kmod@dropbox.com></a>, "Swaroop Sridhar"
<a class="moz-txt-link-rfc2396E" href="mailto:Swaroop.Sridhar@microsoft.com"><Swaroop.Sridhar@microsoft.com></a>, <a class="moz-txt-link-abbreviated" href="mailto:rudi@dropbox.com">rudi@dropbox.com</a>, "Pat
Gavlin" <a class="moz-txt-link-rfc2396E" href="mailto:pagavlin@microsoft.com"><pagavlin@microsoft.com></a>, "Joseph Tremoulet"
<a class="moz-txt-link-rfc2396E" href="mailto:jotrem@microsoft.com"><jotrem@microsoft.com></a>, "Reid Kleckner"
<a class="moz-txt-link-rfc2396E" href="mailto:rnk@google.com"><rnk@google.com></a><br>
<b>Sent: </b>Monday, August 10, 2015 11:38:32 PM<br>
<b>Subject: </b>Re: RFC: Add "operand bundles" to calls and
invokes<br>
<br>
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Sun, Aug 9, 2015 at 11:32 PM,
Sanjoy Das <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:sanjoy@playingwithpointers.com"
target="_blank">sanjoy@playingwithpointers.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt
0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">We'd like to propose a scheme to
attach "operand bundles" to call and<br>
invoke instructions. This is based on the offline
discussion<br>
mentioned in<br>
<a moz-do-not-send="true"
href="http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088748.html"
rel="noreferrer" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088748.html</a>.<br>
<br>
# Motivation & Definition<br>
<br>
Our motivation behind this is to track the state
required for<br>
deoptimization (described briefly later) through the
LLVM pipeline as<br>
a first-class IR citizen. We want to do this in a way
that is<br>
generally useful.<br>
<br>
An "operand bundle" is a set of SSA values (called
"bundle operands")<br>
tagged with a string (called the "bundle tag"). One
or more of such<br>
bundles may be attached to a call or an invoke. The
intended use of<br>
these values is to support "frame introspection"-like
functionality<br>
for managed languages.<br>
<br>
<br>
# Abstract Syntax<br>
<br>
The syntax of a call instruction will be changed to
look like this:<br>
<br>
&lt;result&gt; = [tail | musttail] call [cconv] [ret
attrs] &lt;ty&gt; [&lt;fnty&gt;*]<br>
&lt;fnptrval&gt;(&lt;function args&gt;)
[operand_bundle*] [fn attrs]<br>
<br>
where operand_bundle = tag '('[ value ] (',' value )*
')'<br>
value = normal SSA values<br>
tag = "< some name >"<br>
<br>
In other words, after the function arguments we now
have an optional<br>
list of operand bundles of the form `"< bundle tag
>"(bundle<br>
attributes, values...)`. There can be more than one
operand bundle in<br>
a call. Two operand bundles in the same call
instruction cannot have<br>
the same tag.<br>
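<br>
A minimal Python sketch of the abstract syntax above, for illustration only. The `OperandBundle` and `Call` classes are hypothetical names (they are not LLVM classes); the sketch only enforces the unique-tag rule and prints a call in the proposed textual form.<br>
<pre>
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OperandBundle:
    """One bundle: a string tag plus (type, value) operand pairs."""
    tag: str
    operands: tuple = ()  # e.g. (("i32", "42"),)

    def render(self):
        inner = ", ".join(f"{ty} {val}" for ty, val in self.operands)
        return f'"{self.tag}"({inner})'


@dataclass
class Call:
    """Hypothetical model of a call site with operand bundles."""
    ret_ty: str
    callee: str
    args: tuple = ()
    bundles: list = field(default_factory=list)

    def add_bundle(self, bundle):
        # The RFC forbids two bundles with the same tag on one call.
        if any(b.tag == bundle.tag for b in self.bundles):
            raise ValueError(f"duplicate operand bundle tag: {bundle.tag!r}")
        self.bundles.append(bundle)

    def render(self):
        # Bundles are printed after the function arguments, in order.
        args = ", ".join(f"{ty} {val}" for ty, val in self.args)
        text = f"call {self.ret_ty} {self.callee}({args})"
        for b in self.bundles:
            text += " " + b.render()
        return text


call = Call("i32", "@f", (("i32", "10"),))
call.add_bundle(OperandBundle("some-bundle", (("i32", "42"),)))
call.add_bundle(OperandBundle("debug", (("i32", "100"),)))
print(call.render())
# prints: call i32 @f(i32 10) "some-bundle"(i32 42) "debug"(i32 100)
```
</pre>
Adding a second bundle whose tag is already present raises `ValueError`, which is how this sketch models the "no two bundles with the same tag" restriction.<br>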
<br>
We'd do something similar for invokes. I'll omit the
invoke syntax<br>
from this RFC to keep things brief.<br>
<br>
An example:<br>
<br>
define i32 @f(i32 %x) {<br>
entry:<br>
%t = add i32 %x, 1<br>
ret i32 %t<br>
}<br>
<br>
define void @g(i16 %val, i8* %ptr) {<br>
entry:<br>
call i32 @f(i32 10) "some-bundle"(i32 42)
"debug"(i32 100)<br>
call i32 @f(i32 20) "some-bundle"(i16 %val, i8*
%ptr)<br>
ret void<br>
}<br>
<br>
Note 1: Operand bundles are *not* part of a function's
signature, and<br>
a given function may be called from multiple places
with different<br>
kinds of operand bundles. This reflects the fact that
the operand<br>
bundles are conceptually a part of the *call*, not the
callee being<br>
dispatched to.<br>
<br>
Note 2: There may be tag-specific requirements not
mentioned here.<br>
E.g. we may add a rule in the future that says operand
bundles with<br>
the tag `"integer-id"` must contain exactly one
constant integer.<br>
<br>
<br>
# IR Semantics<br>
<br>
Bundle operands (SSA values that are part of some operand
bundle) are normal<br>
SSA values. They need to dominate the call or invoke
instruction<br>
they're being passed into and can be optimized as
usual. For<br>
instance, LLVM is allowed (and strongly encouraged!)
to PRE / LICM a<br>
load feeding into an operand bundle if legal.<br>
<br>
Operand bundles are characterized by the `"< bundle
tag >"` string<br>
associated with them.<br>
<br>
The overall strategy is:<br>
<br>
1. The semantics are as conservative as is reasonable
for operand<br>
bundles with tags that LLVM does not have a
special understanding<br>
of. This way LLVM does not miscompile code by
default.<br>
<br>
2. LLVM understands the semantics of operand bundles
with certain<br>
specific tags more precisely, and can optimize
them better.<br>
<br>
This RFC talks mainly about (1). We will discuss (2)
as we add smarts<br>
to LLVM about specific kinds of operand bundles.<br>
<br>
The IR-level semantics of an operand bundle with an
arbitrary tag are:<br>
<br>
1. The bundle operands passed in to a call escape in
unknown ways<br>
before transferring control to the callee. For
instance:<br>
<br>
declare void @opaque_runtime_fn()<br>
<br>
define void @f(i32* %v) { }<br>
<br>
define i32 @g() {<br>
%t = call i32* @malloc(...)<br>
;; "unknown" is a tag LLVM does not have any
special knowledge of<br>
call void @f(i32* %t) "unknown"(i32* %t)<br>
<br>
store i32 42, i32* %t<br>
call void @opaque_runtime_fn()<br>
%r = load i32, i32* %t<br>
ret i32 %r<br>
}<br>
<br>
Normally (without the `"unknown"` bundle) it would
be okay to<br>
optimize `@g` to return `42`. But the `"unknown"`
operand bundle<br>
escapes `%t`, and the call to `@opaque_runtime_fn`
can therefore<br>
modify the location pointed to by `%t`.<br>
<br>
2. Calls and invokes with operand bundles have
unknown read / write<br>
effect on the heap on entry and exit (even if the
call target is<br>
`readnone` or `readonly`). For instance:<br>
<br>
define void @f(i32* %v) { }<br>
<br>
define i32 @g() {<br>
%t = call i32* @malloc(...)<br>
%t.unescaped = call i32* @malloc(...)<br>
;; "unknown" is a tag LLVM does not have any
special knowledge of<br>
call void @f(i32* %t) "unknown"(i32* %t)<br>
%r = load i32, i32* %t<br>
ret i32 %r<br>
}<br>
<br>
Normally it would be okay to optimize `@g` to
return `undef`, but<br>
the `"unknown"` bundle potentially clobbers `%t`.
Note that it<br>
clobbers `%t` only because it was *also escaped*
by the<br>
`"unknown"` operand bundle -- it does not clobber
`%t.unescaped`<br>
because it isn't reachable from the heap yet.<br>
<br>
However, it is okay to optimize<br>
<br>
define void @f(i32* %v) {<br>
store i32 10, i32* %v<br>
print(load i32, i32* %v)<br>
}<br>
<br>
define void @g() {<br>
%t = ...<br>
;; "unknown" is a tag LLVM does not have any
special knowledge of<br>
call void @f(i32* %t) "unknown"()<br>
}<br>
<br>
to<br>
<br>
define void @f(i32* %v) {<br>
store i32 10, i32* %v<br>
print(10)<br>
}<br>
<br>
define void @g() {<br>
%t = ...<br>
call void @f(i32* %t) "unknown"()<br>
}<br>
<br>
The arbitrary heap clobbering only happens on the
boundaries of<br>
the call operation, and therefore we can still do
store-load<br>
forwarding *within* `@f`.<br>
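<br>
The two rules above can be sketched as a small Python model, under simplifying assumptions. All names here (`KNOWN_TAGS`, `escaped_after_call`, `bundles_may_clobber`) are hypothetical and do not correspond to any LLVM API. The model looks only at the effects implied by the bundles themselves and ignores whatever the callee body does.<br>
<pre>
```python
# Tags with refined, tag-specific semantics. Left empty here, so every
# tag is treated as unknown and gets the conservative rules.
KNOWN_TAGS = set()


def escaped_after_call(escaped, bundles):
    """Pointers that unknown code may reach once the call has started.

    `escaped` is the set of pointer names already escaped before the call.
    `bundles` maps a bundle tag to the list of pointer names it carries.
    Rule 1: operands of a bundle with an unknown tag escape.
    """
    result = set(escaped)
    for tag, operands in bundles.items():
        if tag not in KNOWN_TAGS:
            result.update(operands)
    return result


def bundles_may_clobber(ptr, escaped, bundles):
    """Do the bundles alone force us to assume `ptr` is overwritten?"""
    if all(tag in KNOWN_TAGS for tag in bundles):
        # No unknown bundle (or no bundle at all): no implied heap effects.
        return False
    # Rule 2: unknown read / write effects on entry and exit, but only
    # memory that has escaped is reachable by those effects.
    return ptr in escaped_after_call(escaped, bundles)


# Mirrors the second @g example: %t is a bundle operand, %t.unescaped is not.
bundles = {"unknown": ["%t"]}
print(bundles_may_clobber("%t", set(), bundles))            # True
print(bundles_may_clobber("%t.unescaped", set(), bundles))  # False
```
</pre>
A real implementation would also have to account for the callee's own effects and for per-tag rules; this sketch only captures why `%t` is clobbered while `%t.unescaped` is not.<br>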
<br>
Since we haven't specified any "pure" LLVM way of
accessing the<br>
contents of operand bundles, the client is required to
model such<br>
accesses as calls to opaque functions (or inline
assembly). This<br>
ensures that things like IPSCCP work as intended.
E.g. it is legal to<br>
optimize<br>
<br>
define i32 @f(i32* %v) { ret i32 10 }<br>
<br>
define void @g() {<br>
%t = call i32* @malloc(...)<br>
%v = call i32 @f(i32* %t) "unknown"(i32* %t)<br>
print(%v)<br>
}<br>
<br>
to<br>
<br>
define i32 @f(i32* %v) { ret i32 10 }<br>
<br>
define void @g() {<br>
%t = call i32* @malloc(...)<br>
%v = call i32 @f(i32* %t) "unknown"(i32* %t)<br>
print(10)<br>
}<br>
<br>
LLVM won't generally be able to inline through calls
and invokes with<br>
operand bundles -- the inliner does not know what to
replace the<br>
arbitrary heap accesses implied on function entry and
exit with.<br>
However, we intend to teach the inliner to inline
through calls /<br>
invokes with some specific kinds of operand bundles.<br>
<br>
<br>
# Lowering<br>
<br>
The lowering strategy will be special cased for each
bundle tag.<br>
There won't be any "generic" lowering strategy --
`llc` is expected to<br>
abort if it sees an operand bundle that it does not
understand.<br>
<br>
There is no requirement that the operand bundles
actually make it to<br>
the backend. Rewriting the operand bundles into
"vanilla" LLVM IR at<br>
some point in the pipeline (instead of teaching
codegen to lower them)<br>
is a perfectly reasonable lowering strategy.<br>
<br>
<br>
# Example use cases<br>
<br>
A couple of usage scenarios are very briefly described
below:<br>
<br>
## Deoptimization<br>
<br>
This is our motivating use case. Some managed
environments expect to<br>
be able to discover the state of the abstract virtual
machine at specific call<br>
sites. LLVM will be able to support this requirement
by attaching a<br>
`"deopt"` operand bundle containing the state of the
abstract virtual<br>
machine (as a vector of SSA values) at the appropriate
call sites.<br>
There is a straightforward way<br>
to extend the inliner to work with `"deopt"` operand
bundles.<br>
<br>
`"deopt"` operand bundles will not have to be as
pessimistic about<br>
heap effects as the general "unknown operand bundle"
case -- they only<br>
imply a read from the entire heap on function entry or
function exit,<br>
depending on what kind of deoptimization state we're
interested in.<br>
They also don't imply escaping semantics.<br>
<br>
<br>
## Value injection<br>
<br>
By passing in one or more `alloca`s to an
`"injectable-value"` tagged<br>
operand bundle, languages can allow the runtime to
overwrite the<br>
values of specific variables, while still preserving a
significant<br>
amount of optimization potential.<br>
<br>
<br>
<br>
Thoughts?<br>
</blockquote>
<div><br>
</div>
<div id="DWT3847">This seems like a pretty useful, generic
call-site annotation mechanism. </div>
</div>
</div>
</div>
</blockquote>
<br>
Agreed. It seems like these would be useful for our existing
patchpoints too (to record the live values for the associated
stack map, instead of using extra intrinsic arguments for them).<br>
</div>
</blockquote>
That's specifically the intent. This mechanism will allow us to
work towards replacing (or at least greatly simplifying) both
patchpoints and statepoints.<br>
<blockquote
cite="mid:10242275.194.1439974361155.JavaMail.javamailuser@localhost"
type="cite">
<div style="font-family: arial,helvetica,sans-serif; font-size:
10pt; color: #000000"><br>
-Hal<br>
<blockquote style="border-left: 2px solid rgb(16, 16, 255);
margin-left: 5px; padding-left: 5px; color: rgb(0, 0, 0);
font-weight: normal; font-style: normal; text-decoration:
none; font-family: Helvetica,Arial,sans-serif; font-size:
12pt;">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> I believe that this has immediate application
outside of the context of GC.</div>
<div><br>
</div>
<div>Our exception handling personality routine has a
desire to know whether some code is inside a specific
try or catch. We can feed the value coming out of our
EH pad back into the call-site, making it very clear
which EH pad the call-site is associated with.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin: 0pt 0pt
0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">
<span class="HOEnZb"><font color="#888888">-- Sanjoy<br>
</font></span></blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
<br>
<br>
-- <br>
<div><span name="x"></span>Hal Finkel<br>
Assistant Computational Scientist<br>
Leadership Computing Facility<br>
Argonne National Laboratory<span name="x"></span><br>
</div>
</div>
</blockquote>
<br>
</body>
</html>