[llvm-dev] RFC: Add "operand bundles" to calls and invokes

Tue Aug 18 10:20:03 PDT 2015

On 08/12/2015 02:58 PM, Sanjoy Das wrote:
>>>    2. Calls and invokes with operand bundles have unknown read / write
>>>       effect on the heap on entry and exit (even if the call target is
>>>       `readnone` or `readonly`).  For instance:
>> I don't think we actually need this.  I think it would be perfectly fine to
>> require the frontend ensure that the called function is not readonly if it
>> being readonly would be problematic for the call site.  I'm not really
>> opposed to this generalization - I could see it being useful - but I'm
>> worried about the amount of work involved.  A *lot* of the optimizer assumes
>> that attributes on a call site strictly less conservative than the
>> underlying function. Changing that could have a long bug tail.  I'd rather
>> defer that work until someone defines an operand bundle type which requires
>> it.  The motivating example (deoptimization) doesn't seem to require this.
> If we're doing late poll placement and if certain functions are
> "frameless" in the abstract machine, then we will need this for
> deoptimization.
>
> The case I'm thinking of is:
>
>    define void @foo() {
>     ;; Can be just about any kind of uncounted loop that is readnone
>     entry:
>      br label %inf_loop
>
>     inf_loop:
>      br label %inf_loop
>    }
>
>    define void @caller() {
>     entry:
>      store i32 42, i32* @global
>      call void @foo() "deopt"(i32 100)
>      store i32 46, i32* @global
>      ret void
>    }
>
> Right now `@foo` is `readnone`, so the first store of `i32 42` can be
> DSE'ed.  However, if we insert a poll inside `@foo` later, that will
> have to be given a JVM state, which we cannot do anymore since a store
> that would have been done by the abstract machine has been elided.
As we discussed offline, this example is invalid.  Specifically, while 
late insertion of safepoints works just fine for garbage collection, 
there is no way to rematerialize the abstract state for a virtual 
frame.  As a result, late insertion of deoptimization points is an 
unsolved problem.  Instead, this example would have had to have had at 
least one deopt point (which reads the entire heap) in @foo.  As a 
result, @foo must be at least readonly and the DSE can not happen.
>
> [ moved here, because this is related ]
>>> `"deopt"` operand bundles will not have to be as pessimistic about
>>> heap effects as the general "unknown operand bundle" case -- they only
>>> imply a read from the entire heap on function entry or function exit,
>>> depending on what kind of deoptimization state we're interested in.
>>> They also don't imply escaping semantics.
>> An alternate framing here which would remove the attribute case I was
>> worried about about would be to separate the memory and abstract state
>> semantics of deoptimization.  If the deopt bundle only described the
>> abstract state and it was up to the frontend to ensure the callee was at
>> least readonly, we wouldn't need to model memory in the deopt bundle.  I
>> think that's a much better starting place.
> Semantically, I think we need the state of the heap to be consistent
> at method call boundaries, not within a method boundary.  For
> instance, consider this:
>
>    ;; @global is 0 to start with
>
>    define void @f() readonly {
>      ;; do whatever
>      call read_only_safepoint_poll() readonly "deopt"( ... deopt state
> local to @f ...)
>    }
>
>    define void @g() {
>      call void @f() "deopt"( ... deopt state local to @g ...)
>      if (*@global == 42) { side_effect(); }
>      store i32 42, i32* @global
>    }
>
> If we do not have the reads-everything-on-exit property, then this is
> a valid transform:
>
>    define void @f() readonly {
>      ;; do whatever
>      call read_only_safepoint_poll() readonly "deopt"( ... deopt state
> local to @f ...)
>      if (*@global == 42) { side_effect(); }
>      store i32 42, i32* @global
>    }
>
>    define void @g() {
>      call void @f() "deopt"( ... deopt state local to @g ...)
>    }
>
> If we *don't* inline `@f` into `@g`, and `@f` wants to deoptimize `@g`
> (and only `@g`) after halting the thread at
> `read_only_safepoint_poll`, we're in trouble.  `@f` will execute the
> store to `@global` before returning, and the deoptimized `@g` will
> call `side_effect` when it shouldn't have.  (Note: I put the `if
> (*@global == 42)` to make the problem more obvious, but in practice I
> think doing the same store twice is also problematic).  Another way to
> state this is that even though the state of the heap was consistent at
> the call to `read_only_safepoint_poll`, it will not be consistent when
> `@f` returns.  Therefore we cannot use a "deopt `@g` on return with
> vmstate xyz" scheme, unless we model the operand bundle as reading the
> entire heap on return of `@f` (this would force the state of the heap
> to be consistent at the point where we actually use the vmstate).
Sanjoy had to explain this example to me offline, so let me try to 
summarize for other readers.  The key issue here is that deoptimization 
is a very restricted form of function interposition. Specifically, if we 
have physical frames for both @f and @g created by compiled versions of 
each function, we can replace the frame @g with a new frame @g_int which 
resumes execution in the interpreter at the specified abstract VM 
state.  Essentially, it's not safe to assume that the version of the 
code seen by the optimizer is the version which will execute after 
return.  This means that it is not safe to perform an interprocedural 
optimization which moves a side effect past the return from @f without 
adjusting the deoptimization state to reflect that that movement has 
happened.  Today, we have no mechanism to do that adjustment, so 
instead, we must disallow the movement.

It's worth pointing out a couple of things here:
1) This is only problematic for a class of optimizations LLVM does not 
currently implemented.
2) The code motion involved would only be legal if the optimizer saw 
*all* callers of @f.  For a generic external function, this could never 
be true.

>
> There is an analogous case where we have to model the deopt operand
> bundle as reads-everything-on-entry: if we have cases where we
> deoptimize on entry.  IOW, something like this:
>
>    ; @global starts off as 0
>
>    define void @side_exit() readonly {
>      call void @deoptimize_my_caller()
>      return
>    }
>
>    define void @store_field(ref) {
>     (*@global)++;
>   lbl:
>     if (ref == nullptr) {
>       call void @side_exit() ;; vm_state = at label lbl
>       unreachable
>     } else {
>       ref->field = 42;
>     }
>    }
>
> could be transformed to
>
>    define void @side_exit() readonly {
>      (*@global)++;
>      call void @deoptimize_my_caller()
>      return
>    }
>
>    define void @store_field(ref) {
>   lbl:
>     if (ref == nullptr) {
>       call void @side_exit() ;; vm_state = at label lbl
>       unreachable
>     } else {
>       (*@global)++;
>       ref->field = 42;
>     }
>    }
>
> Now if `ref` is null and we do not inline `@side_exit` then we will
> end up incrementing `@global` twice.
(This is just the inverse example to the above.  Same issues apply.)
>
> In practice I think we can work around these issues by marking
> `@side_exit` and `@f` as external, so that inter-procedural code
> motion does not happen but
>
>   a. That would be a workaround, the semantic issues will still exist
>   b. LLVM is still free to specialize external functions.
Your last point is a good one.  I hadn't considered that in my response 
above.

My suggestion would be that we frame this in one of three ways:
1) We separate the function interposition requirements (the readonly on 
call and return mentioned above) as a separate function attribute.  We 
do not necessarily need to tie that to the bundle description.
2) We could define the bundle to imply an unknown caller of the declared 
callee.  This would get around the problem above by essentially 
preventing the callee from ever being non-external.
3) We could define the bundle to imply readonly/readwrite apply at 
beginning and end of the call.  This is a much more restricted 
interpretation of the memory effects and doesn't seem to require any 
changes to the optimizer today.  If we add an IPO pass like the one 
above, we might need some changes, but even there, they shouldn't be huge.

I think I'm leaning towards (1) with the function attribute having the 
semantics described in (3) but separated into it's own attribute.  Thoughts?

(By function attribute, I might mean "linkage".  This seems to be 
interestingly similar to available_externally and some of the odd 
properties wanted by the LTO folks.  Just thinking aloud.)

In all cases, we should require that a call with deopt bundle arguments 
be to a function with at least readonly.  As we discussed above, nothing 
else really makes sense.
>
> As a meta point, I think the right way to view operand bundles is as
> something that *happens* before and after an call / invoke, not as a
> set of values being passed around.  For that reason, do you think they
> should be renamed to be something else?
Not really.  In fact, I'm really concerned about the "happens" 
interpretation.  I think that is likely to snowball with additional 
semantics and be hard to reason about.  I'd much rather have the bundles 
be fairly restricted and introduce additional attributes for any weird 
requirements a runtime might have.

Philip