[PATCH] D20116: Add speculatable function attribute

Mon Mar 27 22:29:58 PDT 2017

dberlin added a comment.

In https://reviews.llvm.org/D20116#711865, @sanjoy wrote:

> In https://reviews.llvm.org/D20116#711778, @dberlin wrote:
>
> > In https://reviews.llvm.org/D20116#711696, @sanjoy wrote:
> >
> > > In https://reviews.llvm.org/D20116#710937, @dberlin wrote:
> > >
> > > > > I think that all of this is right, you can't apply some of these optimizations to call sites with the speculatable attribute.
> > > >
> > > > A lot of these (but not all of these) amount to "you cannot clone speculatable", ie if you clone the call, you must remove the attribute.
> > > >
> > > > I believe the set of conditions under which you could clone the attribute are:
> > > >
> > > > 1. The new call is CDEQ the original call (IE the set of conditions under which it executes is identical).  IF you are cloning from one function to another, it must be CDEQ using the interprocedural control dependence.
> > > > 2. The arguments are identical.
> > > > 3. The function called is identical or marked speculatable
> > > >
> > > >   Note this is not entirely shocking,.
> > >
> > >
> > > (Caveat: I think we're arguing some subtle points here, so apologies if I misunderstood your intent.)
> > >
> > > I think (1) //is// somewhat shocking.  I'd say normally we follow a weaker constraint:  "The new call is executed only if the old call was executed", not "The new call is executed if and only [edit: previously this incorrectly said "only if and only if"] if the old call was executed".
> > >
> > > For instance, if we unroll loops (say by a factor of 2):
> > >
> > >   for (i = 0; i < N; i++)
> > >     f(i) readonly;      // X
> > >
> > >
> > > it is reasonable that the result be:
> > >
> > >   for (i = 0; i < N / 2; i += 2) {
> > >     f(i) readonly;       // A
> > >     f(i + 1) readonly;   // B
> > >   }
> >
> >
> > I'm presuming you meant this loop  to go half as much but still cover the same values in the same order in calls to f?
> >  (it doesn't)
>
>
> Do'h.  I did a split-brain there. :)
>
> > If so, agree we allow it in practice, butt hat's in practice
> >  Here is an, IMHO,  valid callsite attribute readonly implementation of f, not legally doable in C, but we're talking about llvm IR anyway.
> > 
> >   void f(int a)
> >   {
> >     // go scrobbling through instruction stream to see what the increment is
> >     if (increment == 2)
> >       write some memory
> >     otherwise
> >       do nothing
> >   }
> > 
> > 
> > Now, obviously, i'm cheating, but the point is, f can read whatever state it wants here, and detect what you've done, and not be readonly in that case.
>
> I don't think we can reasonably allow functions to base their behavior on the instruction stream of their callers (or anywhere else, for that matter), in IR or in C.

As mentioned, i don't think that's strictly necessary to screw up what you did.  But i'm willing to admit those are likely semantic bugs.
:)
In any case, I was just half-pointing out that i don't think the semantics of the other attribute are so cut and dried in their control dependence, etc,  when applied to callsites, that speculatable on callsites is that bad.

> For instance, if we allowed things like that, even basic optimizations like this:
> 
>   void f() {
>     int a = 2, b = 3;
>     f(a + b);
>   }
> 
> 
> to
> 
>   void f() {
>     f(5);
>   }
> 
> 
> would not be valid, since an implementation of `f` could be:
> 
>   void f() {
>     cond = does the caller's instruction stream have an add instruction?
>     if (cond)
>       print("hi");
>   }
> 
> 
> and we'd change observable behavior by optimizing out the add instruction.

Right, my point was more on the side of "we are pretending we've got the semantics of these other instructions down really well and they are easy to understand", when, honestly, there is nothing in the langref that outlaws what i wrote :)

The closest it comes is "when called with the same set of arguments and global state"
But you aren't calling it with the same global state  depending on the definition of global state, which .. we don't define anywhere.

> The example you gave above seems odd to me because the callee is has different behavior based on the instruction stream of the caller, but the conditionally-readonly aspect of it is fine.

As mentioned I'm pretty positive i could construct an example just as bad given the  limitations expressed in the langref :)
I

>> Given they can also unwind, and read state of  other functions, i'm pretty positive you can come up with implementations easier than what i just did to detect and write memory in the two cases differently, without resorting to this trickery.
> 
> If there was a legal way to detect a difference between the two cases above, then loop unrolling would be illegal.

FWIW, i'm honestly too lazy to go construct one, but i'm basically positive you can.  At least, legal gievn the semantics and restrictions on LLVM IR as it exists today, becuase LLVM IR is not *that* well defined.
I would not believe it well-defined in C, but only because unwinders, etc are usually not well defined.
This is basically an exercise in the moral equivalent of godel numbering.
I'm going to go with "yes"
Is it possible to define our IR well enough to outlaw that: Maybe? 
I'm not going to tug at this thread any longer though :P

> So even if there is some way to detect the difference, I'd consider that a bug in the LLVM IR semantics since it disallows an important optimization.

I guess my basic point was exactly this - the semantics of our other attributes, compared to speculatable, are not so easy or non-shocking that i think artificially limiting speculatable to non-callsites only makes sense.  Particularly becuase those attributes may not be so well defined that their behavior makes complete and total sense when applied to callsites
Instead, when we discover they are wrong we fix them.  So why not just mark speculatable on callsites as experimental ,  see what happens, and go from there?

Bottom line: IMHO, We are unlikely to ever figure out the specific set of issues we will hit on callsites with speculatable if we don't allow it there.  Yeah, we'll figure out if it's completely broken if we don't, but we all seem to agree that it's probably not?

I'm going to drop the rest of this, fwiw, because i don't think it's worth pushing the readonly example any further.
I'm positive i can come up with a program that meets whatever requirements you throw at it and still breaks in your example, precisely because we don't define our IR well enough to prevent it (I'm on vacation, so i'm not going to think about whether i could prove that you can't make the IR well defined enough to prevent it and still have it express useful programs :P)

https://reviews.llvm.org/D20116