[llvm-dev] Giving up using implicit control flow in guards

Chandler Carruth via llvm-dev llvm-dev at lists.llvm.org
Fri Oct 26 01:33:28 PDT 2018


I missed this when it went around, but I wanted to follow up to say that I
really like the semantic direction here.

I have some nit-picky concerns that I'm following up on in the code review,
but wanted to leave a record here that this seems like a really nice way to
progress modelling these things.

-Chandler

On Tue, Jul 17, 2018 at 8:10 PM Philip Reames via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Replying specifically to Sanjoy's questions inline.  For the record,
> I've talked with Max about his proposal in depth offline and generally
> support the direction.  As Sanjoy pointed out, there are some details to
> be worked out, but Max has gotten far enough with his prototyping at
> this point that I'm personally convinced they're all handleable w/effort.
>
>
> On 07/13/2018 11:21 AM, Sanjoy Das via llvm-dev wrote:
> > If we only need the guard representation for widening, why not
> > first run some cleanup passes, widen the guards, then lower the
> > guards, and then run the rest of the pipeline, which will see the
> > explicit branching?
> Key thing is that we want to be able to widen conditions across function
> boundaries.  Here's a trivial example:
> int load(int[] a, int offset) { return a[offset]; }
> int b(int[] a) { return load(a, 0) + load(a, 1); }
>
> This example should require (at most) one runtime range check.
>
> As such, we need the widenable representation to survive through inlining,
> which means through a very sizable chunk of the pipeline.
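> For illustration, after inlining and guard widening the two range checks
> can become a single guard, roughly (an IR sketch; %len stands for the
> a.length value):
>
>      %chk = icmp ult i32 1, %len   ; "1 u< %len" implies "0 u< %len" too
>      call void (i1, ...) @llvm.experimental.guard(i1 %chk) [ "deopt"() ]
>      ; ... both loads then proceed unchecked ...
>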
> > Also, in the past we had talked about piggybacking on
> > AssumptionCacheTracker to solve some of the problems (with the
> > implicit nature of guards) you highlight.  Can that be made to work?
> The conclusion of the "let's just use invokes" thread I started a while
> ago was essentially the opposite.  That is, we should be using invokes
> for assumes as well specifically because it eliminates the need for
> anything like an assumption cache.
>
> Or to say it differently, yes, we can improve (a bunch) the current
> implementation by commoning some code between guards and assumes, and
> using explicit vs implicit control flow (i.e. terminators), but we still
> have the general problem of needing to update every single pass.  You
> could argue that if we did a better job of sharing/structuring the code,
> this would at least mean we picked up support for assumes pretty much
> everywhere too, but I'm personally not convinced that's a large enough
> value to justify the investment.
>
> It's also a bit odd to have an invoke without a meaningful unwind target,
> which is what we end up with for both assumes and guards. You
> essentially end up having to special case unreachable to leave the
> invoke in place.  Doable, but a bit odd.
>
> I was originally of the opinion that improving/commoning/merging the
> current representation was the easiest path forward, but Max's
> prototyping work has impressed me with how few changes he's needed to
> make to get full coverage of the proposed representation throughout the
> optimizer.  I'm (weakly) convinced his proposal is the right path
> forward, but I think either approach could be made to work without an
> insurmountable amount of effort.
> >
> > On Fri, Jul 13, 2018 at 1:19 AM Maxim Kazantsev <max.kazantsev at azul.com>
> wrote:
> >> Hi Sanjoy,
> >>
> >> Thanks for the feedback! As for memory effects, currently I use
> "inaccessiblememonly" for it. It allows us to prove that it doesn't alias
> with any other load/store in the function, but doesn't allow CSE to
> eliminate it. It is not actually super-cool, because there is no way we
> can safely hoist it out of a loop (and sometimes we want to, for example
> to enable unswitching). So this part is one I am still thinking about.
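> >>
> >> Concretely, the declaration I'm experimenting with is roughly this (a
> sketch; the intrinsic name is a placeholder):
> >>
> >>      ; inaccessiblememonly: no aliasing with any real load/store in the
> >>      ; function, but the call is not readnone/readonly, so CSE cannot
> >>      ; eliminate it and it cannot be freely hoisted out of loops
> >>      declare i1 @widenable_condition() inaccessiblememonly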
> >>
> >> As an alternative to the widenable_condition() call, we are considering
> loads from special dedicated global fields. The bad thing about them is
> that if a loop gets peeled, then a widenable condition in the loop can be
> optimized away based on the fact that this field was true on the 1st
> iteration.
> > You'd ideally want every *execution* of a guard to possibly have a
> > different widening, but using global variables hides that.
> >
> > -- Sanjoy
> >
> >> Thanks,
> >> Max
> >>
> >> -----Original Message-----
> >> From: Sanjoy Das [mailto:sanjoy at playingwithpointers.com]
> >> Sent: Wednesday, July 11, 2018 8:35 PM
> >> To: Maxim Kazantsev <max.kazantsev at azul.com>
> >> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> >> Subject: Re: [llvm-dev] Giving up using implicit control flow in guards
> >>
> >> Hi Max,
> >>
> >> I think this works, but I'm a bit wary of the complexity introduced
> by widenable_condition.  For instance, we'd have to be careful in spec'ing
> out its memory effects (if it is readnone/readonly, it will get CSE'ed
> (which may not be what you
> >> want?) but if it can write memory it will inhibit optimizations).  I'm
> also worried that with this representation it will become difficult to
> discover which conditions are widenable (e.g. after optimization we may end
> up using a phi of a select of widenable_condition() in a branch condition,
> obscuring the fact that the branch was widenable).
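> >>
> >> For instance, I'm imagining ending up with something like this after
> simplification (purely illustrative names and blocks):
> >>
> >>      %wc = call i1 @widenable_condition()
> >>      %sel = select i1 %p, i1 %wc, i1 false
> >>      ...
> >>    merge:
> >>      %m = phi i1 [ %sel, %then ], [ true, %else ]
> >>      %cond = and i1 %check, %m
> >>      br i1 %cond, label %guarded, label %deopt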
> >>
> >> But I've not looked at this stuff for a while so I'm happy to trust
> your judgement. :)
> >>
> >> -- Sanjoy
> >> On Tue, Jul 10, 2018 at 12:26 AM Maxim Kazantsev via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >>> Hello Everyone,
> >>>
> >>>
> >>>
> >>> I want to raise a discussion about the @llvm.experimental.guard
> intrinsic and the reasons why we should give up using it. Here is an
> alternative approach to representing guards that resolves some fundamental
> flaws of the current guards.
> >>>
> >>>
> >>>
> >>> Basically, this intrinsic was introduced to model the following
> situation: we want to check that some condition is true, and if it's not,
> we should deoptimize at this point and quit execution of the compiled code.
> According to
> http://llvm.org/docs/LangRef.html#llvm-experimental-guard-intrinsic, the
> guard looks like this:
> >>>
> >>>
> >>>
> >>> define void @llvm.experimental.guard(i1 %pred, <args...>) {
> >>>    %realPred = and i1 %pred, undef
> >>>    br i1 %realPred, label %continue, label %leave [, !make.implicit !{}]
> >>>
> >>> leave:
> >>>    call void @llvm.experimental.deoptimize(<args...>) [ "deopt"() ]
> >>>    ret void
> >>>
> >>> continue:
> >>>    ret void
> >>> }
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Since the guard's leave branch has deoptimize semantics, it is
> *always* legal to take this branch, even if the condition is true. This
> means that guard conditions can be widened with any other conditions
> without loss of correctness. In other words, if the following code is
> correct:
> >>>
> >>>
> >>>
> >>>      call void (i1, ...) @llvm.experimental.guard(i1 %cond) [ "deopt"() ]
> >>>      <some operations>
> >>>
> >>>
> >>>
> >>> The following code will also be correct, no matter what %other_cond is:
> >>>
> >>>
> >>>
> >>>      %wider_cond = and i1 %cond, %other_cond
> >>>      call void (i1, ...) @llvm.experimental.guard(i1 %wider_cond) [ "deopt"() ]
> >>>      <do some useful stuff>
> >>>
> >>>
> >>>
> >>> More formally, if %cond_a implies %cond_b, it is always valid to
> replace guard(%cond_b) with guard(%cond_a). We use this fact in some
> optimizations, for example the Loop Predication and Guard Widening passes.
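> >>>
> >>> For instance (an illustrative snippet; the constants are arbitrary):
> >>>
> >>>      ; %cond_a ("%x u< 10") implies %cond_b ("%x u< 100"), so ...
> >>>      %cond_b = icmp ult i32 %x, 100
> >>>      call void (i1, ...) @llvm.experimental.guard(i1 %cond_b) [ "deopt"() ]
> >>>
> >>>      ; ... can safely be replaced with:
> >>>      %cond_a = icmp ult i32 %x, 10
> >>>      call void (i1, ...) @llvm.experimental.guard(i1 %cond_a) [ "deopt"() ]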
> >>>
> >>>
> >>>
> >>> Given these semantics, all optimizations may assume that at every
> point dominated by a guard its condition is true, just as they do when
> they derive this fact from explicit branch instructions. And this would
> work fine if it had full support.
> >>>
> >>>
> >>>
> >>> Unfortunately, hiding this implicit control flow semantics within the
> intrinsic has some significant downsides. The biggest one is that every
> transform and analysis pass that can derive conditions from explicit
> branches must also be taught the semantics of guards; otherwise guards are
> treated as regular calls. We've spent a lot of time teaching passes about
> guards, but at some point it became apparent that:
> >>>
> >>> 1. There are still places in the optimizer that are not aware of
> guards and can only use conditions of explicit branches, such as the exact
> loop taken count calculation in SCEV, the new loop unswitching, some
> transforms in EarlyCSE, and potentially others;
> >>>
> >>> 2. Support for guards is mostly code duplication that makes the code
> harder to read. In some cases (as in the SCEV taken count calculation),
> the code needs massive refactoring to work with guard conditions, while
> branch conditions fit in easily.
> >>>
> >>> 3. Finding all guards in a block is expensive. While every block has
> at most one branch condition, it can have an arbitrary number of guards.
> There is no easy way to get the nearest dominating guard; we need to
> iterate through instructions to collect them all, which may turn some O(1)
> algorithms into O(N). It costs us compile time.
> >>>
> >>> 4. The biggest problem is that whenever a new transform appears, we
> have to make sure that it supports guards if it needs to. The maintenance
> cost of this is unreasonably high.
> >>>
> >>> 5. In the past, we had numerous bugs because of incomplete support
> for guards. The current situation is quite healthy, but we don't know what
> will happen as we add guard support to every new pass, which may not be
> ready for it. More complex logic is potentially more broken.
> >>>
> >>>
> >>>
> >>> And we have all this overhead only because we want to be able to widen
> guards. There are no other differences between guards and regular branches
> to deopt blocks. So cheap and easy guard widening costs us hard and ugly
> support throughout the rest of the LLVM code. Instead, we want something
> that lets us use the full power of the LLVM optimizer on guard conditions
> for free, even if it makes guard widening more complicated.
> >>>
> >>>
> >>>
> >>> Previously, there was a discussion about using invokes of the guard
> intrinsic instead of calls to solve some of these problems. However, this
> approach doesn't solve the biggest problem: we still need to teach every
> single pass that may need it about the semantics of this intrinsic, and
> keep an eye on all new passes to make sure they can use it.
> >>>
> >>>
> >>>
> >>> I have a proposal for how this can be done relatively easily and with
> minimal overhead. Let us inline the guard according to its specification:
> >>>
> >>>
> >>>
> >>>      %guard_cond = and i1 %cond, undef
> >>>      br i1 %guard_cond, label %guarded, label %deopt
> >>>
> >>>    guarded:
> >>>      <do some useful stuff>
> >>>
> >>>    deopt:
> >>>      <deoptimize and leave the compiled code>
> >>>
> >>>
> >>>
> >>> Let's notice two important facts:
> >>>
> >>> 1. In the guarded block, %cond is true because it is implied by
> %guard_cond being true.
> >>>
> >>> 2. In the deopt block, we cannot assume that %cond is false, because
> that is not implied by %guard_cond being false. Thus, we cannot make
> transforms there that would need to be discarded after we widen the guard.
> >>>
> >>>
> >>>
> >>> So from the optimization perspective, everything that was valid for a
> branch on %cond is also valid for a branch on %guard_cond in the guarded
> block. Luckily, most passes I've checked can deduce %a = true from
> %a & %b = true (and if a pass cannot do it, teaching it to is a good
> general improvement). And this invariant remains, no matter how many other
> conditions we attach to %guard_cond through and.
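> >>>
> >>> For example, in a snippet like this (purely illustrative):
> >>>
> >>>      %guard_cond = and i1 %cond, %other_cond
> >>>      br i1 %guard_cond, label %guarded, label %deopt
> >>>
> >>>    guarded:
> >>>      ; %guard_cond is true here, hence %cond is true as well, so
> >>>      ; existing logic can fold this select to %x
> >>>      %v = select i1 %cond, i32 %x, i32 %y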
> >>>
> >>>
> >>>
> >>> Guard widening for this representation is trivial: whenever we want to
> widen %guard_cond, we just attach the new condition to it through an and,
> without loss of optimization opportunities or correctness.
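> >>>
> >>> That is, to widen the guard with a new condition %new_cond, we simply
> rewrite its condition as (sketch):
> >>>
> >>>      %wide_chk = and i1 %cond, %new_cond
> >>>      %guard_cond = and i1 %wide_chk, undef
> >>>      br i1 %guard_cond, label %guarded, label %deopt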
> >>>
> >>>
> >>>
> >>> The only thing we need is the ability to distinguish guards from
> regular branches (which do not have the property that "it is always legal
> to take the non-taken branch"). We could say that any branch whose
> condition is "and i1 %cond, undef" and whose non-taken successor
> deoptimizes can be treated as a guard. And it would be a good solution if
> it weren't for the nasty fact that any other transform (like instcombine)
> can kill this undef and replace "and i1 %cond, undef" with "%cond" (or
> with "false", which is even worse). So basically, the only thing we need
> to distinguish a guard is a special kind of undef value that will not be
> optimized away by other transforms (at least not before we are done with
> widening-based transforms).
> >>>
> >>>
> >>>
> >>> To solve that, we can introduce a special intrinsic called
> widenable_condition() and use it in place of the undef, like this:
> >>>
> >>>
> >>>
> >>>      %wc = call i1 @widenable_condition()
> >>>      %guard_cond = and i1 %cond, %wc
> >>>      br i1 %guard_cond, label %guarded, label %deopt
> >>>
> >>>    guarded:
> >>>      <do some useful stuff>
> >>>
> >>>    deopt:
> >>>      <deoptimize and leave the compiled code>
> >>>
> >>>
> >>>
> >>> Now we can distinguish a guard from a regular conditional branch: a
> guard is a conditional branch whose condition is an and-tree with a
> widenable_condition() call as one of its leaves, and whose non-taken
> successor goes straight to a deoptimize intrinsic call without any
> side-effecting instructions before it. We no longer need the
> @llvm.experimental.guard intrinsic and can use this modified inlined form
> instead.
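> >>>
> >>> For example, after one round of widening a guard could look like this
> (sketch):
> >>>
> >>>      %wc = call i1 @widenable_condition()
> >>>      %checks = and i1 %cond1, %cond2
> >>>      %guard_cond = and i1 %checks, %wc   ; and-tree with %wc as a leaf
> >>>      br i1 %guard_cond, label %guarded, label %deopt
> >>>
> >>>    deopt:   ; no side-effecting instructions before the call
> >>>      call void @llvm.experimental.deoptimize(<args...>) [ "deopt"() ]
> >>>      ret void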
> >>>
> >>>
> >>>
> >>> We are also considering using a read from a special dedicated global
> field instead of the widenable_condition() call to emulate this special
> condition, but the fundamental idea is the same.
> >>>
> >>>
> >>>
> >>> Formally, when we no longer need it, we could lower this
> widenable_condition() to undef, but in practice it is better to lower it
> to true, because we want to stay in compiled code and not go to
> deoptimize without reason.
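> >>>
> >>> That is, the lowering simply replaces the call with true (sketch):
> >>>
> >>>      ; %wc = call i1 @widenable_condition()  -- lowered away
> >>>      %guard_cond = and i1 %cond, true        ; folds to %cond
> >>>      br i1 %guard_cond, label %guarded, label %deopt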
> >>>
> >>>
> >>>
> >>> Currently I have a working prototype of this new guard form, as well
> as Loop Predication and Guard Widening implementations for it. The
> implementation is still being evaluated, but so far it seems that the new
> form is at least as powerful as the old one; besides, it doesn't require
> any special maintenance in all passes and analyses, and it opens up new
> optimization opportunities (which are currently missed because of
> incomplete support for the guard intrinsic). The patches are not in
> reviewable condition yet, but I am working on making them ready for
> review.
> >>>
> >>>
> >>>
> >>> What do you guys think about it? If someone has proposals or ideas on
> this problem and the suggested solution, please don't hesitate to share!
> >>>
> >>>
> >>>
> >>> Best regards,
> >>>
> >>> Max
> >>>
> >>>
> >>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

