[llvm-dev] Giving up using implicit control flow in guards

Philip Reames via llvm-dev llvm-dev at lists.llvm.org
Tue Jul 17 20:10:43 PDT 2018


Replying specifically to Sanjoy's questions inline.  For the record, 
I've talked with Max about his proposal in depth offline and generally 
support the direction.  As Sanjoy pointed out, there are some details to 
be worked out, but Max has gotten far enough with his prototyping at 
this point that I'm personally convinced they're all handleable w/effort.


On 07/13/2018 11:21 AM, Sanjoy Das via llvm-dev wrote:
> If we only need the guard representation for widening, why not
> first run some cleanup passes, widen the guards and then lower the
> guards and then run the rest of the pipeline which will see the
> explicit branching?
The key thing is that we want to be able to widen conditions across function 
boundaries.  Here's a trivial example:
int load(int[] a, int offset) { return a[offset]; }
int b(int[] a) { return load(a, 0) + load(a, 1); }

This example should require (at most) one runtime range check.

As such, we need the widenable representation to survive through inlining, 
which means keeping it through a very sizable chunk of the pipeline.
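
To make that concrete, here is a rough sketch of what b() could look like 
after inlining, still using the current @llvm.experimental.guard intrinsic 
(the %len value and the exact shape of the bounds checks are made up for 
illustration):

  ; after inlining both calls to load() into b()
  %inbounds0 = icmp ult i32 0, %len
  call void (i1, ...) @llvm.experimental.guard(i1 %inbounds0) [ "deopt"() ]
  ; ... load of a[0] ...
  %inbounds1 = icmp ult i32 1, %len
  call void (i1, ...) @llvm.experimental.guard(i1 %inbounds1) [ "deopt"() ]
  ; ... load of a[1] ...

Guard widening can then replace both guards with a single guard on 
%inbounds1 (offset 1 being in bounds implies offset 0 is too), giving the 
single runtime range check mentioned above, but only if the widenable 
representation is still around after inlining.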
> Also, in the past we had talked about piggybacking on
> AssumptionCacheTracker to solve some of the problems (with the
> implicit nature of guards) you highlight.  Can that be made to work?
The conclusion of the "let's just use invokes" thread I started a while 
ago was essentially the opposite.  That is, we should be using invokes 
for assumes as well specifically because it eliminates the need for 
anything like an assumption cache.

Or to say it differently: yes, we can improve the current implementation (a 
bunch) by commoning some code between guards and assumes and by using 
explicit rather than implicit control flow (i.e. terminators), but we still 
have the general problem of needing to update every single pass.  You 
could argue that if we did a better job of sharing/structuring the code, 
this would at least mean we picked up support for assumes pretty much 
everywhere too, but I'm personally not convinced that's a large enough 
benefit to justify the investment.

It's also a bit odd to have an invoke without a meaningful unwind target, 
which is what we end up with for both assumes and guards.  You 
essentially end up having to special case unreachable to leave the 
invoke in place.  Doable, but a bit odd.

I was originally of the opinion that improving/commoning/merging the 
current representation was the easiest path forward, but Max's 
prototyping work has impressed me with how few changes he's needed to 
make to get full coverage of the proposed representation throughout the 
optimizer.  I'm (weakly) convinced his proposal is the right path 
forward, but I think either approach could be made to work without an 
insurmountable amount of effort.
>
> On Fri, Jul 13, 2018 at 1:19 AM Maxim Kazantsev <max.kazantsev at azul.com> wrote:
>> Hi Sanjoy,
>>
>> Thanks for the feedback! As for memory effects, currently I use "inaccessiblememonly" for it. It allows us to prove that it doesn't alias with any other load/store in the function, but doesn't allow CSE to eliminate it. That is not actually ideal, because there is no way we can safely hoist it out of a loop (and sometimes we want to, for example to enable unswitching). So this is the part I am still thinking about.
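>>
>> For reference, a rough sketch of how the declaration could look with that attribute (the widenable_condition name follows this proposal and is not an existing LLVM intrinsic):
>>
>>    declare i1 @widenable_condition() inaccessiblememonly nounwind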
>>
>> As an alternative to the widenable_condition() call, we are considering loads from special dedicated global fields. The bad thing about them is that if a loop gets peeled, a widenable condition in the loop can be optimized away based on the fact that this field was true on the first iteration.
> You'd ideally want every *execution* of a guard to possibly have a
> different widening, but using global variables hides that.
>
> -- Sanjoy
>
>> Thanks,
>> Max
>>
>> -----Original Message-----
>> From: Sanjoy Das [mailto:sanjoy at playingwithpointers.com]
>> Sent: Wednesday, July 11, 2018 8:35 PM
>> To: Maxim Kazantsev <max.kazantsev at azul.com>
>> Cc: llvm-dev <llvm-dev at lists.llvm.org>
>> Subject: Re: [llvm-dev] Giving up using implicit control flow in guards
>>
>> Hi Max,
>>
>> I think this works, but I'm a bit wary around the complexity introduced by widenable_condition.  For instance, we'd have to be careful in spec'ing out its memory effects (if it is readnone/readonly, it will get CSE'ed (which may not be what you
>> want?) but if it can write memory it will inhibit optimizations).  I'm also worried that with this representation it will become difficult to discover which conditions are widenable (e.g. after optimization we may end up using a phi of a select of widenable_condition() in a branch condition, obscuring the fact that the branch was widenable).
>>
>> But I've not looked at this stuff for a while so I'm happy to trust your judgement. :)
>>
>> -- Sanjoy
>> On Tue, Jul 10, 2018 at 12:26 AM Maxim Kazantsev via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> Hello Everyone,
>>>
>>>
>>>
>>> I want to raise a discussion about the @llvm.experimental.guard intrinsic and the reasons why we should give up using it. Here is an alternative approach to representing guards that resolves some of the fundamental flaws of the current guards.
>>>
>>>
>>>
>>> Basically, this intrinsic was introduced to model the following situation: we want to check that some condition is true, and if it's not, we should deoptimize at this point and quit execution of the compiled code. According to http://llvm.org/docs/LangRef.html#llvm-experimental-guard-intrinsic, the guard looks like this:
>>>
>>>
>>>
>>> define void @llvm.experimental.guard(i1 %pred, <args...>) {
>>>    %realPred = and i1 %pred, undef
>>>    br i1 %realPred, label %continue, label %leave [, !make.implicit !{}]
>>>
>>> leave:
>>>    call void @llvm.experimental.deoptimize(<args...>) [ "deopt"() ]
>>>    ret void
>>>
>>> continue:
>>>    ret void
>>> }
>>>
>>>
>>>
>>>
>>>
>>> Since the guard's leave branch has deoptimize semantics, it is *always* legal to take this branch, even if the condition is true. This means that guard conditions can be widened with any other conditions without loss of correctness. In other words, if the following code is correct:
>>>
>>>
>>>
>>>      call void (i1, ...) @llvm.experimental.guard(i1 %cond) [ "deopt"() ]
>>>      <some operations>
>>>
>>> The following code will also be correct, no matter what %other_cond is:
>>>
>>>      %wider_cond = and i1 %cond, %other_cond
>>>      call void (i1, ...) @llvm.experimental.guard(i1 %wider_cond) [ "deopt"() ]
>>>      <do some useful stuff>
>>>
>>>
>>> More formally, if %cond_a implies %cond_b, it is always valid to replace guard(%cond_b) with guard(%cond_a). We use this fact in some optimizations, for example in the Loop Predication and Guard Widening passes.
>>>
>>>
>>>
>>> Given these semantics, all optimizations may assume that at every point dominated by a guard its condition is true, just as they do when they derive this fact from explicit branch instructions. And this would work fine if guards had full support throughout the optimizer.
>>>
>>>
>>>
>>> Unfortunately, hiding this implicit control flow semantics within the intrinsic has some significant downsides. The biggest one is that every transform and analysis pass that can derive conditions from explicit branches must also be taught the semantics of guards; otherwise, guards are treated as regular calls. We've spent a lot of time teaching these passes about guards, but at some point it became apparent that:
>>>
>>> 1. There are still places in the optimizer which are not aware of guards and can only use conditions of explicit branches. These include: exact loop taken count calculation in SCEV, the new loop unswitching, some transforms in EarlyCSE, and potentially others;
>>>
>>> 2. Supporting guards mostly means code duplication that makes the code harder to read. In some cases (as in the SCEV taken count calculation), the code needs massive refactoring to work with guard conditions, while branch conditions fit in easily.
>>>
>>> 3. Finding all guards in a block is expensive. While every block has at most one branch condition, it can have an arbitrary number of guards. There is no easy way to get the nearest dominating guard; we need to iterate through the instructions to collect them all, which may turn some O(1) algorithms into O(N). This costs us compile time.
>>>
>>> 4. The biggest problem is that whenever a new transform appears, we have to make sure it supports guards if it needs to. The maintenance cost of this is unreasonably high.
>>>
>>> 5. In the past, we had numerous bugs because of incomplete support for guards. The current situation is quite healthy, but we don't know what will happen as we add guard support to every new pass, which may not be ready for it. More complex logic has more potential for breakage.
>>>
>>>
>>>
>>> And we have all this overhead only because we want to be able to widen guards. There are no other differences between guards and regular branches to deopt blocks. So cheap and easy guard widening costs us hard and ugly support across the rest of the LLVM code. Instead, we want something that lets us use the full power of the LLVM optimizer on guard conditions for free, even if it makes guard widening itself more complicated.
>>>
>>>
>>>
>>> Previously, there was a discussion about using invokes of the guard intrinsic instead of calls to solve some of these problems. However, this approach doesn't solve the biggest problem: we still need to teach every single pass that may need it about the semantics of this intrinsic, and keep an eye on all new passes to make sure they can use it.
>>>
>>>
>>>
>>> I have a proposal for how this can be done relatively easily and with minimal overhead. Let us inline the guard according to the specification:
>>>
>>>
>>>
>>>      %guard_cond = and i1 %cond, undef
>>>      br i1 %guard_cond, label %guarded, label %deopt
>>>
>>>    guarded:
>>>      <do some useful stuff>
>>>
>>>    deopt:
>>>      <deoptimize and leave the compiled code>
>>>
>>>
>>>
>>> Let's notice two important facts:
>>>
>>> 1. In the guarded block, %cond is true, because it is implied by %guard_cond being true.
>>>
>>> 2. In the deopt block, we cannot assume that %cond is false, because that is not implied by %guard_cond being false. Thus, we cannot make transforms there that would have to be discarded after we widen the guard.
>>>
>>>
>>>
>>> So from the optimization perspective, all optimizations that were valid for a branch on %cond are also valid for a branch on %guard_cond in the guarded block. Luckily, most passes I've checked can deduce %a = true from %a & %b = true (and if a pass cannot, teaching it to is a good general improvement). And this invariant holds no matter how many other conditions we attach to %guard_cond through and.
>>>
>>>
>>>
>>> Guard widening for this representation is trivial: whenever we want to widen %guard_cond, we just attach the new condition to it through and, without losing optimization opportunities or correctness.
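>>>
>>> For illustration, a minimal sketch of such a widening, where %other_cond stands for whatever extra condition we want to widen in (it is hypothetical here):
>>>
>>>      ; before widening
>>>      %guard_cond = and i1 %cond, undef
>>>      br i1 %guard_cond, label %guarded, label %deopt
>>>
>>>      ; after widening with %other_cond
>>>      %wide_cond = and i1 %cond, %other_cond
>>>      %guard_cond = and i1 %wide_cond, undef
>>>      br i1 %guard_cond, label %guarded, label %deopt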
>>>
>>>
>>>
>>> The only thing we need is the ability to distinguish guards from regular branches (which don't have the semantics that "it is always legal to go to the non-taken branch"). We could say that any branch on a condition of the form "and i1 %cond, undef" whose non-taken successor deoptimizes can be seen as a guard. And this would be a good solution if it weren't for the nasty fact that any other transform (like instcombine) can kill this undef and replace "and i1 %cond, undef" with "%cond" (or with "false", which is even worse). So basically, the only thing we need to distinguish a guard is a special kind of undef value which will not be optimized away by other transforms (at least not before we are done with widening-based transforms).
>>>
>>>
>>>
>>> To solve that, we can introduce a special intrinsic called widenable_condition() and simply use it in place of the undef, like this:
>>>
>>>
>>>
>>>      %wc = call i1 widenable_condition()
>>>      %guard_cond = and i1 %cond, %wc
>>>      br i1 %guard_cond, label %guarded, label %deopt
>>>
>>>    guarded:
>>>      <do some useful stuff>
>>>
>>>    deopt:
>>>      <deoptimize and leave the compiled code>
>>>
>>>
>>>
>>> Now we can distinguish a guard from a regular conditional branch: a guard is a conditional branch whose condition is an and-tree with a widenable_condition() call as one of its leaves, and whose non-taken branch goes straight to a deoptimize intrinsic call without any side-effecting instructions before it. We no longer need the @llvm.experimental.guard intrinsic and can use its modified inlined version instead.
>>>
>>>
>>>
>>> We are also considering using a read from a dedicated global field instead of the widenable_condition() call to emulate this special condition, but the fundamental idea is the same.
>>>
>>>
>>>
>>> Formally, once we no longer need it, we should lower widenable_condition() into undef, but in practice it is better to lower it into true, because we want to stay in compiled code and not deoptimize without reason.
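>>>
>>> As a tiny sketch of that final lowering step (using the widenable_condition() form above):
>>>
>>>      ; widenable_condition() replaced by the constant true
>>>      %guard_cond = and i1 %cond, true   ; folds away to just %cond
>>>      br i1 %guard_cond, label %guarded, label %deopt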
>>>
>>>
>>>
>>> Currently I have a working prototype of this new guard form, as well as Loop Predication and Guard Widening implementations for it. This implementation is still being evaluated, but so far it seems that the new form is at least as powerful as the old one; moreover, it doesn't require any special effort to maintain it in all passes and analyses, and it opens up new optimization opportunities (which are currently missed because of incomplete support for the guard intrinsic). The patches are not in reviewable condition yet, but I am working on making them ready for review.
>>>
>>>
>>>
>>> What do you guys think about it? If someone has proposals or ideas on this problem and the suggested solution, please don't hesitate to share!
>>>
>>>
>>>
>>> Best regards,
>>>
>>> Max
>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


