[llvm] r289755 - Make processing @llvm.assume more efficient by using operand bundles

Hal Finkel via llvm-commits llvm-commits at lists.llvm.org
Sun Jan 8 13:33:09 PST 2017


On 01/05/2017 09:22 AM, Daniel Berlin wrote:
>
>
> On Wed, Jan 4, 2017 at 3:11 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>
>     On 01/03/2017 10:01 AM, Daniel Berlin wrote:
>>
>>
>>     On Tue, Jan 3, 2017 at 5:43 AM, Hal Finkel via llvm-commits
>>     <llvm-commits at lists.llvm.org> wrote:
>>
>>         Hi Philip,
>>
>>         First, I'm planning on rewriting this change to be more
>>         targeted, specifically to take the alternative approach we
>>         discussed briefly on the review thread and keep the list of
>>         affected variables in the assumption cache.
>>
>>
>>         Second, regarding motivation, the problem is not the
>>         basic-block scan (with the assumption cache, we actually do
>>         very little of that anyway). The core problem is that
>>         ValueTracking, which is used heavily by InstCombine and other
>>         passes, checks all assumptions in the function for every
>>         variable being processed to determine known bits, etc. These
>>         checks are pattern matches, and while they normally fail
>>         quickly, as the number of assumptions increases this becomes a
>>         large problem: it makes known-bits determination
>>         O(#assumptions in the function) for every variable in the
>>         common case, and, because the assumption processing is part
>>         of the recursive known-bits procedure, the constant factor
>>         can be large too for certain kinds of inputs. What we need
>>         for ValueTracking (and also for LVI and maybe others) is a
>>         way to cache which variables are (usefully) affected by which
>>         assumptions.
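
To make that concrete, here is roughly the shape I have in mind for the
alternative mentioned above (keeping the list of affected values in the
assumption cache). This is just a sketch; the names are hypothetical, not
what an eventual patch would necessarily use:

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // Hypothetical side table for the assumption cache: for each value, the
  // assume calls that (usefully) constrain it, filled in once per assume by
  // the same pattern matching ValueTracking does today.
  class AffectedAssumptionIndex {
    DenseMap<const Value *, SmallVector<CallInst *, 2>> AffectedValues;

  public:
    // Record that Assume constrains each value in Affected.
    void registerAssumption(CallInst *Assume, ArrayRef<Value *> Affected) {
      for (Value *V : Affected)
        AffectedValues[V].push_back(Assume);
    }

    // What ValueTracking would call instead of scanning every assume in the
    // function: a constant-time lookup of just the relevant assumes.
    ArrayRef<CallInst *> assumptionsFor(const Value *V) const {
      auto It = AffectedValues.find(V);
      if (It == AffectedValues.end())
        return {};
      return It->second;
    }
  };
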
>>
>>
>>     When you say affected, do you mean "what uses"?
>>     Because your operand-bundle change does not really do that.
>>     For a given use, I still have to find the assume (which is in the
>>     middle of a block), then go to the bundle, then look at the uses
>>     of the things in that bundle.
>>     Or am I missing something?
>
>     You're missing something. The "affected uses" are added by
>     InstCombine by looking for patterns useful to ValueTracking, etc.
>     So, for example, if we have:
>
>         %cmp1 = icmp eq i8 %a, 5
>         call void @llvm.assume(i1 %cmp1)
>
>     Then InstCombine would come along and add the operand bundle like
>     this:
>
>         call void @llvm.assume(i1 %cmp1) [ "affected"(i8 %a) ]
>
>
> Right, but this looks upwards, no?

Correct.

>
>     So, if we're looking at %a, and want to know if there's some
>     assumption that might be relevant to it, we just need to look at
>     %a's users (instead of looking through all assumptions).
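
In code, that lookup ends up shaped roughly like this (a sketch, not the
patch itself; it assumes the "affected" bundle from the example above and
skips the extra context checks that isValidAssumeForContext performs):

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Dominators.h"
  #include "llvm/IR/IntrinsicInst.h"
  using namespace llvm;

  // Collect assumes relevant to V by walking V's users: with the bundle in
  // place the assume is itself a user of V, so we never look at assumes
  // that don't mention V at all.
  static void findRelevantAssumes(Value *V, Instruction *CtxI,
                                  DominatorTree &DT,
                                  SmallVectorImpl<CallInst *> &Out) {
    for (User *U : V->users()) {
      auto *II = dyn_cast<IntrinsicInst>(U);
      if (!II || II->getIntrinsicID() != Intrinsic::assume)
        continue;
      // The dominance query only happens for the (rare) assumes that
      // actually name V, not for every assume in the function.
      if (DT.dominates(II, CtxI))
        Out.push_back(II);
    }
  }
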
>
>
> Okay, so I still have to go and get the uses of a.
> Still better than it was, admittedly.
>
>
>     It's certainly true that, as discussed below, once we find an
>     assume in the users of %a, we need to verify the dominance
>     relationship between the assume and %a, but this is still rare and
>     has not yet presented itself as a problem as far as I can tell.
>
>
> Okay.
> It can be ... tricky, because for conditionals it's edge dominance, 
> not block dominance.  For assumes, you don't have this issue.
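
For reference, the distinction in code (a small sketch; DominatorTree can
answer the edge query directly):

  #include "llvm/IR/Dominators.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // For a conditional branch, what dominates a use in the true successor is
  // the edge BB -> TrueSucc, not the branch's block: if TrueSucc has another
  // predecessor, the condition may not hold on that other path.
  static bool trueEdgeDominatesUse(DominatorTree &DT, BranchInst *BI, Use &U) {
    BasicBlockEdge TrueEdge(BI->getParent(), BI->getSuccessor(0));
    return DT.dominates(TrueEdge, U);
  }

  // For an assume there is no edge involved; plain instruction dominance,
  // DT.dominates(AssumeCall, UserInst), is the right query.
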
>
>
>
>
>
>     The first problem is that checking whether some variable, %x, is
>     potentially affected by an assume cannot be O(#assumes in the
>     function).
>
>
> Right, I guess my definition of "potentially affected" was slightly
> different initially.
>
> IE given
>
> c = icmp eq a, b
>
> llvm.assume(c)
> use a
> use b
> use c
>
> I would have expected the use a, use b, and use c to be what you meant 
> by  "potentially affected", and changing it to:
>
> c = icmp eq a, b
>
> llvm.assume(c) ("%c")
> use a
> use b
> use c
>
> doesn't help find those.
>
> (whereas, in the predicateinfo scheme, you can add info for all 3 if 
> you like)

Yes, that's what I mean. a, b, and c are all "affected", so we should have:

   llvm.assume(c) ("%a, %b")

so that the assume is a user of a and b via the bundle (and it already 
uses c directly as its condition). It is equivalent to the predicateinfo 
scheme in this respect.
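
For reference, the InstCombine side would populate the bundle with
something shaped like this (illustrative only -- just the simplest pattern
is shown, and findAffectedValues is a made-up name here):

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Constants.h"
  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/PatternMatch.h"
  using namespace llvm;
  using namespace llvm::PatternMatch;

  // Find the values that an assume's condition (usefully) constrains; these
  // become the operands of the "affected"(...) bundle on the assume.
  static void findAffectedValues(Value *Cond,
                                 SmallVectorImpl<Value *> &Affected) {
    ICmpInst::Predicate Pred;
    Value *A, *B;
    // Simplest useful case: an icmp constrains both non-constant operands.
    if (match(Cond, m_ICmp(Pred, m_Value(A), m_Value(B)))) {
      if (!isa<Constant>(A))
        Affected.push_back(A);
      if (!isa<Constant>(B))
        Affected.push_back(B);
    }
    // Further patterns (and/or of conditions, masked compares, etc.) would
    // be matched here, mirroring what ValueTracking already understands.
  }
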

>
>     That's what ValueTracking, LVI, etc. do now, and, for obvious
>     reasons, it causes problems as we make more-extensive use of assumes
>     (or don't, because of this performance problem).
>
>>
>>         Regarding your suggestions below, I can definitely see how we
>>         might run into a problem with the dominance checking if we
>>         had a large number of assumptions which affected a variable
>>         -- it is only when we have an assumption that potentially
>>         affects a variable that we do any of the dominance checking
>>         -- but I've not run into any such cases yet.
>>
>>
>>     Note: I have precisely the same problem with newgvn's predicate
>>     handling:
>>
>>     I need to be able to tell, for a given instruction, what
>>     predicates dominate it (including assumes).
>>
>>     It's pretty fast for most predicates, because they occur at the
>>     end of blocks, but could still be sped up.
>>     For assumes, not so much, because they can occur wherever.
>>
>>     I have a prototype where I've implemented e-ssa to solve both
>>     problems. It requires the introduction of one intrinsic, and some
>>     use renaming.
>
>     Where do we introduce the intrinsic?
>
>
> Wherever we want. I'd say before the first CSE/CCP pass we currently 
> have that uses assumes or conditional info.
>
>     Do we maintain it throughout the pipeline?
>
>
> Yes, but this is easy.
>
> First, as a preliminary, screwups just cause missed optimizations, not
> correctness issues. It's not like memoryssa or even regular ssa, where
> updates are tricky, because the values you are adding are never distinct
> from the originals.
>
> There are three update cases:
>
> 1. You make a conditional branch unconditional - you can
>
> A. do nothing - in 99.99% of cases we do this: one of the blocks is
> dead and we remove it. The icmp will usually die too, in which case you
> can either replace the other operand of the predicateinfo with undef (IE
> lazy cleanup) or kill the predicateinfo explicitly.
>
> Example:
>
> c = icmp eq a, b
> br c, label %1, label %2
>
> 1:
> c1 = predicateinfo(c)
> 2:
> c2 = predicateinfo(c)
> ->
> br label %2
> 2:
> c2 = predicateinfo(undef if c is dead, or just kill this thing)
>
>
> 2. You make an unconditional branch conditional -
> we pretty much don't do this, but if we do, you can add
> predicateinfos (in which case you win), or you can forget to, and we
> just miss an optimization.
>
> The update algorithm is pretty simple:
>
> If the two target blocks exist already, you add predicateinfo at the
> top and rename the affected uses. The total set of affected uses is just
> the set of uses of the original icmp; you can go through them and
> see which ones belong to which branch using dominance (assuming you
> updated the dominator tree too). This has low cost.
>
> If the blocks don't exist, you can just rename them while building at 
> basically no cost.
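
Just to make sure I follow, the renaming step would be something like this
(a sketch only; predicateinfo stands in for the hypothetical intrinsic, and
PHI uses are skipped since they need the incoming-edge check instead):

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Dominators.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // After inserting PITrue/PIFalse (the predicateinfo copies of Cmp) at the
  // top of the two successors, retarget each existing use of Cmp to
  // whichever copy's block dominates it. Uses dominated by neither keep the
  // original value -- a missed optimization at worst, never a miscompile.
  static void renameUsesAfterSplit(Instruction *Cmp, Instruction *PITrue,
                                   Instruction *PIFalse, DominatorTree &DT) {
    SmallVector<Use *, 8> Uses;
    for (Use &U : Cmp->uses())
      Uses.push_back(&U); // copy first; U->set() below edits the use list
    for (Use *U : Uses) {
      auto *UserI = dyn_cast<Instruction>(U->getUser());
      if (!UserI || UserI == PITrue || UserI == PIFalse || isa<PHINode>(UserI))
        continue;
      if (DT.dominates(PITrue->getParent(), UserI->getParent()))
        U->set(PITrue);
      else if (DT.dominates(PIFalse->getParent(), UserI->getParent()))
        U->set(PIFalse);
    }
  }
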
>
> Note that adding them does not strictly require phi nodes, because
> they do not produce distinct values. The phi will always simplify to
> the original variable.
>
> It's like
> a = b
> if (<something>)
>   a = b
>
> While we may transform this initially to
> a1 = b
> if (<something>)
>  a2 = b
> phi(a1, a2)
>
>  the phi is not *actually* necessary, and will be simplified to phi(b, 
> b) and then removed.
>
> Now, there are passes that may want the phi node there to see the 
> congruence (this is the same issue we have with "pruned" vs "not 
> pruned" ssa already), but we could also just *always* insert a phi 
> node, too, if that becomes an issue.
> This should require no extra computation, because in basically all 
> cases you would have already had to compute the proper place for phi 
> nodes for correctness of the *rest* of the IR you have changed, and the 
> predicateinfo phis would not go in a different set of blocks.
>
> 3. You add a completely new branch - same as #2, because unconditional 
> branches would not have had predicate info before.
>
>
> While I've given the above answers assuming you only change
>
> c = icmp eq a, b
> br c, %1, %2
> 1:
> use c
> 2:
> use c
> into
>
> c = icmp eq a, b
> br c, label %1, label %2
>
> 1:
> c1 = predicateinfo(c)
> use c1
> 2:
> c2 = predicateinfo(c)
> use c2
>
> you can also do
>
> c = icmp eq a, b
> br c, label %1, label %2
>
> 1:
> a1 = predicateinfo(a, c)
> b1 = predicateinfo(b, c)
> c1 = predicateinfo(c, c)
>
> 2:
> a2 = predicateinfo(a, c)
> b2 = predicateinfo(b, c)
> c2 = predicateinfo(c, c)
>
>
> You could also obviously prune this so you only do it if there are 
> uses of the variable in the blocks dominated by either conditional 
> edge, and then your only real explosion in variables tends to be in 
> switches.
>
>>     if (foo == 50) {
>>     } else { }
>>
>>     becomes
>>
>>     if (foo == 50) {
>>     foo1 = predicateinfo(foo, the icmp of foo )
>>     (use foo1)
>>     } else {
>>     foo2 = predicateinfo(foo, the icmp of foo)
>>     (use foo2)
>>     }
>>
>>     Once that's done, finding the set of predicates that usefully
>>     affect a given use is trivial, as you can just walk the def-use
>>     chains.
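
For what it's worth, the use-side query would then be shaped like this (a
sketch; isPredicateInfoCopy is a placeholder for recognizing the
hypothetical intrinsic, and the operand layout is just this sketch's
encoding):

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  // Placeholder: is I one of the predicateinfo copies described above?
  static bool isPredicateInfoCopy(const Instruction *I);

  // Walk the chain of defs from V, collecting the controlling condition of
  // every predicateinfo copy we pass through. No function-wide scan and no
  // dominance queries -- the renaming already encoded the dominance facts.
  static void collectDominatingPredicates(Value *V,
                                          SmallVectorImpl<Value *> &Conds) {
    while (auto *I = dyn_cast<Instruction>(V)) {
      if (!isPredicateInfoCopy(I))
        break;
      Conds.push_back(I->getOperand(1)); // operand 1: the icmp/assume condition
      V = I->getOperand(0);              // operand 0: the renamed original value
    }
  }
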
>>
>>     I haven't implemented it for assume, but it should work just as
>>     well for that.
>>
>>     I'm somewhat against caching solutions and hacks here; I generally
>>     prefer to solve infrastructure problems with better
>>     infrastructure. Caching tends to just kick the can down the
>>     road, and often makes it harder to build better infrastructure.
>>     It makes sense when the infrastructure change is significantly
>>     harder to get right and not worth the cost. Here, honestly, we can
>>     do better. We have plenty of cases where CVP, etc., will be
>>     significantly sped up by the above (I can hand them to you) but not
>>     by your change.
>>
>>     Note also that it improves a large set of passes.
>>     For example, the variable split enables SCCP to do correlated
>>     value propagation (trivially for equality comparisons, other
>>     things with a bit of work).
>>
>>     Before, it could only do something if "foo" turned out the same
>>     everywhere.
>>     After, you can propagate "foo == 50" to the foo1 branch.
>>
>>     Early measurements come out to a few percent more constants found
>>     total.
>
>     How will we know when to add the predicateinfo intrinsic, or will
>     we do it for all values used in the conditionally-executed region?
>
>
> All values used.
>
> There is a pruned form, or you could use heuristics, but I doubt it's 
> worth it.

Makes sense.

Thanks again,
Hal

>     This patch did the equivalent thing by pattern-matching the condition.
>
>     Generally, I'm in favor of going in this direction.
>
>     For assumes, we'd still need to pattern match, right? If so, this
>     is still a cache in a sense ;) -- I suspect we'd still need the
>     assumption cache too in order to efficiently compute ephemeral
>     values (so that the inline cost analysis, etc. still have the
>     right complexity bounds).
>
>
> Maybe.
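
To expand on why I suspect we'll still want the cache for this: computing
ephemeral values starts from the full list of assumes in the function, and
that enumeration is exactly what the assumption cache makes cheap. Roughly
(the real entry point is CodeMetrics::collectEphemeralValues):

  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/Analysis/AssumptionCache.h"
  #include "llvm/Analysis/CodeMetrics.h"
  #include "llvm/IR/Function.h"
  using namespace llvm;

  // Gather the values that exist only to feed assumes in F, so the inline
  // cost analysis can skip them and adding assumes doesn't inflate cost.
  static void gatherEphemeralValues(const Function &F, AssumptionCache &AC,
                                    SmallPtrSetImpl<const Value *> &Eph) {
    // Internally this walks every assume registered in the cache and then
    // the instructions feeding them -- hence still wanting the cache.
    CodeMetrics::collectEphemeralValues(&F, &AC, Eph);
  }
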
>
>     Thanks again,
>     Hal
>
>>
>>
>
>     -- 
>     Hal Finkel
>     Lead, Compiler Technology and Programming Languages
>     Leadership Computing Facility
>     Argonne National Laboratory
>
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
