[llvm] r289755 - Make processing @llvm.assume more efficient by using operand bundles
Hal Finkel via llvm-commits
llvm-commits at lists.llvm.org
Sun Jan 8 13:33:09 PST 2017
On 01/05/2017 09:22 AM, Daniel Berlin wrote:
>
>
> On Wed, Jan 4, 2017 at 3:11 PM, Hal Finkel <hfinkel at anl.gov
> <mailto:hfinkel at anl.gov>> wrote:
>
>
> On 01/03/2017 10:01 AM, Daniel Berlin wrote:
>>
>>
>> On Tue, Jan 3, 2017 at 5:43 AM, Hal Finkel via llvm-commits
>> <llvm-commits at lists.llvm.org
>> <mailto:llvm-commits at lists.llvm.org>> wrote:
>>
>> Hi Philip,
>>
>> First, I'm planning on rewriting this change to be more
>> targeted, specifically to take the alternative approach we
>> discussed briefly on the review thread and keep the list of
>> affected variables in the assumption cache.
>>
>>
>> Second, regarding motivation, the problem is not the
>> basic-block scan (with the assumption cache, we actually do
>> very little of that anyway). The core problem is that, in
>> ValueTracking, which is used heavily by InstCombine and other
>> passes, checks all assumptions in each function for every
>> variable being processed to determine known bits, etc. These
>> checks are pattern matches, and while they normally fail
>> quickly, as the number of assumptions increases, this is a
>> large problem: it makes known-bits determination
>> O(#assumptions in the function), for every variable, in the
>> common case, and, because the assumption processing is part
>> of the recursive known-bits procedure, the constant factor
>> can be large too given certain kinds of inputs. What we need
>> for ValueTracking (and also for LVI and maybe others) is a
>> way to cache which variables are (usefully) affected by which
>> assumptions.
>>
>>
>> When you say affected, do you mean "what uses"?
>> Because your operand bundles change, does not really do that.
>> For a given use I still have to find the assume (which is in the
>> middle of a block), then go to the bundle, then look at the uses
>> of the things in that bundle
>> Or am i missing something?
>
> You're missing something. The "affected uses" are added by
> InstCombine by looking for patterns useful to ValueTracking, etc.
> So, for example, if we have:
>
> %cmp1 = icmp eq i8 %a, 5
> call void @llvm.assume(i1 %cmp1)
>
> Then InstCombine would come along and add the operand bundle like
> this:
>
> call void @llvm.assume(i1 %cmp1) [ "affected"(i8 %a) ]
>
>
> Right, but this looks upwards, no?
Correct.
>
> So, if we're looking at %a, and want to know if there's some
> assumption that might be relevant to it, we just need to look at
> %a's users (instead of looking through all assumptions).
>
>
> Okay, so i still have to go and get the uses of a.
> Still better than it was, admittedly.
>
>
> It's certainly true that, as discussed below, once we find an
> assume in the users of %a, we need to verify the dominance
> relationship between the assume and %a, but this is still rare and
> has not yet presented itself as a problem as far as I can tell.
>
>
> Okay.
> It can be ... tricky, because for conditionals it's edge dominance,
> not block dominance. For assumes, you don't have this issue.
>
>
>
>
>
> The first problem is that checking whether some variable, %x, is
> potentially affected by an assume cannot be O(#assumes in the
> function).
>
>
> Right, i guess my definition of potentially affected was slightly
> different initially.
>
> IE given
>
> c = icmp eq a, b
>
> llvm.assume(c)
> use a
> use b
> use c
>
> I would have expected the use a, use b, and use c to be what you meant
> by "potentially affected", and changing it to:
>
> c = icmp eq a, b
>
> llvm.assume(c) ("%c")
> use a
> use b
> use c
>
> doesn't help find those.
>
> (whereas, in the predicateinfo scheme, you can add info for all 3 if
> you like)
Yes, it is. a, b, and c are all "affected", so we should have:
llvm.assume(c) ("%a, %b")
so the assume was a user of a, b, and c. It is equivalent to the
predicateinfo scheme in this respect.
>
> That's what ValueTracking, LVI, etc. do now and, for obvious
> reasons, causes problems as we make more-extensive use of assumes
> (or don't because of this performance problem).
>
>>
>> Regarding your suggestions below, I can definitely see how we
>> might run into a problem with the dominance checking if we
>> had a large number of assumptions which affected a variable
>> -- it is only when we have an assumption that potentially
>> affects a variable that we do any of the dominance checking
>> -- but I've not run into any such cases yet.
>>
>>
>> Note: I have precisely the same problem with newgvn's predicate
>> handling:
>>
>> I need to be able to tell, for a given instruction, what
>> predicates dominate it (including assumes)
>>
>> It's pretty fast for most predicates, because they occur at the
>> end of blocks, but could still be sped up.
>> For assumes, not so much, because they can occur wherever.
>>
>> I have a prototype where i've implemented e-ssa to solve both
>> problems. It requires the introduction of one intrinsic, and some
>> use renaming.
>
> Where do we introduce the intrinsic?
>
>
> Wherever we want. I'd say before the first CSE/CCP pass we currently
> have that uses assumes or conditional info.
>
> Do we maintain it throughout the pipeline?
>
>
> Yes, but this is easy.
>
> First, as a preliminary, screwups just cause missed optimizations, not
> correctness issues. It's not like memoryssa or even regular ssa,
> where updates are tricky, because the values you are adding are never
> distinct.
>
> There are three update cases:
>
> 1. You make a conditional branch unconditional - you can
>
> A. do nothing - in 99.99% of cases we do this, one of the blocks is
> dead and we remove it. Either the icmp will die, in which case, you
> can either replace the other operand of predicateinfo with undef (IE
> lazy cleanup), or we could kill it explicitly.
>
> Example:
>
> c =icmp eq a, b
> br c, label %1, label %2
>
> 1:
> c1 = predicateinfo(c)
> 2:
> c2 = predicateinfo(c)
> ->
> br label %2
> 2:
> c2 = predicateinfo(undef if c is dead, or just kill this thing)
>
>
> 2. You make an unconditional branch conditional -
> we pretty much don't do this, but if we do, you can add
> predicateinfos, in which case, you win
> or you can forget to, and we just miss an optimization.
>
> The update algorithm is pretty simple:
>
> If the two target blocks exist already, you add predicateinfo at the
> top, and rename affected uses. The total set of affected uses is just
> the set of uses of the original icmp, you can just go through them and
> see which ones belong to which branch using dominance (assuming you
> updated the dominator tree too). This has low cost.
>
> If the blocks don't exist, you can just rename them while building at
> basically no cost.
>
> Note that adding them does not strictly require phi nodes, because
> they do not produce distinct values The phi will always simplify to
> the original variable.
>
> It's like
> a = b
> if (<something>)
> a = b
>
> While we may transform this initially to
> a1 = b
> if (<something>)
> a2 = b
> phi(a1, a2)
>
> the phi is not *actually* necessary, and will be simplified to phi(b,
> b) and then removed.
>
> Now, there are passes that may want the phi node there to see the
> congruence (this is the same issue we have with "pruned" vs "not
> pruned" ssa already), but we could also just *always* insert a phi
> node, too, if that becomes an issue.
> This should require no extra computation, because in basically all
> cases you would have already had to compute the proper place for phi
> nodes for correctness of the*rest* of the IR you have changed, and the
> predicateinfo phis would not go in a different set of blocks.
>
> 3. You add a completely new branch - same as #2, because unconditional
> branches would not have had predicate info before.
>
>
> While i've given the above answers assuming you only change
>
> c = icmp eq a, b
> br c, %1, %2
> 1:
> use c
> 2:
> use c
> into
>
> c =icmp eq a, b
> br c, label %1, label %2
>
> 1:
> c1 = predicateinfo(c)
> use c1
> 2:
> c2 = predicateinfo(c)
> use c2
>
> you can also do
>
> c =icmp eq a, b
> br c, label %1, label %2
>
> 1:
> a1 = predicateinfo(a, c)
> b1 = predicateinfo(b, c)
> c1 = predicateinfo(c, c)
>
> 2:
> a2 = predicateinfo(a, c)
> b2 = predicateinfo(b, c)
> c2 = predicateinfo(c, c)
>
>
> You could also obviously prune this so you only do it if there are
> uses of the variable in the blocks dominated by either conditional
> edge, and then your only real explosion in variables tends to be in
> switches.
>
>> if (foo == 50) {
>> } else { }
>>
>> becomes
>>
>> if (foo == 50) {
>> foo1 = predicateinfo(foo, the icmp of foo )
>> (use foo1)
>> } else {
>> foo2 = predicateinfo(foo, the icmp of foo)
>> (use foo2)
>> }
>>
>> Once that's done, finding the set of predicates that usefully
>> affect a given use is trivial, as you can just walk the def-use
>> chains.
>>
>> I haven't implemented it for assume, but it should work just as
>> well for that.
>>
>> I'm somewhat against caching solutions and hacky here,I generally
>> prefer to solve infrastructury problems with better
>> infrastructure. Caching tends to just kick the can down the
>> road, and often makes it harder to build better infrastructure.
>> It makes sense when the change is significantly harder to get
>> right, and not worth the cost. Here, honestly, we can do better.
>> We have plenty of cases where CVP, etc, will be significantly
>> sped up by the above (i can hand them to you) but not by your change.
>>
>> Note also that it improves a large set of passes.
>> For example, the variable split enables SCCP to do correlated
>> value propagation (trivially for equality comparison, other
>> things with a bit of work)
>>
>> Before, it could only do something if "foo" turned out the same
>> everywhere.
>> After, you can propagate "foo == 50" to the foo1 branch.
>>
>> Early measurements come out to a few percent more constants found
>> total.
>
> How will we know when to add the predicateinfo intrinsic, or will
> we do it for all values used in the conditionally-executed region?
>
>
> All values used.
>
> There is a pruned form, or you could use heuristics, but i doubt it's
> worth it.
Makes sense.
Thanks again,
Hal
> This patch did the equivalent thing by pattern-matching the condition.
>
> Generally, I'm in favor of going in this direction.
>
> For assumes, we'd still need to pattern match, right? If so, this
> is still a cache in a sense ;) -- I suspect we'd still need the
> assumption cache too in order to efficiently compute ephemeral
> values (so that the inline cost analysis, etc. still have the
> right complexity bounds).
>
>
> Maybe.
>
> Thanks again,
> Hal
>
>>
>>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170108/51b46072/attachment.html>
More information about the llvm-commits
mailing list