[llvm] r289755 - Make processing @llvm.assume more efficient by using operand bundles
Hal Finkel via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 3 05:43:59 PST 2017
Hi Philip,
First, I'm planning on rewriting this change to be more targeted,
specifically to take the alternative approach we discussed briefly on
the review thread and keep the list of affected variables in the
assumption cache.
Second, regarding motivation, the problem is not the basic-block scan
(with the assumption cache, we actually do very little of that anyway).
The core problem is that, in ValueTracking, which is used heavily by
InstCombine and other passes, checks all assumptions in each function
for every variable being processed to determine known bits, etc. These
checks are pattern matches, and while they normally fail quickly, as the
number of assumptions increases, this is a large problem: it makes
known-bits determination O(#assumptions in the function), for every
variable, in the common case, and, because the assumption processing is
part of the recursive known-bits procedure, the constant factor can be
large too given certain kinds of inputs. What we need for ValueTracking
(and also for LVI and maybe others) is a way to cache which variables
are (usefully) affected by which assumptions.
Regarding your suggestions below, I can definitely see how we might run
into a problem with the dominance checking if we had a large number of
assumptions which affected a variable -- it is only when we have an
assumption that potentially affects a variable that we do any of the
dominance checking -- but I've not run into any such cases yet.
Thanks again,
Hal
On 01/03/2017 12:20 AM, Philip Reames wrote:
> Taking a step back, where is the performance hotspot we were trying to
> address with Hal's change? Is it specifically the local basic block
> walk case? That would be my assumption, but I don't remember seeing
> any clear discussion pointing in that direction.
>
> If so, I can see three possible paths forward here.
>
> First, we can try to accelerate the basic block scan using caching.
> We already have OrderedBasicBlock. When walking over each assume in
> value tracking, lvi, etc.., we could easily cache the ordering within
> basic block to greatly reduce the cost. It does look like we'd need
> to extend the abstraction slightly to provide a postdominates(inst,
> inst) function, but that can be implemented within the current single
> pass if needed.
>
> We're also missing a couple of obvious optimizations where we can
> short circuit the walk (phis, terminators, etc...). These are in
> DominatorTree, so this really only applies when we don't have
> dominator information available.
>
> Second, we could canonicalize to avoid needing to scan the entire
> block in the common case. Let's say that we eagerly hoisted assumes
> towards the beginning of their block. If we did this in InstCombine,
> the rest of the optimizer could start at the beginning of the block.
> We already have a lists of assumes which are in each basic block.
> We'd know we were done scanning the basic block once we'd encountered
> the right number of assumes. We'd still have the problem of reordering
> being restricted by possible throw sites, but it'd likely improve the
> common case if nothing else.
>
> Alternatively, we could try to sink towards the bottom of the block.
> The same general idea applies with the same general problems. In
> principle, we could choose to sink assumes across throws anyways, but
> that would be giving up optimization potential. We might be able to
> find some good heuristics though.
>
> Third, can we just drop the single basic block case entirely? Do we
> have any measurements for the performance we loose? Or alternatively,
> what happens if we scan a small window of instructions around the
> assume? Say six instructions on either side?
>
> Philip
>
> On 12/21/2016 08:19 AM, Davide Italiano wrote:
>> On Mon, Dec 19, 2016 at 12:21 AM, Chandler Carruth via llvm-commits
>> <llvm-commits at lists.llvm.org> wrote:
>>> I spent some time looking at this.
>>>
>>> Hal, I think we need to revert all of this. =[
>>>
>>> The operand bundle thing solves a very real issue in InstCombine and
>>> ValueTracking but it makes the inline cost computation superlinear. I'm
>>> actually amazed we didn't see this sooner.
>>>
>> Sorry to deliver bad news, but, FYI, this also has a very negative
>> impact on CVP/LVI, and it seems to be responsible for some of the
>> regressions (but, unfortunately not all of them) I've reported in my
>> llvm-dev thread (subject: llvm is getting slower).
>> In particular, with this patch in `opt -O2` compile-time increases
>> substantially.
>>
>> 2.8784 ( 11.4%) 0.1255 ( 19.6%) 3.0039 ( 11.6%) 2.9973 (
>> 11.5%) Value Propagation
>> to
>> 74.4273 ( 76.4%) 0.1218 ( 20.9%) 74.5491 ( 76.1%) 74.5410 (
>> 76.1%) Value Propagation
>>
>> And it's really a pity because this patch actually made InstCombine
>> better in some cases.
>> If you decide to reland this, feel free to cc: me and I'll try to do
>> some more testing/profiling as needed (after the christmas break).
>>
>> Thanks!
>>
>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
More information about the llvm-commits
mailing list