[llvm] r289755 - Make processing @llvm.assume more efficient by using operand bundles

Hal Finkel via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 3 05:43:59 PST 2017


Hi Philip,

First, I'm planning on rewriting this change to be more targeted, 
specifically to take the alternative approach we discussed briefly on 
the review thread and keep the list of affected variables in the 
assumption cache.

Second, regarding motivation, the problem is not the basic-block scan 
(with the assumption cache, we actually do very little of that anyway). 
The core problem is that ValueTracking, which is used heavily by 
InstCombine and other passes, checks all assumptions in the function 
for every variable being processed to determine known bits, etc. These 
checks are pattern matches, and while they normally fail quickly, as the 
number of assumptions increases this becomes a large problem: it makes 
known-bits determination O(#assumptions in the function) for every 
variable in the common case, and, because the assumption processing is 
part of the recursive known-bits procedure, the constant factor can be 
large too for certain kinds of inputs. What we need for ValueTracking 
(and also for LVI and maybe others) is a way to cache which variables 
are (usefully) affected by which assumptions.
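To illustrate, here's a minimal, self-contained sketch of that caching 
idea. This is not the actual LLVM API: values and assumptions are 
modeled as plain integer IDs, and the pattern-matching that would 
discover which values an assume constrains is assumed to happen once, 
at registration time.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical model (not actual LLVM code). The cache maps each value
// to the assumptions that usefully affect it, so a known-bits query
// inspects only those entries instead of pattern-matching every
// assumption in the function.
using ValueId = int;
using AssumeId = int;

class AffectedValuesCache {
  std::map<ValueId, std::vector<AssumeId>> Affected;

public:
  // Called once per assumption; the caller would pattern-match the
  // assume's condition to find the values it constrains.
  void registerAssume(AssumeId A, const std::vector<ValueId> &Values) {
    for (ValueId V : Values)
      Affected[V].push_back(A);
  }

  // A known-bits query now touches O(#assumptions affecting V) entries,
  // not O(#assumptions in the function).
  const std::vector<AssumeId> &assumptionsFor(ValueId V) {
    return Affected[V]; // creates an empty entry if V is absent
  }
};
```

The point of the sketch is only the asymptotic change: the per-query 
cost depends on how many assumptions mention the queried value, which 
is small in practice, rather than on the total assumption count.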

Regarding your suggestions below, I can definitely see how we might run 
into a problem with the dominance checking if we had a large number of 
assumptions affecting a single variable -- we only do any dominance 
checking when an assumption potentially affects the variable at hand -- 
but I've not run into any such cases yet.

Thanks again,

Hal


On 01/03/2017 12:20 AM, Philip Reames wrote:
> Taking a step back, where is the performance hotspot we were trying to 
> address with Hal's change? Is it specifically the local basic block 
> walk case?  That would be my assumption, but I don't remember seeing 
> any clear discussion pointing in that direction.
>
> If so, I can see three possible paths forward here.
>
> First, we can try to accelerate the basic block scan using caching.  
> We already have OrderedBasicBlock.  When walking over each assume in 
> value tracking, LVI, etc., we could easily cache the ordering within 
> the basic block to greatly reduce the cost.  It does look like we'd 
> need to extend the abstraction slightly to provide a postdominates(inst, 
> inst) function, but that can be implemented within the current single 
> pass if needed.
>
> We're also missing a couple of obvious optimizations where we can 
> short-circuit the walk (phis, terminators, etc.).  These are in 
> DominatorTree, so this really only applies when we don't have 
> dominator information available.
>
> Second, we could canonicalize to avoid needing to scan the entire 
> block in the common case.  Let's say that we eagerly hoisted assumes 
> towards the beginning of their block.  If we did this in InstCombine, 
> the rest of the optimizer could start at the beginning of the block.  
> We already have a list of the assumes in each basic block.  
> We'd know we were done scanning the basic block once we'd encountered 
> the right number of assumes. We'd still have the problem of reordering 
> being restricted by possible throw sites, but it'd likely improve the 
> common case if nothing else.
>
> Alternatively, we could try to sink towards the bottom of the block.  
> The same general idea applies with the same general problems.  In 
> principle, we could choose to sink assumes across throws anyways, but 
> that would be giving up optimization potential.  We might be able to 
> find some good heuristics though.
>
> Third, can we just drop the single basic block case entirely?  Do we 
> have any measurements for the performance we lose?  Or alternatively, 
> what happens if we scan a small window of instructions around the 
> assume?  Say six instructions on either side?
>
> Philip
>
> On 12/21/2016 08:19 AM, Davide Italiano wrote:
>> On Mon, Dec 19, 2016 at 12:21 AM, Chandler Carruth via llvm-commits
>> <llvm-commits at lists.llvm.org> wrote:
>>> I spent some time looking at this.
>>>
>>> Hal, I think we need to revert all of this. =[
>>>
>>> The operand bundle thing solves a very real issue in InstCombine and
>>> ValueTracking but it makes the inline cost computation superlinear. I'm
>>> actually amazed we didn't see this sooner.
>>>
>> Sorry to deliver bad news, but, FYI, this also has a very negative
>> impact on CVP/LVI, and it seems to be responsible for some of the
>> regressions (but, unfortunately not all of them) I've reported in my
>> llvm-dev thread (subject: llvm is getting slower).
>> In particular, with this patch in `opt -O2` compile-time increases
>> substantially.
>>
>>     2.8784 ( 11.4%)   0.1255 ( 19.6%)   3.0039 ( 11.6%)   2.9973 ( 11.5%)  Value Propagation
>> to
>>    74.4273 ( 76.4%)   0.1218 ( 20.9%)  74.5491 ( 76.1%)  74.5410 ( 76.1%)  Value Propagation
>>
>> And it's really a pity because this patch actually made InstCombine
>> better in some cases.
>> If you decide to reland this, feel free to cc: me and I'll try to do
>> some more testing/profiling as needed (after the Christmas break).
>>
>> Thanks!
>>
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
