[llvm] r289755 - Make processing @llvm.assume more efficient by using operand bundles
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 2 22:20:14 PST 2017
Taking a step back, where is the performance hotspot we were trying to
address with Hal's change? Is it specifically the local basic block walk
case? That would be my assumption, but I don't remember seeing any
clear discussion pointing in that direction.
If so, I can see three possible paths forward here.
First, we can try to accelerate the basic block scan using caching. We
already have OrderedBasicBlock. When walking over each assume in value
tracking, LVI, etc., we could easily cache the ordering within the
basic block to greatly reduce the cost. It does look like we'd need to
extend the abstraction slightly to provide a postdominates(inst, inst)
function, but that could be implemented within the current single pass
if needed.
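
Roughly what I have in mind, as a sketch only (the LocalInstOrder name
and the postDominates query are invented for illustration;
OrderedBasicBlock already does the lazy numbering for its dominates
query):

#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
#include <cassert>

using namespace llvm;

// Lazily number the instructions of a single block so that repeated
// intra-block ordering queries become O(1) after the first walk.
class LocalInstOrder {
  DenseMap<const Instruction *, unsigned> Numbers;
  const BasicBlock *BB;

public:
  explicit LocalInstOrder(const BasicBlock *BB) : BB(BB) {}

  unsigned getNumber(const Instruction *I) {
    assert(I->getParent() == BB && "instruction from a different block");
    if (Numbers.empty()) {
      // The single pass; its cost is amortized across all later queries.
      unsigned N = 0;
      for (const Instruction &Inst : *BB)
        Numbers[&Inst] = N++;
    }
    return Numbers.lookup(I);
  }

  // A dominates B within the block iff A is numbered before B.
  bool dominates(const Instruction *A, const Instruction *B) {
    return getNumber(A) < getNumber(B);
  }

  // Within a single block, A post-dominates B iff A comes after B,
  // modulo instructions that may throw in between; callers would still
  // have to check for those separately, as they do today.
  bool postDominates(const Instruction *A, const Instruction *B) {
    return getNumber(A) > getNumber(B);
  }
};
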
We're also missing a couple of obvious optimizations where we can
short-circuit the walk (phis, terminators, etc.). Those shortcuts
already exist in DominatorTree, so this really only applies when we
don't have dominator information available.
Second, we could canonicalize to avoid needing to scan the entire block
in the common case. Let's say that we eagerly hoisted assumes towards
the beginning of their block. If we did this in InstCombine, the rest
of the optimizer could start at the beginning of the block. We already
have a list of the assumes in each basic block. We'd know we
were done scanning the basic block once we'd encountered the right
number of assumes. We'd still have the problem of reordering being
restricted by possible throw sites, but it'd likely improve the common
case if nothing else.
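
Something along these lines, as a rough sketch (the helper name is
made up, and a real version in InstCombine would want to be smarter
about compile-time cost and about preserving the relative order of
the assumes):

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/IntrinsicInst.h"

using namespace llvm;

// Move each @llvm.assume as close to the top of its block as correctness
// allows.  Returns true if anything changed.
static bool hoistAssumesToBlockEntry(BasicBlock &BB) {
  bool Changed = false;

  // Collect the assumes first so moving them doesn't disturb the iteration.
  SmallVector<IntrinsicInst *, 4> Assumes;
  for (Instruction &I : BB)
    if (auto *II = dyn_cast<IntrinsicInst>(&I))
      if (II->getIntrinsicID() == Intrinsic::assume)
        Assumes.push_back(II);

  for (IntrinsicInst *Assume : Assumes) {
    // The earliest legal position is just past the last "barrier" between
    // the block entry and the assume: either an instruction that may throw
    // (the condition might not hold on the exceptional path) or an in-block
    // definition the assume uses.
    Instruction *Dest = &*BB.getFirstInsertionPt();
    for (Instruction &I :
         make_range(Dest->getIterator(), Assume->getIterator())) {
      bool DefinesOperand =
          any_of(Assume->operands(), [&](Use &U) { return U.get() == &I; });
      if (I.mayThrow() || DefinesOperand)
        Dest = I.getNextNode();
    }
    if (Dest != Assume) {
      Assume->moveBefore(Dest);
      Changed = true;
    }
  }
  return Changed;
}
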
Alternatively, we could try to sink towards the bottom of the block.
The same general idea applies with the same general problems. In
principle, we could choose to sink assumes across throws anyway, but
that would be giving up optimization potential. We might be able to
find some good heuristics though.
Third, can we just drop the single basic block case entirely? Do we
have any measurements of how much performance we'd lose? Or
alternatively, what happens if we scan a small window of instructions
around the assume? Say six instructions on either side?
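
To make the window idea concrete, something like the following is what
I'm picturing (illustrative only; the constant and the helper are
invented, and a real check would still need to look for may-throw
instructions in between, the way isValidAssumeForContext does today):

#include "llvm/IR/Instruction.h"
#include "llvm/IR/IntrinsicInst.h"

using namespace llvm;

// Give up after a fixed number of instructions in either direction
// instead of scanning the whole block.
static const unsigned AssumeScanWindow = 6;

static bool isAssumeWithinWindow(const IntrinsicInst *Assume,
                                 const Instruction *CtxI) {
  if (Assume->getParent() != CtxI->getParent())
    return false;

  // Walk forward from the assume looking for the context instruction...
  const Instruction *I = Assume;
  for (unsigned N = 0; I && N != AssumeScanWindow; ++N, I = I->getNextNode())
    if (I == CtxI)
      return true;

  // ...and backward, in case the context precedes the assume.
  I = Assume;
  for (unsigned N = 0; I && N != AssumeScanWindow; ++N, I = I->getPrevNode())
    if (I == CtxI)
      return true;

  return false;
}
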
Philip
On 12/21/2016 08:19 AM, Davide Italiano wrote:
> On Mon, Dec 19, 2016 at 12:21 AM, Chandler Carruth via llvm-commits
> <llvm-commits at lists.llvm.org> wrote:
>> I spent some time looking at this.
>>
>> Hal, I think we need to revert all of this. =[
>>
>> The operand bundle thing solves a very real issue in InstCombine and
>> ValueTracking but it makes the inline cost computation superlinear. I'm
>> actually amazed we didn't see this sooner.
>>
> Sorry to deliver bad news, but, FYI, this also has a very negative
> impact on CVP/LVI, and it seems to be responsible for some of the
> regressions (but, unfortunately not all of them) I've reported in my
> llvm-dev thread (subject: llvm is getting slower).
> In particular, with this patch in `opt -O2` compile-time increases
> substantially.
>
>  2.8784 ( 11.4%)   0.1255 ( 19.6%)   3.0039 ( 11.6%)   2.9973 ( 11.5%)  Value Propagation
> to
> 74.4273 ( 76.4%)   0.1218 ( 20.9%)  74.5491 ( 76.1%)  74.5410 ( 76.1%)  Value Propagation
>
> And it's really a pity because this patch actually made InstCombine
> better in some cases.
> If you decide to reland this, feel free to cc: me and I'll try to do
> some more testing/profiling as needed (after the Christmas break).
>
> Thanks!
>