[llvm-dev] InstCombine, graph rewriting, Equality saturation

Mon Sep 11 13:32:38 PDT 2017

On Mon, Sep 11, 2017 at 9:48 AM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> On 09/11/2017 10:50 AM, Sean Silva via llvm-dev wrote:
>
>
>
> On Mon, Sep 11, 2017 at 8:14 AM, Daniel Neilson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>> Just thinking out loud…. I’m really not familiar with the vast majority
>> of what instcombine does, so please excuse me if my naiveté is too obvious.
>> I can’t help but notice all of the uses of ‘and’ in Daniel B’s description
>> of what instcombine is right now:
>>
>> > On Sep 8, 2017, at 11:27 PM, Daniel Berlin via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> >
>> > FWIW, Before getting to "how to do it", I think InstCombine needs an
>> actual goal.
>> > Right now it's "do a bunch of instruction combination and
>> canonicalization and random kinds of semi-local optimization with some
>> weird analysis and other stuff in the middle.
>> > Iterate this as hard as we can"
>> > Nobody can really draw any real dividing line for what should be there
>> or not except based on how easy or expensive it is to shove it in.
>> > That's a recipe for pass creep.
>>
>> This makes me wonder… is it sensible to be talking about splitting up
>> instcombine into multiple separate passes? Would such a thing even be
>> possible? For example, split by functionality into separate passes that
>> each do one of:
>> * instruction combinations
>> * canonicalization
>> * semi-local optimizations
>> * etc…
>>
>
> The problem is that all of these are really the same thing. Almost all
> canonicalizations are also simplifications, the common underlying factor
> being that they're mostly-local transformations that likely enable other
> optimizations.
>
>
>>  Or something like that.
>>
>>  As separate passes, each would probably have a natural way to be
>> implemented effectively and those implementations might vary.
>>
>
> One obstacle to that is that currently instcombine has an internal
> fixed-point iteration that it does.
>
> So when splitting it into separate passes we would need to either add
> fixed-point iteration to the pass manager running the separate instcombine
> passes (extending the pass management in this way is doable and has been
> explored in the past, e.g. https://www.youtube.com/watch?v=c7iP43an5_Q )
> or demonstrate/measure that we don't regress without the fixed-point
> iteration across separate instcombine passes.
>
>
> I agree, but I'll add that I view the fixed-point iteration here to be an
> asset. It increases the robustness and consistency of the compiler, and
> allows later passes to depend on the canonicalization (*) without worrying
> about phase-ordering effects (**).
>
> (*) In my experience, the largest problem we have in this regard is our
> lack of documentation on what our canonical form is.
> (**) To be clear, we still have phase-ordering effects within InstCombine.
> Using a better system for dealing with the rewriting rules, I hope, will
> help. Nevertheless, these effects are much rarer than if we ran InstCombine
> only as discrete passe.
>
> I'd like to see us do more fixed-point iteration in our optimization
> pipeline, and our past work has shown this would be practical (i.e., only a
> few iterations will be needed in practice), but even that won't remove the
> advantages of using a fixed-point iteration within InstCombine.
>
> Regardless, I think that if we had a model for what InstCombine did (i.e.,
> as a graph-rewriting system), then it would be clear what could be part of
> that system and what couldn't. Otherwise, I think that the real challenge
> here is figuring out the underlying structural foundations to make the
> process efficient and sound.
>

As a data point for this, I did a quick walkthrough of the top-level of
instcombine, and one thing that stuck out to me is that it tries to sink
instructions in certain "easy" scenarios. That doesn't fit a graph
rewriting system.

As a side note, TryToSinkInstruction does a whole-BB scan to check safety
for sinking loads, which will go quadratic (in the number of loads) on an
input like:

```
void use(int);
void foo(int *p, int *q, bool b) {
int p0 = p[0];
int p1 = p[1];
int p2 = p[2];
int p3 = p[3];
if (b) {
use(p0);
use(p1);
use(p2);
use(p3);
}
}
```
(script attached)

The commit where it was introduced (https://reviews.llvm.org/rL18677 /
https://reviews.llvm.org/rL18692 yes that's from 2004) indicated that it
actually caught a bunch of cases in SPEC, but it isn't clear that it's
really providing much value integrated into the fixed-point iteration
compared to an appropriate standalone global code motion pass (which would
not have the quadratic complexity and would be more general to boot).

-- Sean Silva

>
>
>  -Hal
>
>
> -- Sean Silva
>
>
>>
>> -Daniel
>>
>> ---
>> Daniel Neilson, Ph.D.
>> Azul Systems
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
>
> _______________________________________________
> LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170911/8f13fe0e/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: instcombine_quadratic_load_sinking.py
Type: text/x-python
Size: 477 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170911/8f13fe0e/attachment.py>