[llvm-dev] more reassociation in IR

Hiroshi Yamauchi via llvm-dev llvm-dev at lists.llvm.org
Fri May 25 09:22:16 PDT 2018


Taking an approach like ​D41574 and/or D45842 would mean modifying the
existing reassociate pass, adding some new reassociate-like passes, etc.

I think if there's any concern/objection to it as a general direction in
the short term, it'd be good to discuss here. But there doesn't seem to be?

On Fri, May 18, 2018 at 2:31 PM Sanjay Patel <spatel at rotateright.com> wrote:

> I mentioned this earlier in the thread - I would like to see something
> like D41574 in the optimizer. It's optimizing code that no other pass does
> currently, and I don't see any other near-term proposal that gets us those
> optimizations.
>

> Omer, can you rebase that to trunk? I think a header has moved, so it
> doesn't build as-is. I'd like to know if it can catch the cases in D45842.
> If not, why not? If it can handle those easily, I'll abandon D45842.
>
> Also, I don't know if it's better to include that functionality as another
> iteration of the existing -reassociate or split it off as its own pass. But
> I think it should do the distributive simplifications that are currently in
> -instcombine (InstCombiner::SimplifyUsingDistributiveLaws). Using that
> instsimplify logic for analysis lets us decide if the reassociation is
> worthwhile in the 1st place, it removes the risk that some other pass would
> somehow mess up the pattern before instcombine could zap it, and it reduces
> the burden on instcombine to be the entire optimizer. :)
>
>
> On Mon, May 14, 2018 at 1:34 PM, Hiroshi Yamauchi via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Fri, May 11, 2018 at 7:20 PM Hal Finkel <hfinkel at anl.gov> wrote:
>>
>>>
>>> On 05/11/2018 08:40 PM, Daniel Berlin via llvm-dev wrote:
>>>
>>>
>>>
>>> On Fri, May 11, 2018 at 2:37 PM, Hiroshi Yamauchi <yamauchi at google.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, May 10, 2018 at 12:49 PM Daniel Berlin <dberlin at dberlin.org>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, May 10, 2018 at 12:05 PM, Hiroshi Yamauchi <
>>>>> yamauchi at google.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, May 9, 2018 at 8:24 PM Daniel Berlin <dberlin at dberlin.org>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 9, 2018 at 10:39 AM, Hiroshi Yamauchi <
>>>>>>> yamauchi at google.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, May 8, 2018 at 11:15 AM Daniel Berlin <dberlin at dberlin.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, May 8, 2018 at 10:38 AM, Hiroshi Yamauchi via llvm-dev <
>>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>
>>>>>>>>>> (
>>>>>>>>>> ​I came across this issue in the context of
>>>>>>>>>>  D46336 <https://reviews.llvm.org/D46336>.
>>>>>>>>>> ​ ​
>>>>>>>>>> Thanks, Sanjay, for starting this discussion.)
>>>>>>>>>>
>>>>>>>>>> If
>>>>>>>>>> ​we will
>>>>>>>>>>  move
>>>>>>>>>> ​reassociation,
>>>>>>>>>> or keep additional ones
>>>>>>>>>> ​,​
>>>>>>>>>> out of instcombine,
>>>>>>>>>> ​open questions for me would be
>>>>>>>>>> ​​:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 1. Since -reassociate isn't a fixed point pass,
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This is fixable, fwiw, without fixpointing it.
>>>>>>>>>
>>>>>>>>
>>>>>>>> How?
>>>>>>>>
>>>>>>>
>>>>>>> Depends on specifically which part you would like to know about ;)
>>>>>>>
>>>>>>
>>>>>> ​Maybe I misunderstood what you meant by "This is fixable". Did you
>>>>>> mean that we won't somehow need to fixpoint between instcombine and
>>>>>> reassociate, or that the specific motivating examples from the above
>>>>>> differentials are foldable without fixpointing?
>>>>>>
>>>>>
>>>>> If by fixpointing you mean "fixpointing reassociate and instcombine",
>>>>> then yes, that is fixable without fixpointing reassociate and instcombine,
>>>>> but would require rewriting instcombine :)
>>>>>
>>>>>
>>>>>>
>>>>>> If the latter, that may be the case. The concern was that we may
>>>>>> encounter examples that may need many more iterations, if not fixpointing.
>>>>>> As long as it's feasible to fixpoint between instcombine and
>>>>>> reassociate, it seems to work, but I guess that would probably need
>>>>>> some pass management change.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> we might need to repeat "-instcombine -reassociate" multiple
>>>>>>>>>> times to
>>>>>>>>>> ​fold
>>>>>>>>>>  down to what we want (relating to my comment here
>>>>>>>>>> <https://reviews.llvm.org/D46336#1087082>). I assumed this isn't
>>>>>>>>>> not what we want to do
>>>>>>>>>> ​? My impression is we don't do a fixed-point with passes?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Well, i mean there is no practical difference between passes that
>>>>>>>>> we fixpoint externally and fixpoint internally.
>>>>>>>>>
>>>>>>>> ​​
>>>>>>>> I had the following in mind: Does the pass manager support
>>>>>>>> fixpointing externally? Is there any performance difference? Are people
>>>>>>>> okay with that in general?
>>>>>>>>
>>>>>>>> But if there is no practical difference, I don't see any problem
>>>>>>>> with that :)
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> 2.
>>>>>>>>>> ​Since -reassociate needs to come up with one operand order (at
>>>>>>>>>> least currently as the only reassociate pass), would there exist
>>>>>>>>>> a single, unique operand order that would enable all
>>>>>>>>>> reassociative/commutative foldings that we want?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In what way?
>>>>>>>>> Are you asking whether there is a single reassociation order that
>>>>>>>>> makes all foldings occur in the same operation or something?
>>>>>>>>> I don't feel like i understand what you are asking.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Does this rephrase help: with the motivating examples (like
>>>>>>>> and-of-shifts or bit check patterns) from the above differentials in mind,
>>>>>>>> can we come up with a single reassociation order that solves all
>>>>>>>> those and all the others that may come up in the future? Would we need
>>>>>>>> different reassociation orders to fold different patterns?
>>>>>>>>
>>>>>>>
>>>>>>> It doesn't quite help.
>>>>>>> When stated that generally, there can be no such ordering at all,
>>>>>>> that's easy to prove.  It is a statically undecidable problem.
>>>>>>>
>>>>>>> There is however, a different question and answer to a few related
>>>>>>> problems that maybe you are really asking?
>>>>>>> 1. Is there a way to determine and apply the a maximal or
>>>>>>> nearly-maximal set of folds/graph transforms that could be applied to a
>>>>>>> given set of code in a sane and principled way -> yes
>>>>>>>
>>>>>>> (see, e.g., http://www.cs.cornell.edu/~ross/publications/eqsat/)
>>>>>>>
>>>>>>> 2. Is there a way to determine all expressions in the program as it
>>>>>>> exists that are equivalent or equivalent under constant time constant
>>>>>>> folding/reassociation, in a reasonable time bound -> yes
>>>>>>>
>>>>>>> (not a single easy link, happy to talk about it)
>>>>>>>
>>>>>>> Your original question is basically equivalent to
>>>>>>> Is there a way to determine all expressions in the program as it
>>>>>>> exists that are equivalent or could be made equivalent through any type of
>>>>>>> folding that one can think up?
>>>>>>> The answer to that is "no", it's provable that this is not
>>>>>>> statically decidable, so the time bound doesn't matter :)
>>>>>>>
>>>>>>> You have to limit the possible folding/evaluation you apply in
>>>>>>> various ways to make this decidable, and then further limit it to make the
>>>>>>> time bound reasonable.
>>>>>>>
>>>>>>> This all quickly devolves into herbrand equivalence and it's
>>>>>>> variations.
>>>>>>>
>>>>>>>
>>>>>>>>>>>>> Let me try one more time :) May we need multiple reassociate passes
>>>>>> to fold different reassociative patterns?
>>>>>>
>>>>>> A longer version: If Sanjay wants a particular reassociative pattern
>>>>>> to be folded (D45842), Omer wants another particular reassociative
>>>>>> pattern to be folded (D41574), and I want yet another ​particular reassociative
>>>>>> pattern to be folded (D46336), would we potentially need three
>>>>>> different reassociate passes with each combined with instcombine, rather
>>>>>> than just one that may be able to somehow handle those cases in one
>>>>>> shot, (assuming we don't want to put those in instcombine)?
>>>>>>
>>>>>> And it sounds like the answer is yes?
>>>>>>
>>>>>
>>>>> If you take the current instcombine as a base, then yes, that is
>>>>> correct.
>>>>>
>>>> ​​
>>>>> If you are willing to rearchitect instcombine, the answer is no, it's
>>>>> possible to do this all in a single pass in a relatively sane way.
>>>>>
>>>>
>>>> I assume by rearchitect, you mean a major rewrite as per this comment:
>>>> "Is there a way to determine all expressions in the program as it
>>>> exists that are equivalent or equivalent under constant time constant
>>>> folding/reassociation, in a reasonable time bound -> yes". Any pointer
>>>> or time to chat?
>>>>
>>>
>>> I'm happy to do both.
>>>
>>>
>>>>>>>> I think that an approach like ​
>>>> D
>>>> ​​
>>>> ​​
>>>> 4
>>>> ​​
>>>> ​​
>>>> 6
>>>> ​​
>>>> ​​
>>>> 3
>>>> ​​
>>>> ​​
>>>> 3
>>>> ​​
>>>> ​​
>>>> 6
>>>> ​​
>>>> ​​
>>>> ​ /
>>>> ​​
>>>> D
>>>> ​​
>>>> ​​
>>>> 4
>>>> ​​
>>>> ​​
>>>> 6
>>>> ​​
>>>> ​​
>>>> 5
>>>> ​​
>>>> ​​
>>>> 9
>>>> ​​
>>>> ​​
>>>> 5
>>>> ​​
>>>> ​​ has a merit: it would adds a bit of complexity, but would not
>>>>  require:
>>>>
>>>> 1. a major rewrite of instcombine,
>>>> 2. writing multiple (potentially many) reassociate passes and figuring
>>>> out how to fixpoint them with instcombine, or
>>>> 3. writing a self-contained folding pass for a specific pattern
>>>>
>>>> If you look at the diffs in the existing .ll files in
>>>> D46336
>>>> ​, it helps fold some previously-unfolded reassociation patterns
>>>> beyond the bit check patterns that it originally targeted.
>>>>
>>>> Sure, and it does so by  adding another O(N) cost to evaluation in each
>>> case. Instcombine doesn't even do lazy reevaluation through tracking
>>> dependencies, so it'll do so a lot of times as well.
>>>
>>> To me, that's not a good tradeoff, especially given how slow instcombine
>>> is *already*.  The code it produces is "good enough" to stop for a while
>>> and do something else and not suffer horribly in performance.[1]
>>>
>>> Let me ask a different question:
>>>
>>> At what point would anyone here be willing to stop adding things to
>>> instcombine and start doing something else instead, instead of waiting for
>>> someone else to do it?
>>> As far as i can tell, the answer is: "never", which makes most of these
>>> discussions just pointless rehashes as we slowly repeat the same disaster
>>> that became  gcc's instruction combiner  :)
>>>
>>> If the answer is "something", great, i'll set a mail filter and ignore
>>> these threads until that something happens :)
>>>
>>> Personally, in my experience people will never do more here unless
>>> pushed somewhat, or the thing becomes such a complete disaster no one wants
>>> to touch it.
>>>
>>>
>>> I've said this before, but I think a major impediment to forward
>>> progress here is coming up with an agreement on what the "something else"
>>> should be. Some of us have talked for years about having some
>>> TableGen-driven replacement, or maybe we want something with a syntax more
>>> like what is used by the Alive tool, but regardless, in order to gain in
>>> efficiency I suspect we need a model that is more restrictive than
>>> more-or-less arbitrary C++ code, and so we should pick a model and figure
>>> out how things might work.
>>>
>>>
>>> [1]  Last year i computed the "improvement in performance on
>>> applications" due to instcombine for a bunch of google apps and open source
>>> apps that had easy to use benchmarks (IE I isolated about two years of
>>> instcombine changes and made them to a current compiler piece by piece
>>> while measuring performance).
>>> I also computed the compile time increase in single instcombine passes
>>> over the same time period.
>>>
>>> On x86, but the numbers basically said we were basically gaining nearly
>>> nothing for high cost.  IE our drive for better looking output does not
>>> appear to translate into any real gains that i can find.  Either
>>> improvements to other opts hid them, or they simply didn't matter on the
>>> processors i tested on.
>>>
>>> Certainly, apps/workloads/architectures may vary here, and my goal is
>>> not to claim it's all worthless.
>>> My actual goal in all of this was to get a sense of whether my
>>> perspective on instcombine was still "reasonable", not to do a true
>>> scientific exploration :)
>>> I didn't have time/energy/etc to run it elsewhere, and again, my goal
>>> was not to give certainty/try to give exact percentages.
>>>
>>>
>>> This also matches my experience, but I draw a somewhat different lesson.
>>> I often tell application developers that *this* is why they must file
>>> compiler bug reports. Waiting and assuming that someone else will hit the
>>> same problem, and file the report, is a bad strategy. I think that this is
>>> due to two things:
>>>
>>>  1. As far as things go, the tail of the distribution is often really
>>> long, and probability that the particular thing hampering one piece of hot
>>> code is the same thing hampering another piece of hot code is often small.
>>>
>>>  2. We tend to add special cases instead of adding more-general
>>> algorithms. The more-general work is often hard because figuring out the
>>> cost modeling is often highly non-trivial. Also, when it's finally done,
>>> the chances that the old special cases are removed is also small (so we'll
>>> still accumulate cruft without specific effort).
>>>
>>>
>> ​I don't have better or ​larger study results or data, but for fixing
>> some smaller but important enough performance degradations (often in
>> microbenchmarks, but occasionally in larger settings) smaller compiler
>> improvements (not necessarily in instcombine but around that level) do
>> seem to matter at least in individual contexts. It's not clear how much
>> those actually mattered in a grander scheme of things, though.
>>
>> It sounds like we don't want to add to instcombine due to added
>> cost/complexity, rearchitecting would be hard, it's not clear if
>> incremental changes in instcombine did much per cost, which would make it
>> harder to justify rearchitecting it...
>>
>> What do people think of approaches like D41574 and D45842?
>>
>>
>>
>>>  -Hal
>>>
>>>
>>> --Dan
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>> --
>>> Hal Finkel
>>> Lead, Compiler Technology and Programming Languages
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>>
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180525/fbc567a6/attachment-0001.html>


More information about the llvm-dev mailing list