[llvm-dev] [llvm] r303387 - [InstCombine] add more tests for xor-of-icmps; NFC

Sanjay Patel via llvm-dev llvm-dev at lists.llvm.org
Fri May 19 16:11:31 PDT 2017

[adding llvm-dev for wider audience]

On Fri, May 19, 2017 at 12:28 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> On Fri, May 19, 2017 at 11:00 AM, Davide Italiano <davide at freebsd.org>
> wrote:
>> On Fri, May 19, 2017 at 10:00 AM, Sanjay Patel <spatel at rotateright.com>
>> wrote:
>> > Is "VRP" (value range propagation) in LLVM-speak "CVP" (correlated value
>> > propagation)?
>> >
>> > If so, we have this comment regarding compares:
>> >   // As a policy choice, we choose not to waste compile time on anything
>> > where
>> >   // the comparison is testing local values.
>> >
>> > Or this for div/rem:
>> >   // As a policy choice, we choose not
>> >   // to waste compile time on anything where the operands are local
>> defs.
>> >
>> > "Local" means in the same basic block from what I can tell by the code
>> here.
>> >
>> > I think this means that this pass explicitly defers handling simple
>> cases
>> > like:
>> > https://reviews.llvm.org/D33342
>> > ...to another pass, and currently that pass is InstCombine (because the
>> > patterns really can't be any simpler than the tests in that patch,
>> right?).
>> >
>> > I think that improving compile-time should not preclude improving
>> > InstCombine. We should do both.
>> >
>> Just thoughts, feel free to ignore them.
>> I didn't measure the impact in LLVM, but I'm sure you can do VRP
>> relatively fast (GCC does that both interprocedurally and
>> intraprocedurally and their pass is much faster in some cases than
>> LLVM's), i.e. O(N) in practice, so, maybe we could re-evaluate this
>> policy?
> Yeah, that's kind of silly, we can do much better.
>> I think replacing a pass in LLVM is not trivial (I'm learning it the
>> hard way with NewGVN). OTOH, I'm still not entirely convinced
>> `InstCombine` should handle these cases given it's already a
>> compile-time sink?
Correct me where I'm going wrong.

1. We got a bug report about a perf regression that was root-caused to this:

  define i1 @xor_of_icmps(i64 %a) {
    %b = icmp sgt i64 %a, 0
    %c = icmp eq i64 %a, 1
    %r = xor i1 %b, %c
    ret i1 %r
  }

Because this was a regression, we know that the optimal/canonical IR for
this program is:

  define i1 @xor_of_icmps_canonical(i64 %a) {
    %r = icmp sgt i64 %a, 1
    ret i1 %r
  }
2. I saw this as a bug in InstCombine because I think InstCombine's job is
to canonicalize simple IR sequences.

3. I assumed that matching an (xor icmp, icmp) pattern can be done
efficiently in InstCombine. In fact, I knew that we already did that match,
so there is zero incremental cost to find this pattern in InstCombine.

4. I knew that we already handle related and-of-cmps and or-of-cmps
patterns in InstSimplify/InstCombine.
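
For context, the flavor of those existing and/or-of-cmps folds is collapsing two range tests into one; a toy illustration (hypothetical constants, not LLVM code):

```python
# Two same-direction signed compares against constants fold to one:
#   (a > C1) & (a > C2)  ==  a > max(C1, C2)
#   (a > C1) | (a > C2)  ==  a > min(C1, C2)
C1, C2 = 5, 10
for a in range(-20, 21):
    assert ((a > C1) and (a > C2)) == (a > max(C1, C2))
    assert ((a > C1) or (a > C2)) == (a > min(C1, C2))
```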

5. Based on that, I proposed a patch that mostly uses existing logic in
those passes to solve this bug in the most general way I am aware of. It
makes 2 calls to InstSimplify to find a potential fold before
morphing/creating new instructions that may get folded by other existing
transforms.

My questions:

1. Was/is InstCombine intended to handle this transform?

2. If there is a measurable compile-time cost for this patch, then there
must be a structural problem with InstSimplify, InstCombine, and/or the
pass pipeline?

3. Is there another pass that is more suitable to solve this bug than
InstCombine? CVP was mentioned as a candidate for enhancement. Any others?

4. If there is a more suitable pass, why is it more suitable?

5. If we add to that pass or create a new pass that specifically targets
logic-of-cmps, can we pull that code out of InstSimplify and InstCombine?

6. If compile-time has become unbearable, shouldn't we be moving
optimization logic and/or passes from -O1 to -O2 to -O3, etc? My
understanding as a compiler consumer has always been that I'll get better
performing code the higher up the -O# ladder I climb, but it will cost
increasingly more time to compile. Is that not the model for LLVM?

> People seem intent on adding lots and lots of stuff to instcombine because
> it's easy.
> This makes it harder to ever replace, while simultaneously making it
> slower.
> It's not hard to see where that path ends up.
> InstCombine does a lot of random crap that isn't even combining or graph
> rewrite, too, because well second and third order effects are hard.

Can you provide some examples? I honestly don't know where I should draw
the line. If I've crossed the line, I'd like to fix it.

> If we want an optimal graph rewriter, we should actually just build one,
> with sane rewrite rules, verification that it fixpoints, and some sense of
> good ordering, etc., instead of slowly adding every possible pattern match
> to InstCombine.

Can you recommend papers or projects to look at to get a better
understanding of an optimal graph rewriter? LLVM is the only compiler I've
worked on from the inside, so I don't really know what else is possible.