[llvm] r303387 - [InstCombine] add more tests for xor-of-icmps; NFC
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Fri May 19 16:11:31 PDT 2017
[adding llvm-dev for wider audience]
On Fri, May 19, 2017 at 12:28 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>
>
> On Fri, May 19, 2017 at 11:00 AM, Davide Italiano <davide at freebsd.org>
> wrote:
>
>> On Fri, May 19, 2017 at 10:00 AM, Sanjay Patel <spatel at rotateright.com>
>> wrote:
>> > Is "VRP" (value range propagation) in LLVM-speak "CVP" (correlated value
>> > propagation)?
>> >
>> > If so, we have this comment regarding compares:
>> > // As a policy choice, we choose not to waste compile time on anything where
>> > // the comparison is testing local values.
>> >
>> > Or this for div/rem:
>> > // As a policy choice, we choose not
>> > // to waste compile time on anything where the operands are local defs.
>> >
>> > "Local" means in the same basic block from what I can tell by the code
>> here.
>> >
>> > I think this means that this pass explicitly defers handling simple cases
>> > like:
>> > https://reviews.llvm.org/D33342
>> > ...to another pass, and currently that pass is InstCombine (because the
>> > patterns really can't be any simpler than the tests in that patch, right?).
>> >
>> > I think that improving compile-time should not preclude improving
>> > InstCombine. We should do both.
>> >
>>
>> Just thoughts, feel free to ignore them.
>> I didn't measure the impact in LLVM, but I'm sure you can do VRP
>> relatively fast (GCC does that both interprocedurally and
>> intraprocedurally and their pass is much faster in some cases than
>> LLVM's), i.e. O(N) in practice, so, maybe we could re-evaluate this
>> policy?
>>
>
> Yeah, that's kind of silly, we can do much better.
>
>
>> I think replacing a pass in LLVM is not trivial (I'm learning it the
>> hard way with NewGVN). OTOH, I'm still not entirely convinced
>> `InstCombine` should handle these cases given it's already a
>> compile-time sink?
>>
>
>
Correct me where I'm going wrong.
1. We got a bug report with a perf regression that was root caused to this
IR:
define i1 @xor_of_icmps(i64 %a) {
  %b = icmp sgt i64 %a, 0
  %c = icmp eq i64 %a, 1
  %r = xor i1 %b, %c
  ret i1 %r
}
Because this was a regression, we know that the optimal/canonical IR for this
program is a single compare: the two predicates disagree exactly when %a is
greater than 1 (for %a <= 0 both compares are false; for %a == 1 both are
true), so the xor is equivalent to:
define i1 @xor_of_icmps_canonical(i64 %a) {
  %r = icmp sgt i64 %a, 1
  ret i1 %r
}
2. I saw this as a bug in InstCombine because I think InstCombine's job is
to canonicalize simple IR sequences.
3. I assumed that matching an (xor icmp, icmp) pattern could be done
efficiently in InstCombine. In fact, I knew that we already did that match,
so there is zero incremental cost to find this pattern in InstCombine.
4. I knew that we already handle related and-of-cmps and or-of-cmps
patterns in InstSimplify/InstCombine.
5. Based on that, I have proposed a patch that mostly uses existing logic
in those passes to solve this bug in the most general way I am aware of. It
makes 2 calls to InstSimplify to find a potential fold before
morphing/creating new instructions that may get folded by other existing
logic.
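To make points 3 and 5 concrete, here is a minimal sketch (not the code in
https://reviews.llvm.org/D33342, and the helper name is made up) of the kind
of match this involves, assuming LLVM's usual PatternMatch helpers:

#include "llvm/ADT/APInt.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PatternMatch.h"

using namespace llvm;
using namespace llvm::PatternMatch;

// Return true if I is 'xor (icmp Pred0 A, C0), (icmp Pred1 A, C1)', i.e. an
// xor of two compares of the same value against constants. This helper is
// hypothetical and exists only to illustrate the cost of the match.
static bool isXorOfICmpsOnSameValue(BinaryOperator &I) {
  ICmpInst::Predicate Pred0, Pred1;
  Value *A, *B;
  const APInt *C0, *C1;
  return match(&I, m_Xor(m_ICmp(Pred0, m_Value(A), m_APInt(C0)),
                         m_ICmp(Pred1, m_Value(B), m_APInt(C1)))) &&
         A == B;
}

Since InstCombine already visits every xor and already performs this kind of
match, the only new work in the patch is deciding what to do once the pattern
is found, and as described in point 5 it asks InstSimplify first rather than
speculatively creating instructions.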
Questions:
1. Was/is InstCombine intended to handle this transform?
2. If there is a measurable compile-time cost for this patch, doesn't that
mean there is a structural problem with InstSimplify, InstCombine, and/or the
pass pipeline?
3. Is there another pass that is more suitable to solve this bug than
InstCombine? CVP was mentioned as a candidate for enhancement. Any others?
4. If there is a more suitable pass, why is it more suitable?
5. If we add to that pass or create a new pass that specifically targets
logic-of-cmps, can we pull that code out of InstSimplify and InstCombine?
6. If compile-time has become unbearable, shouldn't we be moving
optimization logic and/or passes from -O1 to -O2 to -O3, etc.? My
understanding as a compiler consumer has always been that I'll get
better-performing code the higher up the -O# ladder I climb, but it will cost
increasingly more time to compile. Is that not the model for LLVM?
> People seem intent on adding lots and lots of stuff to instcombine because
> it's easy.
> This makes it harder to ever replace, while simultaneously making it
> slower.
> It's not hard to see where that path ends up.
> InstCombine does a lot of random crap that isn't even combining or graph
> rewriting, too, because, well, second- and third-order effects are hard.
>
Can you provide some examples? I honestly don't know where I should draw
the line. If I've crossed the line, I'd like to fix it.
> If we want an optimal graph rewriter, we should actually just build one,
> with sane rewrite rules, verification that it fixpoints, and some sense of
> good ordering, etc., instead of slowly adding every possible pattern match
> to InstCombine.
>
Can you recommend papers or projects to look at to get a better
understanding of an optimal graph rewriter? LLVM is the only compiler I've
worked on from the inside, so I don't really know what else is possible /
feasible.
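For reference, my (possibly naive) mental model of "rewrite rules applied
until a fixpoint" is something like the toy sketch below. Every name in it is
invented for illustration; it is not LLVM code and not a proposal:

#include <functional>
#include <optional>
#include <string>
#include <vector>

// A tiny expression node; a real rewriter would work on the IR graph.
struct Expr {
  std::string Op;               // e.g. "xor", "icmp_sgt", "const"
  std::vector<Expr> Operands;
};

// A rule either rewrites an expression or declines.
using Rule = std::function<std::optional<Expr>(const Expr &)>;

// Apply the rules until none of them fires. A real rewriter would also need
// a rule ordering / cost model and some way to verify that the rule set
// terminates (e.g. a measure that strictly decreases with every rewrite).
Expr rewriteToFixpoint(Expr E, const std::vector<Rule> &Rules) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (const Rule &R : Rules) {
      if (std::optional<Expr> New = R(E)) {
        E = *New;
        Changed = true;
      }
    }
  }
  return E;
}

If that's roughly the right shape, then pointers to work on rule ordering and
termination/confluence would be especially helpful.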