[PATCH] D67356: [InstCombine] Simplify @llvm.usub.with.overflow+non-zero check (PR43251)

Tue Sep 10 12:55:43 PDT 2019

lebedev.ri added inline comments.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp:2249
+  if (Value *X = foldUnsignedUnderflowCheck(RHS, LHS, /*IsAnd=*/false, Builder))
+    return X;
+
----------------
xbolva00 wrote:
> vsk wrote:
> > lebedev.ri wrote:
> > > vsk wrote:
> > > > xbolva00 wrote:
> > > > > lebedev.ri wrote:
> > > > > > vsk wrote:
> > > > > > > lebedev.ri wrote:
> > > > > > > > vsk wrote:
> > > > > > > > > Does the 'or of icmps' case arise anywhere?
> > > > > > > > Define arise. We can trivially get one from another via De Morgan laws:
> > > > > > > > `(a && b) ? c : d` <--> `!(a && b) ? d : c` <--> `(!a || !b) ? d : c`,
> > > > > > > > It's *really* bad idea to intentionally not handle such nearby patterns.
> > > > > > > > 
> > > > > > > As in, do unsigned underflow checks tend to contain this specific or-of-icmp pattern, or more generally whether programs in the wild tend to.
> > > > > > > 
> > > > > > > If this isn't the case, then it seems like there's a compile-time cost for optimizing the pattern without any compensating benefit.
> > > > > > I'm not sure what the question is.
> > > > > If a compile time cost is concern, put it to AgressiveInstCombine - but I dont think we should do it in this case since there is nothing expensive like ValueTracking here..
> > > > > 
> > > > > I like Roman’s current code as is.
> > > > > 
> > > > I'm asking: why bother optimizing this pattern, or what is the positive case for doing so (apart from "it's possible to do so")? Essentially my previous comment, with a question mark at the end :).
> > > I'm still unable to answer the question because i'm not sure about which pattern specifically you are talking about.
> > > We can't just hope that the pattern we will always get is `(a && b) ? c : d `, it can trivially be converted to the `or`-form.
> > Stepping back a bit, I think the approach here should be driven by data.
> > 
> > You're right to be concerned about a change in the pipeline causing a performance regression ubsan. I don't think the answer to that is "introduce folds for all kinds of patterns the compiler *might* see", because then we'd endlessly be spinning our wheels and writing folds that aren't necessarily helpful (just extra maintenance & compile-time burden for no benefit).
> > 
> > Instead, the approach should be to focus on the actual end goal. We can check in a simple -fsanitize=pointer-overflow benchmark, perhaps your rawspeed bencharmk, to the llvm test suite. There are tons of bots that monitor the performance changes in the test suite (close to) every commit. Then, it makes sense to land targeted changes to improve performance. If there's ever a regression, we will actually know for sure, and be able to narrow it down, introduce a targeted fix, etc.
> > 
> > Taking a narrower focus again, I thought it was clear that we were discussing the 'or' form, but let me rephrase. Are the changes in "foldOrOfICmps" necessary to improve performance on your benchmark?
> >> just extra maintenance
> 
> Well, we could leave it as is, but after some weeks/months/years we would realize we need this fold and somebody may implement it from the scratch instead of current few lines.
> 
> >> compile-time burden
> 
> Well, I quite suprised that this concern is repeated so often for instcombine's fold but major new things introduced to other passes just go in.
> Are the changes in "foldOrOfICmps" necessary to improve performance on your benchmark?

Ok, thank you for your question.
I'm going to answer honestly: i don't know, i didn't check, and more importantly i don't care.

By short-shortsightedly handling only the most obvious pattern
one guarantees that the work **will** have to be redone to
* spot underoptimized pattern
* recognize that it can be folded
* come up with proof
* write tests
* find where to patch
* patch
* submit patch & get it through review.

That all is time-wasteful given it could have been
trivially solved when the initial pattern was being added.
Granted, not *all* variations will be caught,
but most obvious ones will be.

Even in my limited time contributing to instcombine pass
i have seen this play out, and it isn't fun to find
that it is due to not considering even basic commutativity
(and the question at hand,
(a && b) ? c : d <--> !(a && b) ? d : c <--> (!a || !b) ? d : c
**is** basic commutativity)

And finally, 'just as a glimpse of things to come',
that question will not exist (in it's current form at least)
when one day the peep-hole pass is SMT driven.
(well, more specifically all the folds would be pre-auto-deduced)

Let me know if that answers the question?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67356/new/

https://reviews.llvm.org/D67356