[PATCH] D67356: [InstCombine] Simplify @llvm.usub.with.overflow+non-zero check (PR43251)

Tue Sep 10 12:12:00 PDT 2019

vsk added inline comments.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp:2249
+  if (Value *X = foldUnsignedUnderflowCheck(RHS, LHS, /*IsAnd=*/false, Builder))
+    return X;
+
----------------
lebedev.ri wrote:
> vsk wrote:
> > xbolva00 wrote:
> > > lebedev.ri wrote:
> > > > vsk wrote:
> > > > > lebedev.ri wrote:
> > > > > > vsk wrote:
> > > > > > > Does the 'or of icmps' case arise anywhere?
> > > > > > Define arise. We can trivially get one from another via De Morgan laws:
> > > > > > `(a && b) ? c : d` <--> `!(a && b) ? d : c` <--> `(!a || !b) ? d : c`,
> > > > > > It's *really* bad idea to intentionally not handle such nearby patterns.
> > > > > > 
> > > > > As in, do unsigned underflow checks tend to contain this specific or-of-icmp pattern, or more generally whether programs in the wild tend to.
> > > > > 
> > > > > If this isn't the case, then it seems like there's a compile-time cost for optimizing the pattern without any compensating benefit.
> > > > I'm not sure what the question is.
> > > If a compile time cost is concern, put it to AgressiveInstCombine - but I dont think we should do it in this case since there is nothing expensive like ValueTracking here..
> > > 
> > > I like Roman’s current code as is.
> > > 
> > I'm asking: why bother optimizing this pattern, or what is the positive case for doing so (apart from "it's possible to do so")? Essentially my previous comment, with a question mark at the end :).
> I'm still unable to answer the question because i'm not sure about which pattern specifically you are talking about.
> We can't just hope that the pattern we will always get is `(a && b) ? c : d `, it can trivially be converted to the `or`-form.
Stepping back a bit, I think the approach here should be driven by data.

You're right to be concerned about a change in the pipeline causing a performance regression ubsan. I don't think the answer to that is "introduce folds for all kinds of patterns the compiler *might* see", because then we'd endlessly be spinning our wheels and writing folds that aren't necessarily helpful (just extra maintenance & compile-time burden for no benefit).

Instead, the approach should be to focus on the actual end goal. We can check in a simple -fsanitize=pointer-overflow benchmark, perhaps your rawspeed bencharmk, to the llvm test suite. There are tons of bots that monitor the performance changes in the test suite (close to) every commit. Then, it makes sense to land targeted changes to improve performance. If there's ever a regression, we will actually know for sure, and be able to narrow it down, introduce a targeted fix, etc.

Taking a narrower focus again, I thought it was clear that we were discussing the 'or' form, but let me rephrase. Are the changes in "foldOrOfICmps" necessary to improve performance on your benchmark?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67356/new/

https://reviews.llvm.org/D67356