[PATCH] D67356: [InstCombine] Simplify @llvm.usub.with.overflow+non-zero check (PR43251)

Tue Sep 10 14:49:34 PDT 2019

lebedev.ri marked 9 inline comments as done.
lebedev.ri added inline comments.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp:2249
+  if (Value *X = foldUnsignedUnderflowCheck(RHS, LHS, /*IsAnd=*/false, Builder))
+    return X;
+
----------------
vsk wrote:
> vsk wrote:
> > lebedev.ri wrote:
> > > xbolva00 wrote:
> > > > lebedev.ri wrote:
> > > > > xbolva00 wrote:
> > > > > > vsk wrote:
> > > > > > > lebedev.ri wrote:
> > > > > > > > vsk wrote:
> > > > > > > > > xbolva00 wrote:
> > > > > > > > > > lebedev.ri wrote:
> > > > > > > > > > > vsk wrote:
> > > > > > > > > > > > lebedev.ri wrote:
> > > > > > > > > > > > > vsk wrote:
> > > > > > > > > > > > > > Does the 'or of icmps' case arise anywhere?
> > > > > > > > > > > > > Define arise. We can trivially get one from another via De Morgan laws:
> > > > > > > > > > > > > `(a && b) ? c : d` <--> `!(a && b) ? d : c` <--> `(!a || !b) ? d : c`,
> > > > > > > > > > > > > It's *really* bad idea to intentionally not handle such nearby patterns.
> > > > > > > > > > > > > 
> > > > > > > > > > > > As in, do unsigned underflow checks tend to contain this specific or-of-icmp pattern, or more generally whether programs in the wild tend to.
> > > > > > > > > > > > 
> > > > > > > > > > > > If this isn't the case, then it seems like there's a compile-time cost for optimizing the pattern without any compensating benefit.
> > > > > > > > > > > I'm not sure what the question is.
> > > > > > > > > > If a compile time cost is concern, put it to AgressiveInstCombine - but I dont think we should do it in this case since there is nothing expensive like ValueTracking here..
> > > > > > > > > > 
> > > > > > > > > > I like Roman’s current code as is.
> > > > > > > > > > 
> > > > > > > > > I'm asking: why bother optimizing this pattern, or what is the positive case for doing so (apart from "it's possible to do so")? Essentially my previous comment, with a question mark at the end :).
> > > > > > > > I'm still unable to answer the question because i'm not sure about which pattern specifically you are talking about.
> > > > > > > > We can't just hope that the pattern we will always get is `(a && b) ? c : d `, it can trivially be converted to the `or`-form.
> > > > > > > Stepping back a bit, I think the approach here should be driven by data.
> > > > > > > 
> > > > > > > You're right to be concerned about a change in the pipeline causing a performance regression ubsan. I don't think the answer to that is "introduce folds for all kinds of patterns the compiler *might* see", because then we'd endlessly be spinning our wheels and writing folds that aren't necessarily helpful (just extra maintenance & compile-time burden for no benefit).
> > > > > > > 
> > > > > > > Instead, the approach should be to focus on the actual end goal. We can check in a simple -fsanitize=pointer-overflow benchmark, perhaps your rawspeed bencharmk, to the llvm test suite. There are tons of bots that monitor the performance changes in the test suite (close to) every commit. Then, it makes sense to land targeted changes to improve performance. If there's ever a regression, we will actually know for sure, and be able to narrow it down, introduce a targeted fix, etc.
> > > > > > > 
> > > > > > > Taking a narrower focus again, I thought it was clear that we were discussing the 'or' form, but let me rephrase. Are the changes in "foldOrOfICmps" necessary to improve performance on your benchmark?
> > > > > > >> just extra maintenance
> > > > > > 
> > > > > > Well, we could leave it as is, but after some weeks/months/years we would realize we need this fold and somebody may implement it from the scratch instead of current few lines.
> > > > > > 
> > > > > > >> compile-time burden
> > > > > > 
> > > > > > Well, I quite suprised that this concern is repeated so often for instcombine's fold but major new things introduced to other passes just go in.
> > > > > > Are the changes in "foldOrOfICmps" necessary to improve performance on your benchmark?
> > > > > 
> > > > > Ok, thank you for your question.
> > > > > I'm going to answer honestly: i don't know, i didn't check, and more importantly i don't care.
> > > > > 
> > > > > By short-shortsightedly handling only the most obvious pattern
> > > > > one guarantees that the work **will** have to be redone to
> > > > > * spot underoptimized pattern
> > > > > * recognize that it can be folded
> > > > > * come up with proof
> > > > > * write tests
> > > > > * find where to patch
> > > > > * patch
> > > > > * submit patch & get it through review.
> > > > > 
> > > > > That all is time-wasteful given it could have been
> > > > > trivially solved when the initial pattern was being added.
> > > > > Granted, not *all* variations will be caught,
> > > > > but most obvious ones will be.
> > > > > 
> > > > > Even in my limited time contributing to instcombine pass
> > > > > i have seen this play out, and it isn't fun to find
> > > > > that it is due to not considering even basic commutativity
> > > > > (and the question at hand,
> > > > > (a && b) ? c : d <--> !(a && b) ? d : c <--> (!a || !b) ? d : c
> > > > > **is** basic commutativity)
> > > > > 
> > > > > And finally, 'just as a glimpse of things to come',
> > > > > that question will not exist (in it's current form at least)
> > > > > when one day the peep-hole pass is SMT driven.
> > > > > (well, more specifically all the folds would be pre-auto-deduced)
> > > > > 
> > > > > Let me know if that answers the question?
> > > > >> when one day the peep-hole pass is SMT driven.
> > > > 
> > > > compile time would be crazy  :D
> > > >> when one day the peep-hole pass is SMT driven.
> > > > compile time would be crazy :D
> > > 
> > > Yes, correct observation :D thus
> > > 
> > > >> (well, more specifically all the folds would be pre-auto-deduced)
> > > 
> > > so it wouldn't //actually// do any SMT during opt..
> > To your first concern, llvm contributors are generally pretty good at spotting opportunities to reuse code, so I'm not too worried. Besides, @lebedev.ri's already done the work and left a paper trail :).
> > 
> > Second, note that InstCombine really is one of the biggest (top 3, IIRC) compile-time bears in the mid-level pipeline (http://lists.llvm.org/pipermail/llvm-dev/2017-March/111257.html). The marginal cost of introducing a new fold to InstCombine may be low, but adding folds without regard for the time-benefit tradeoff is what gets us into trouble.
> So, "i didn't check, and more importantly i don't care" is not a great approach here :). And it's not at all clear to me that work will have to be redone: perhaps only a subset of these changes will ever be needed! Imho we should let performance data and real-world examples guide us, not intuitions and hunches.
> 
> Perhaps I'm not making the case for this well. Paging @majnemer -- as the code owner, wdyt?
I have nothing further to add here.
To me it's exactly the same question as "shall we only match `(X * Y) ult Z` or shall we also match `Z ugt (X * Y)`?"

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67356/new/

https://reviews.llvm.org/D67356