[PATCH] D64512: [InstCombine] Dropping redundant masking before left-shift [0/5] (PR42563)
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 16 09:22:28 PDT 2019
spatel added inline comments.
================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp:103-104
+ unsigned BitWidth = X->getType()->getScalarSizeInBits();
+ if (!match(SumOfShAmts, m_SpecificInt_ICMP(ICmpInst::Predicate::ICMP_UGE,
+ APInt(BitWidth, BitWidth))))
+ return nullptr;
----------------
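For context, a minimal standalone sketch (the constants and the harness are invented for illustration, not taken from the patch) of why the guarded condition matters: once the mask's shift amount plus the outer shl amount reaches the bit width, the mask can only clear bits that the shift discards anyway.

```
#include <cassert>
#include <cstdint>

// Illustration only: with MaskShAmt + ShiftShAmt >= bit width (24 + 8 >= 32),
// the mask ((1u << MaskShAmt) - 1u) clears only bits that the left shift
// would discard anyway, so dropping the mask cannot change the result.
int main() {
  const uint32_t X = 0xDEADBEEF;
  const unsigned MaskShAmt = 24;  // mask keeps the low 24 bits
  const unsigned ShiftShAmt = 8;  // shl discards the high 8 bits
  const uint32_t Masked = (X & ((1u << MaskShAmt) - 1u)) << ShiftShAmt;
  const uint32_t Unmasked = X << ShiftShAmt;
  assert(Masked == Unmasked); // both are 0xADBEEF00
  return 0;
}
```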
lebedev.ri wrote:
> spatel wrote:
> > lebedev.ri wrote:
> > > It may be possible to use `SimplifyICmpInst()` here to catch more cases; I may look into that after this patchset.
> > Using instsimplify like that (or, as currently in this patch, with SimplifyBinOp) is where we go into the InstCombine gray area of "Is it worth the cost?"
> >
> > With InstSimplify, we're using a potentially expensive/recursive analysis (similar to ValueTracking), and we know that's where InstCombine spends a lot of time.
> >
> > cc'ing @efriedma for further thoughts.
> I only see two big alternatives here:
> a) Not doing this kind of optimization at all. I sure hope this isn't the answer :)
> b) Pattern-match proliferation - basically reimplementing every interesting fold that we'd get via `SimplifyBinOp` in here. That's code bloat and wasted dev time; I'd rather not.
>
> These patterns can come up in bit-stream manipulation codepaths (think `llvm/lib/Bitcode`),
> and they may end up on the hot path, so it's good not to have the redundant masking there.
> I don't have exact numbers at hand, but as a ballpark, with all these shift optimizations
> (including the ones I haven't done yet), I'm hoping to reclaim* 5%..25% of runtime in the end.
>
> As a middle ground, can the recursion depth be specified when calling `SimplifyBinOp()`?
> I'm not seeing any cleverer fix here.
>
> *reclaim - when changing [[ https://github.com/darktable-org/rawspeed/blob/668e5db9f6b9b0499691265fa2f1ce16d4853937/src/librawspeed/io/BitStream.h#L81-L83 | this ]] buffer abstraction from
> ```
> return (cache >> (fillLevel - count)) & ((1U << count) - 1U);
> ```
> to
> ```
> return (cache >> (32 - count));
> ```
> that is the perf regression I'm getting; and looking at the crude IR in the regressed benchmarks,
> I //think// I will reclaim //some// of that loss.
I agree - I'd like to have the optimization. I'm only acknowledging our long-standing problem with instcombine.
There's one more possibility: move this to AggressiveInstCombine (which is currently only enabled at -O3...).
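As a footnote to the quoted rawspeed example: a small self-contained sketch (the harness and the fillLevel == 32 assumption are mine; only the two expressions come from the quote) checking that the masked and unmasked peeks agree whenever the bit cache is full, which is what makes the mask pure overhead on that hot path.

```
#include <cassert>
#include <cstdint>

// The two peek variants from the quoted BitStream.h snippet.
static uint32_t peekMasked(uint32_t cache, unsigned fillLevel, unsigned count) {
  return (cache >> (fillLevel - count)) & ((1U << count) - 1U);
}
static uint32_t peekUnmasked(uint32_t cache, unsigned count) {
  return cache >> (32 - count);
}

int main() {
  const uint32_t cache = 0xCAFEF00D;
  // With a full cache (fillLevel == 32), the shift right by (32 - count)
  // already leaves only `count` live bits, so the mask is a no-op.
  for (unsigned count = 1; count < 32; ++count)
    assert(peekMasked(cache, /*fillLevel=*/32, count) ==
           peekUnmasked(cache, count));
  return 0;
}
```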
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D64512/new/
https://reviews.llvm.org/D64512