[PATCH] D64512: [InstCombine] Dropping redundant masking before left-shift [0/5] (PR42563)

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 16 08:42:48 PDT 2019


lebedev.ri added inline comments.


================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp:103-104
+  unsigned BitWidth = X->getType()->getScalarSizeInBits();
+  if (!match(SumOfShAmts, m_SpecificInt_ICMP(ICmpInst::Predicate::ICMP_UGE,
+                                             APInt(BitWidth, BitWidth))))
+    return nullptr;
----------------
spatel wrote:
> lebedev.ri wrote:
> > It may be possible to use `SimplifyICmpInst()` here, to catch more cases, may look into that after patchset.
> Using instsimplify like that (or as currently in this patch with SimplifyBinOp) is where we go into the InstCombine gray area of, "Is it worth the cost?"
> 
> With InstSimplify, we're using a potentially expensive/recursive analysis (similar to ValueTracking), and we know that's where InstCombine spends a lot of time. 
> 
> cc'ing @efriedma for further thoughts.
I'm only seeing two big alternatives here:
a) not doing these kind of optimizations at all. i sure hope this isn't the answer :)
b) patternmatch proliferation - basically just reimplement every interesting fold that we'd get via `SimplifyBinOp` in here - code bloat and wasted dev time, i'd rather not.

These patterns can come up within bit stream manipulation codepaths (think like `llvm/lib/Bitcode`),
and they may end up being the hotpath, so it's good to not have them there.
I'm not sure i have any numbers at hand, but as a ballpark, in the end with all these shift optimizations
(including those that i didn't do yet), i'm hoping to reclaim* -5%..-25% of runtime.

As a middle-ground, can the recursion depth be specified when calling `SimplifyBinOp()`?
I'm not seeing any cleverer fix here.

*reclaim - when changing [[ https://github.com/darktable-org/rawspeed/blob/668e5db9f6b9b0499691265fa2f1ce16d4853937/src/librawspeed/io/BitStream.h#L81-L83 | this ]] buffer abstraction from 
```
return (cache >> (fillLevel - count)) & ((1U << count) - 1U);
```
to
```
return (cache >> (32 - count));
```
that is the perf regression i'm getting; and looking at the crude IR in regressed benchmarks,
i //think// //some// of that loss i will reclaim.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64512/new/

https://reviews.llvm.org/D64512





More information about the llvm-commits mailing list