[PATCH] D64512: [InstCombine] Dropping redundant masking before left-shift [0/5] (PR42563)
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 16 09:22:28 PDT 2019
spatel added inline comments.
================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp:103-104
+ unsigned BitWidth = X->getType()->getScalarSizeInBits();
+ if (!match(SumOfShAmts, m_SpecificInt_ICMP(ICmpInst::Predicate::ICMP_UGE,
+ APInt(BitWidth, BitWidth))))
+ return nullptr;
----------------
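For context, a minimal standalone sketch (the constants and the harness are invented for illustration, not taken from the patch) of why the guarded condition matters: once the mask's shift amount plus the outer shl amount reaches the bit width, the mask can only clear bits that the shift discards anyway.

```
#include <cassert>
#include <cstdint>

// Illustration only: with MaskShAmt + ShiftShAmt >= bit width (24 + 8 >= 32),
// the mask ((1u << MaskShAmt) - 1u) clears only bits that the left shift
// would discard anyway, so dropping the mask cannot change the result.
int main() {
  const uint32_t X = 0xDEADBEEF;
  const unsigned MaskShAmt = 24;  // mask keeps the low 24 bits
  const unsigned ShiftShAmt = 8;  // shl discards the high 8 bits
  const uint32_t Masked = (X & ((1u << MaskShAmt) - 1u)) << ShiftShAmt;
  const uint32_t Unmasked = X << ShiftShAmt;
  assert(Masked == Unmasked); // both are 0xADBEEF00
  return 0;
}
```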
lebedev.ri wrote:
> spatel wrote:
> > lebedev.ri wrote:
> > > It may be possible to use `SimplifyICmpInst()` here to catch more cases; I may look into that after this patchset.
> > Using instsimplify like that (or, as currently in this patch, with SimplifyBinOp) is where we go into the InstCombine gray area of "Is it worth the cost?"
> >
> > With InstSimplify, we're using a potentially expensive/recursive analysis (similar to ValueTracking), and we know that's where InstCombine spends a lot of time.
> >
> > cc'ing @efriedma for further thoughts.
> I only see two big alternatives here:
> a) Not doing this kind of optimization at all. I sure hope this isn't the answer :)
> b) Pattern-match proliferation - basically reimplementing every interesting fold that we'd get via `SimplifyBinOp` in here. That's code bloat and wasted dev time; I'd rather not.
>
> These patterns can come up in bit-stream manipulation codepaths (think `llvm/lib/Bitcode`),
> and they may end up on the hot path, so it's good not to have the redundant masking there.
> I don't have exact numbers at hand, but as a ballpark, with all these shift optimizations
> (including the ones I haven't done yet), I'm hoping to reclaim* 5%..25% of runtime in the end.
>
> As a middle ground, can the recursion depth be specified when calling `SimplifyBinOp()`?
> I'm not seeing any cleverer fix here.
>
> *reclaim - when changing [[ https://github.com/darktable-org/rawspeed/blob/668e5db9f6b9b0499691265fa2f1ce16d4853937/src/librawspeed/io/BitStream.h#L81-L83 | this ]] buffer abstraction from
> ```
> return (cache >> (fillLevel - count)) & ((1U << count) - 1U);
> ```
> to
> ```
> return (cache >> (32 - count));
> ```
> that is the perf regression I'm getting; and looking at the crude IR in the regressed benchmarks,
> I //think// I will reclaim //some// of that loss.
I agree - I'd like to have the optimization. I'm only acknowledging our long-standing problem with instcombine.
There's one more possibility: move this to AggressiveInstCombine (which is currently only enabled at -O3...).
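As a footnote to the quoted rawspeed example: a small self-contained sketch (the harness and the fillLevel == 32 assumption are mine; only the two expressions come from the quote) checking that the masked and unmasked peeks agree whenever the bit cache is full, which is what makes the mask pure overhead on that hot path.

```
#include <cassert>
#include <cstdint>

// The two peek variants from the quoted BitStream.h snippet.
static uint32_t peekMasked(uint32_t cache, unsigned fillLevel, unsigned count) {
  return (cache >> (fillLevel - count)) & ((1U << count) - 1U);
}
static uint32_t peekUnmasked(uint32_t cache, unsigned count) {
  return cache >> (32 - count);
}

int main() {
  const uint32_t cache = 0xCAFEF00D;
  // With a full cache (fillLevel == 32), the shift right by (32 - count)
  // already leaves only `count` live bits, so the mask is a no-op.
  for (unsigned count = 1; count < 32; ++count)
    assert(peekMasked(cache, /*fillLevel=*/32, count) ==
           peekUnmasked(cache, count));
  return 0;
}
```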
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D64512/new/
https://reviews.llvm.org/D64512