[PATCH] D47113: [CVP] Teach CorrelatedValuePropagation to reduce the width of lshr instruction.
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat May 26 04:39:23 PDT 2018
lebedev.ri added a comment.
In https://reviews.llvm.org/D47113#1111060, @dmgreen wrote:
> So yes, I ran some quick benchmarks
Thanks!
In https://reviews.llvm.org/D47113#1111060, @dmgreen wrote:
> and I believe this will cause regressions in some circumstances. In one case I looked at (which is running under our special LTO pipeline and may be a little difficult to replicate), we start off with this:
>
> %shr = lshr i32 %sub, 6
> %arrayidx = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr
>
>
> This is turned into:
>
> %shr.lhs.trunc = trunc i32 %sub to i16
> %shr.rhs.trunc = trunc i32 6 to i16
> %shr = lshr i16 %shr.lhs.trunc, %shr.rhs.trunc
In https://reviews.llvm.org/D47113#1111060, @dmgreen wrote:
> %shr.zext = zext i16 %shr to i32
> %arrayidx = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr.zext
Hmm, i think here we could avoid `zext`.
In https://reviews.llvm.org/D47113#1111060, @dmgreen wrote:
> Which gets turned right back into:
>
>
> %shr = lshr i32 %sub, 6
> %shr.zext = and i32 %shr, 1023
> %arrayidx11 = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr.zext
>
> I think extra And node will, under most circumstances, be removed during isel. But here this is part of a loop, and the extra cost causes us to go over the loop unroll threshold, so the loop is no longer fully unrolled.
>
> Another case on v6m (thumb1only) looks more like a simple extra instruction in the final assembly. In either case the extra And 1023 seems to only be causing trouble.
Thank you!
So to not much surprise, @spatel was right in https://reviews.llvm.org/D47113#1106601
This all makes me think we should look in the third direction:
In https://reviews.llvm.org/D46760#1105919, @spatel wrote:
> You want to narrow multi-use sequences of code because instcombine can't do that profitably using minimal peepholes:
> ...
> The original motivation for -aggressive-instcombine (TruncInstCombine) was something almost like that - see https://reviews.llvm.org/D38313. Can you extend that? Note that the general problem isn't about udiv/urem, lshr, or any particular binop. It's about narrowing a sequence of arbitrary binops (and maybe even more than binops).
In https://reviews.llvm.org/D47113#1111060, @dmgreen wrote:
> I'm running some more benchmarks and will see what happens on other cores/benchmarks.
Once done, could you please post something to the thread, so it too would contain the knowledge?
Repository:
rL LLVM
https://reviews.llvm.org/D47113
More information about the llvm-commits
mailing list