[PATCH] D47113: [CVP] Teach CorrelatedValuePropagation to reduce the width of lshr instruction.

Sat May 26 04:39:23 PDT 2018

lebedev.ri added a comment.

In https://reviews.llvm.org/D47113#1111060, @dmgreen wrote:

> So yes, I ran some quick benchmarks

Thanks!

In https://reviews.llvm.org/D47113#1111060, @dmgreen wrote:

> and I believe this will cause regressions in some circumstances. In one case I looked at (which is running under our special LTO pipeline and may be a little difficult to replicate), we start off with this:
>
>   %shr = lshr i32 %sub, 6
>   %arrayidx = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr
>
>
> This is turned into:
>
>   %shr.lhs.trunc = trunc i32 %sub to i16
>   %shr.rhs.trunc = trunc i32 6 to i16
>   %shr = lshr i16 %shr.lhs.trunc, %shr.rhs.trunc

In https://reviews.llvm.org/D47113#1111060, @dmgreen wrote:

> %shr.zext = zext i16 %shr to i32
>  %arrayidx = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr.zext

Hmm, i think here we could avoid `zext`.

In https://reviews.llvm.org/D47113#1111060, @dmgreen wrote:

>   Which gets turned right back into:
>
>
>   %shr = lshr i32 %sub, 6
>   %shr.zext = and i32 %shr, 1023
>   %arrayidx11 = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr.zext
>
>   I think extra And node will, under most circumstances, be removed during isel. But here this is part of a loop, and the extra cost causes us to go over the loop unroll threshold, so the loop is no longer fully unrolled.
>  
>   Another case on v6m (thumb1only) looks more like a simple extra instruction in the final assembly. In either case the extra And 1023 seems to only be causing trouble.

Thank you!
So to not much surprise, @spatel was right in https://reviews.llvm.org/D47113#1106601

This all makes me think we should look in the third direction:

In https://reviews.llvm.org/D46760#1105919, @spatel wrote:

> You want to narrow multi-use sequences of code because instcombine can't do that profitably using minimal peepholes:
>  ...
>  The original motivation for -aggressive-instcombine (TruncInstCombine) was something almost like that - see https://reviews.llvm.org/D38313. Can you extend that? Note that the general problem isn't about udiv/urem, lshr, or any particular binop. It's about narrowing a sequence of arbitrary binops (and maybe even more than binops).

In https://reviews.llvm.org/D47113#1111060, @dmgreen wrote:

> I'm running some more benchmarks and will see what happens on other cores/benchmarks.

Once done, could you please post something to the thread, so it too would contain the knowledge?

Repository:
  rL LLVM

https://reviews.llvm.org/D47113