[PATCH] D47113: [CVP] Teach CorrelatedValuePropagation to reduce the width of lshr instruction.

Thu May 24 07:27:03 PDT 2018

dmgreen added a comment.

So yes, I ran some quick benchmarks and I believe this will cause regressions in some circumstances. In one case I looked at (which is running under our special LTO pipeline and may be a little difficult to replicate), we start off with this:

  %shr = lshr i32 %sub, 6
  %arrayidx = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr

This is turned into:

  %shr.lhs.trunc = trunc i32 %sub to i16
  %shr.rhs.trunc = trunc i32 6 to i16
  %shr = lshr i16 %shr.lhs.trunc, %shr.rhs.trunc
  %shr.zext = zext i16 %shr to i32
  %arrayidx = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr.zext

Which gets turned right back into:

  %shr = lshr i32 %sub, 6
  %shr.zext = and i32 %shr, 1023
  %arrayidx11 = getelementptr inbounds i16, i16* %AllocationMap, i32 %shr.zext

I think the extra And node will, under most circumstances, be removed during isel. But here it is part of a loop, and the extra cost pushes us over the loop unroll threshold, so the loop is no longer fully unrolled.
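
For reference, here is a small sketch of where the And 1023 comes from. It models the three IR forms above on plain integers; the function names are made up for illustration, and it assumes the narrowing is only applied when %sub is already known to fit in 16 bits. Under that assumption the mask keeps bits that are all known to be zero anyway, so it does no useful work.

```python
MASK32 = 0xFFFFFFFF  # model i32 values as Python ints masked to 32 bits


def original(sub: int) -> int:
    """%shr = lshr i32 %sub, 6"""
    return (sub & MASK32) >> 6


def narrowed(sub: int) -> int:
    """trunc to i16, lshr i16 by 6, zext back to i32."""
    lhs = sub & 0xFFFF  # trunc i32 %sub to i16
    shr = lhs >> 6      # lshr i16 %shr.lhs.trunc, 6
    return shr          # zext i16 %shr to i32 leaves the value unchanged


def recombined(sub: int) -> int:
    """lshr i32 %sub, 6 followed by and i32 %shr, 1023."""
    return original(sub) & 1023


if __name__ == "__main__":
    # 16 bits shifted right by 6 leaves 10 bits, hence the mask 2**10 - 1.
    print(narrowed(0xFFFF), recombined(0xFFFF), original(0xFFFF))
```

The narrowed and recombined forms agree for every input, but they only match the original shift when %sub has no bits set above bit 15, which is exactly the case where the And could simply be dropped.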

Another case on v6m (thumb1only) looks more like a simple extra instruction in the final assembly. In both cases the extra And 1023 only seems to cause trouble.

I'm running some more benchmarks and will see what happens on other cores/benchmarks.

Repository:
  rL LLVM

https://reviews.llvm.org/D47113