[llvm-bugs] [Bug 28070] New: IndVarSimplify + InstCombine integer widening does not play nice with loop vectorizer
via llvm-bugs
llvm-bugs at lists.llvm.org
Thu Jun 9 18:35:05 PDT 2016
https://llvm.org/bugs/show_bug.cgi?id=28070
Bug ID: 28070
Summary: IndVarSimplify + InstCombine integer widening does not
play nice with loop vectorizer
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: Scalar Optimizations
Assignee: unassignedbugs at nondot.org
Reporter: mkuper at google.com
CC: davidxl at google.com, llvm-bugs at lists.llvm.org,
wmi at google.com
Classification: Unclassified
Consider a reduction loop that accumulates products of i32s into an i64:
long long foo() {
  long long x = 42;
#pragma nounroll
#pragma clang loop interleave_count(1)
  for (int i = 0; i < 1000; i++) {
    x += i * i;
  }
  return x;
}
For:
$ clang -c -S -o - -O3 -mavx2 --target=x86_64
We'd like to get:
.LBB0_1:
        vpmulld   %xmm2, %xmm2, %xmm3   # 4 x i32 multiply
        vpaddd    %xmm1, %xmm2, %xmm2   # step the vector IV
        vpmovzxdq %xmm3, %ymm3          # zero-extend to 4 x i64
        vpaddq    %ymm0, %ymm3, %ymm0   # accumulate
        addq      $-4, %rax
        jne       .LBB0_1
What we actually get is this, because AVX2 has no packed 64-bit multiply and
the i64 mul must be emulated:
.LBB0_1:
        vpsrlq   $32, %ymm3, %ymm4
        vpmuludq %ymm4, %ymm3, %ymm4
        vpmuludq %ymm3, %ymm3, %ymm5
        vpaddq   %ymm1, %ymm3, %ymm3
        vpsllq   $32, %ymm4, %ymm4
        vpaddq   %ymm4, %ymm5, %ymm5
        vpaddq   %ymm4, %ymm5, %ymm4
        vpblendd $170, %ymm2, %ymm4, %ymm4
        vpaddq   %ymm0, %ymm4, %ymm0
        addq     $-4, %rax
        jne      .LBB0_1
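(For reference, my reading of the sequence above: writing each 64-bit lane as
a = 2^32*hi + lo, we have a*a mod 2^64 = lo*lo + 2*((hi*lo) << 32), since the
hi*hi term overflows away. The two vpmuludq's compute hi*lo and lo*lo, the
vpsllq plus the two vpaddq's add the shifted cross term twice, and the
vpblendd then re-truncates to 32 bits by blending zeros into the high dword
of each lane, assuming %ymm2 holds zero here.)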
What happens is that IndVarSimplify promotes the induction variable from i32 to
i64:
for.body:
  %i.09 = phi i32 [ 0, %entry ], [ %inc, %for.body ]                      <==
  %x.08 = phi i64 [ 42, %entry ], [ %add, %for.body ]
  %mul = mul nsw i32 %i.09, %i.09
  %conv7 = zext i32 %mul to i64
  %add = add nsw i64 %conv7, %x.08
  %inc = add nsw i32 %i.09, 1                                             <==
  %cmp = icmp slt i32 %inc, 1000
  br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !1
becomes:
for.body:                                         ; preds = %entry, %for.body
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]    <==
  %x.08 = phi i64 [ 42, %entry ], [ %add, %for.body ]
  %0 = trunc i64 %indvars.iv to i32                                       <==
  %mul = mul nsw i32 %0, %0
  %conv7 = zext i32 %mul to i64
  %add = add nsw i64 %conv7, %x.08
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1                       <==
  %exitcond = icmp ne i64 %indvars.iv.next, 1000
  br i1 %exitcond, label %for.body, label %for.cond.cleanup, !llvm.loop !1
And then InstCombine notices the trunc -> mul -> zext and widens the whole
computation to i64; this is sound because the low 32 bits of a product depend
only on the low 32 bits of its operands, and the "and" keeps just those bits:
for.body:                                         ; preds = %for.body, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %x.08 = phi i64 [ 42, %entry ], [ %add, %for.body ]
  %mul = mul i64 %indvars.iv, %indvars.iv                                 <==
  %conv7 = and i64 %mul, 4294967295                                       <==
  %add = add nsw i64 %conv7, %x.08
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 1000
  br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !1
This is unfortunate, because we end up with i64 vector multiplies that are
not legal on the target, and really messy codegen. Note that the issue is not
AVX2-specific; that's just the cleanest example. We get similar nonsense with
other feature sets.
Undoing this in codegen (by matching the "mul + and" back into a "trunc + mul +
zext") doesn't seem sufficient, since ideally we'd also like the vectorizer to
know what the real width is going to be. What we really want is to vectorize
this reduction by a factor of 8, like GCC does, and not by 4, and that would
require the cost model to know that we're reducing i32 values into an i64
result.
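For comparison, here is a hand-written sketch of the vector body we'd like
the vectorizer to emit for VF=8 (not actual compiler output; the value names
and the %init placeholder for the starting accumulator are made up): the
multiply stays i32-wide and only the accumulation happens in i64:

vector.body:
  %vec.ind = phi <8 x i32> [ <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, %vector.ph ], [ %vec.ind.next, %vector.body ]
  %vec.phi = phi <8 x i64> [ %init, %vector.ph ], [ %acc, %vector.body ]
  %mul = mul nsw <8 x i32> %vec.ind, %vec.ind
  %mul.zext = zext <8 x i32> %mul to <8 x i64>
  %acc = add <8 x i64> %mul.zext, %vec.phi
  %vec.ind.next = add <8 x i32> %vec.ind, <i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
  ; (trip-count check and final reduction of %acc omitted)

On AVX2 this should lower to roughly one ymm vpmulld plus two
vpmovzxdq/vpaddq pairs per iteration.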