[PATCH] D15250: Detect vector reduction operations just before instruction selection.

Tue Jan 26 15:18:47 PST 2016

hfinkel accepted this revision.
hfinkel added a comment.
This revision is now accepted and ready to land.

Please make sure the select case is handled (autovectorization test case provided below); otherwise, LGTM. Thanks!

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:2367
@@ +2366,3 @@
+      if (Inst->getOpcode() == OpCode || isa<PHINode>(U)) {
+        if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(Inst))
+          if (!FPOp->getFastMathFlags().unsafeAlgebra())
----------------
I understand, but this comes up in autovectorized code as well. Here's a quick example:

  $ cat /tmp/v.c 
  int foo(int * restrict a1, int * restrict a2, int * restrict a3, int * restrict a4, int * restrict a5,
          int * restrict a6, int * restrict a7, int * restrict a8, int * restrict a9, int * restrict a10,
          int * restrict a11, int * restrict a12, int * restrict a13, int * restrict a14, int * restrict a15,
          int * restrict a16, int * restrict a17, int * restrict a18, int * restrict a19, int * restrict a20,
          int * restrict a21, int * restrict a22, int * restrict a23, int * restrict a24, int * restrict a25,
          int * restrict a26, int * restrict a27, int * restrict a28, int * restrict a29, int * restrict a30,
          int * restrict b, int * restrict c, int x) {
    int r = 0;
    for (int i = 0; i < 1600; ++i)
      // Lots of other stuff to prevent loop unswitching from kicking in.
      r += a1[i] + a2[i] + a3[i] + a4[i] + a5[i] +
           a6[i] + a7[i] + a8[i] + a9[i] + a10[i] +
           a11[i] + a12[i] + a13[i] + a14[i] + a15[i] +
           a16[i] + a17[i] + a18[i] + a19[i] + a20[i] +
           a21[i] + a22[i] + a23[i] + a24[i] + a25[i] +
           a26[i] + a27[i] + a28[i] + a29[i] + a30[i] +
           b[i] + c[i] + (x > 5 ? b[i] : c[i]);

    return r;
  }

Look at the IR from:

  $ clang -target powerpc64 -mcpu=pwr7 -O3 -S -emit-llvm -fno-unroll-loops -o - /tmp/v.c 

and you'll see:

  %64 = select i1 %cmp93, <4 x i32> %wide.load170, <4 x i32> %wide.load171
  ...
  %93 = add <4 x i32> %92, %wide.load168
  %94 = add <4 x i32> %93, %wide.load169
  %95 = add <4 x i32> %94, %wide.load170
  %96 = add <4 x i32> %95, %wide.load171
  %97 = add <4 x i32> %96, %64
  ...

And we really should handle this case.
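
To make the pattern concrete, here is a rough lane-by-lane model in plain C of what the vectorized loop computes. It is only a sketch under stated assumptions: the names reduce_scalar, reduce_lanes and LANES are hypothetical and not part of the patch, only the b/c operands and the select are modeled (a1..a30 are left out), and the trip count is assumed to be a multiple of the vector width. The select picks between two whole vectors on a loop-invariant scalar condition and feeds the same add chain as the other loads, and the horizontal sum only happens after the loop.

```c
#include <stddef.h>

#define LANES 4 /* assumed width, matching the <4 x i32> values in the IR */

/* Scalar reference: the reduction as written in the source loop
   (hypothetical helper, reduced to the b/c operands only). */
int reduce_scalar(const int *b, const int *c, size_t n, int x) {
  int r = 0;
  for (size_t i = 0; i < n; ++i)
    r += b[i] + c[i] + (x > 5 ? b[i] : c[i]);
  return r;
}

/* Lane-wise model of the vectorized loop (hypothetical helper).
   Assumes n is a multiple of LANES. */
int reduce_lanes(const int *b, const int *c, size_t n, int x) {
  int acc[LANES] = {0, 0, 0, 0}; /* the vector accumulator (the PHI) */
  int cmp = x > 5;               /* loop-invariant scalar i1 condition */

  for (size_t i = 0; i < n; i += LANES) {
    /* select i1 cmp, <4 x i32> b-chunk, <4 x i32> c-chunk */
    const int *sel = cmp ? &b[i] : &c[i];
    for (int l = 0; l < LANES; ++l) {
      acc[l] += b[i + l]; /* add <4 x i32> ... */
      acc[l] += c[i + l]; /* add <4 x i32> ... */
      acc[l] += sel[l];   /* add <4 x i32> ..., result of the select */
    }
  }

  /* Horizontal reduction of the accumulator, done once after the loop. */
  int r = 0;
  for (int l = 0; l < LANES; ++l)
    r += acc[l];
  return r;
}
```

Both functions return the same value for inputs that do not overflow, which is why the select operand should not stop the add chain from being treated as a reduction.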

http://reviews.llvm.org/D15250