[PATCH] D15250: Detect vector reduction operations just before instruction selection.
Hal Finkel via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 26 15:18:47 PST 2016
hfinkel accepted this revision.
hfinkel added a comment.
This revision is now accepted and ready to land.
Please make sure the select case is handled (autovectorization test case provided below); otherwise, LGTM. Thanks!
================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:2367
@@ +2366,3 @@
+ if (Inst->getOpcode() == OpCode || isa<PHINode>(U)) {
+ if (const FPMathOperator *FPOp = dyn_cast<const FPMathOperator>(Inst))
+ if (!FPOp->getFastMathFlags().unsafeAlgebra())
----------------
I understand, but this comes up in autovectorized code as well. Here's a quick example:
$ cat /tmp/v.c
int foo(int * restrict a1, int * restrict a2, int * restrict a3, int * restrict a4, int * restrict a5,
        int * restrict a6, int * restrict a7, int * restrict a8, int * restrict a9, int * restrict a10,
        int * restrict a11, int * restrict a12, int * restrict a13, int * restrict a14, int * restrict a15,
        int * restrict a16, int * restrict a17, int * restrict a18, int * restrict a19, int * restrict a20,
        int * restrict a21, int * restrict a22, int * restrict a23, int * restrict a24, int * restrict a25,
        int * restrict a26, int * restrict a27, int * restrict a28, int * restrict a29, int * restrict a30,
        int * restrict b, int * restrict c, int x) {
  int r = 0;
  for (int i = 0; i < 1600; ++i)
    // Lots of other stuff to prevent loop unswitching from kicking in.
    r += a1[i] + a2[i] + a3[i] + a4[i] + a5[i] +
         a6[i] + a7[i] + a8[i] + a9[i] + a10[i] +
         a11[i] + a12[i] + a13[i] + a14[i] + a15[i] +
         a16[i] + a17[i] + a18[i] + a19[i] + a20[i] +
         a21[i] + a22[i] + a23[i] + a24[i] + a25[i] +
         a26[i] + a27[i] + a28[i] + a29[i] + a30[i] +
         b[i] + c[i] + (x > 5 ? b[i] : c[i]);
  return r;
}
Look at the IR from:
$ clang -target powerpc64 -mcpu=pwr7 -O3 -S -emit-llvm -fno-unroll-loops -o - /tmp/v.c
and you'll see:
%64 = select i1 %cmp93, <4 x i32> %wide.load170, <4 x i32> %wide.load171
...
%93 = add <4 x i32> %92, %wide.load168
%94 = add <4 x i32> %93, %wide.load169
%95 = add <4 x i32> %94, %wide.load170
%96 = add <4 x i32> %95, %wide.load171
%97 = add <4 x i32> %96, %64
...
And we really should handle this case.
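For context on the comment in the loop body: x is loop-invariant, so in a smaller loop the optimizer could unswitch on `x > 5` and the select would never reach the vectorized reduction chain. The following is only a minimal sketch of the two shapes. The function names are hypothetical and only the b/c terms are kept, so this reduced version will likely not reproduce the IR above by itself.

```c
#include <stddef.h>

/* Hypothetical reduced form of the test case: the loop-invariant
   condition (x > 5) stays inside the loop, so every iteration selects
   between b[i] and c[i] and feeds the result into the add reduction. */
int sum_with_select(const int *b, const int *c, size_t n, int x) {
  int r = 0;
  for (size_t i = 0; i < n; ++i)
    r += b[i] + c[i] + (x > 5 ? b[i] : c[i]);
  return r;
}

/* Roughly what loop unswitching does conceptually: the invariant
   condition is hoisted, leaving two loops with no select in either. */
int sum_unswitched(const int *b, const int *c, size_t n, int x) {
  int r = 0;
  if (x > 5) {
    for (size_t i = 0; i < n; ++i)
      r += b[i] + c[i] + b[i];
  } else {
    for (size_t i = 0; i < n; ++i)
      r += b[i] + c[i] + c[i];
  }
  return r;
}
```

Both functions compute the same sum. The difference is only whether a select ends up as an operand of the reduction, which is the shape the detection code needs to accept.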
http://reviews.llvm.org/D15250