[PATCH] D15250: Detecte vector reduction operations just before instruction selection.
hfinkel@anl.gov via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 10 11:35:41 PST 2015
hfinkel added inline comments.
================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:2333
@@ +2332,3 @@
+ case Instruction::Xor:
+ break;
+ default:
----------------
We need to also grab FAdd and FMul when appropriate fast-math flags are present.
================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:2377
@@ +2376,3 @@
+ if (Inst->getOpcode() == OpCode || isa<PHINode>(U)) {
+ UsersToVisit.push_back(U);
+ } else if (const ShuffleVectorInst *ShufInst =
----------------
For PHIs, don't we need to check that the initial value is the operation identify value (0 for add, 1 for multiply, etc.)? And that there are only two unique incoming blocks?
================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:2410
@@ +2409,3 @@
+ return false;
+ } else if (isa<ExtractElementInst>(U)) {
+ // At this moment we should have reduced all elements in the vector.
----------------
We should also catch the way that this is often programmed "by hand":
typedef float v4f __attribute__((vector_size(16)));
v4f foo(void);
float bar() {
v4f phi = { 0, 0, 0, 0 };
for (int i = 0; i < 1600; ++i)
phi += foo();
return phi[0] + phi[1] + phi[2] + phi[3];
}
thus, we have multiple extracts instead of shuffles. One might argue that we should canonicalize this to the shuffle form, but we don't have anything that does this currently (AFAIK). What do you think?
================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:2457
@@ -2341,2 +2456,3 @@
Flags.setNoUnsignedWrap(nuw);
+ Flags.setVectorReduction(vec_redux);
if (EnableFMFInDAG) {
----------------
You need to set this flag on the PHI node too (meaning the associate CopyFromReg, etc. instructions), right?
================
Comment at: test/CodeGen/Generic/vector-redux.ll:2
@@ +1,3 @@
+; RUN: llc < %s -debug-only=isel -o /dev/null 2>&1 | FileCheck %s
+
+ at a = global [1024 x i32] zeroinitializer, align 16
----------------
You need to add:
; REQUIRES: asserts
here because you're checking debug output which is available only in +Asserts builds.
http://reviews.llvm.org/D15250
More information about the llvm-commits
mailing list