[PATCH] [SLPVectorization] Vectorize flat addition in a single tree (+(+(+ v1 v2) v3) v4)

Mon Jan 5 02:45:13 PST 2015

Hi,

Comments inline.


REPOSITORY
  rL LLVM

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:78
@@ -76,1 +77,3 @@
 
+static bool IsReturn = false;
+
----------------
Mutable globals? This is a really bad code smell.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1629
@@ +1628,3 @@
+  // Return true if a tree of size 1 feeds into a return and is horizontal Add.
+  if (VectorizableTree.size() == 1 && IsReturn && IsHAdd)
+    return true;
----------------
Why does it matter that it feeds a return? Why wouldn't feeding a store also trigger? or a call operand?

It looks like you're using mutable globals to track state, which is a really bad pattern. It'll mean that two SLPVectorizer passes can't be used in parallel, which will break some JIT compilers and people compiling multiple Modules in parallel.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3327
@@ -3317,1 +3326,3 @@
 
+  // Try to find a flat horizontal reduction. The tree structure of such
+  // addition of type a[0]+a[1]+a[2]+a[3].... will be be like
----------------
Please write comments in full sentences, without ellipses (...) where possible. Where ellipses are needed, they have 3 periods (...).

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3345
@@ +3344,3 @@
+
+    if (ReduxWidth < 4)
+      return false;
----------------
Why?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3348
@@ +3347,3 @@
+
+    if (ReductionOpcode != Instruction::Add)
+      return false;
----------------
Why?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3598
@@ +3597,3 @@
+    // %6 = extractelement %5 <0>
+    if (IsHAdd) {
+      unsigned VecElem = VecTy->getVectorNumElements();
----------------
As I've mentioned several times in different threads, I don't like this. Architectures such as AArch64 have dedicated reduction instructions (ADDV), and so their cost does not follow the IR pattern given above.

The IR pattern above is matched to pairwise-adds by the X86 backend, so that cost isn't the same either.

http://reviews.llvm.org/D6818

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/