[PATCH] [SLPVectorization] Vectorize flat addition in a single tree (+(+(+ v1 v2) v3) v4)

Mon Jan 5 04:24:23 PST 2015

Hi James,

Thanks for the review.

Yes, it's a very bad code design, and I will come up with a better design for tracking flags.
I had this feeling while writing the code itself. Thanks for pointing it out.

For some of the issues you raised, I am commenting inline.

Regards,
Suyog

REPOSITORY
  rL LLVM

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3345
@@ +3344,3 @@
+
+    if (ReduxWidth < 4)
+      return false;
----------------
jmolloy wrote:
> Why?
Would it be beneficial to allow a reduction width of less than 4, say 2?

I just copied this check from matchAssociativeReduction; I assume the reasoning there is the same.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3348
@@ +3347,3 @@
+
+    if (ReductionOpcode != Instruction::Add)
+      return false;
----------------
jmolloy wrote:
> Why?
If we allow it for floating-point types, results may vary, since (a+b)+c != a+(b+c) in general for floating-point values (Chandler pointed this out in earlier patches as well). Vectorizing changes the order of the additions, so it may change the result of floating-point additions. Hence, only integer add is handled. We could allow integer multiplication as well, though.
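
To illustrate the reassociation concern, here is a minimal standalone sketch. It is not code from the patch, and the function names are made up for this example. It compares the scalar order ((v1 + v2) + v3) + v4 with the tree order (v1 + v2) + (v3 + v4) that a vectorized reduction would typically use. It assumes float expressions are evaluated in IEEE-754 single precision.

```cpp
#include <cstdint>

// Scalar order: ((V1 + V2) + V3) + V4. Each partial sum is stored to a
// float so that it is rounded to single precision before the next add.
float sumSequential(float V1, float V2, float V3, float V4) {
  float S = V1 + V2;
  S = S + V3;
  S = S + V4;
  return S;
}

// Tree order: (V1 + V2) + (V3 + V4), as a pairwise reduction would compute.
float sumTree(float V1, float V2, float V3, float V4) {
  float Lo = V1 + V2;
  float Hi = V3 + V4;
  return Lo + Hi;
}

// The same two orders for unsigned 32-bit integers. Unsigned arithmetic
// wraps modulo 2^32, so both orders always give the same result.
std::uint32_t sumSequentialU32(std::uint32_t V1, std::uint32_t V2,
                               std::uint32_t V3, std::uint32_t V4) {
  return ((V1 + V2) + V3) + V4;
}

std::uint32_t sumTreeU32(std::uint32_t V1, std::uint32_t V2,
                         std::uint32_t V3, std::uint32_t V4) {
  return (V1 + V2) + (V3 + V4);
}
```

With the inputs 1e8f, 1.0f, -1e8f, 1.0f, the sequential order should give 1.0f and the tree order 0.0f under the stated assumption, because 1e8f + 1.0f rounds back to 1e8f. The unsigned integer versions agree for any inputs, including ones that wrap around.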

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3598
@@ +3597,3 @@
+    // %6 = extractelement %5 <0>
+    if (IsHAdd) {
+      unsigned VecElem = VecTy->getVectorNumElements();
----------------
jmolloy wrote:
> As I've mentioned several times in different threads, I don't like this. Architectures such as AArch64 have dedicated reduction instructions (ADDV), and so their cost does not follow the IR pattern given above.
> 
> The IR pattern above is matched to pairwise-adds by the X86 backend, so that cost isn't the same either.
The assembly currently generated after vectorization does not contain ADDV, which is bad.
But if we need to vectorize a horizontal addition, is there any other way to express it at the IR level?
Once we have it at the IR level, we can lower it to ADDV at the DAG level in DAGCombine.

You suggested earlier having an IR intrinsic to indicate the pattern and then lowering that to machine-specific instructions. Is there any other way than that?
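
For reference, the shuffle-and-add pattern under discussion can be modelled in scalar C++ roughly as below. This is only an illustrative sketch, not code from the patch: horizontalAdd and reductionSteps are hypothetical names, and the step count only describes the generic IR pattern. As noted above, it does not reflect the cost on targets with a dedicated reduction instruction or with pairwise adds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a log2(N) shuffle-and-add horizontal reduction.
// Each outer iteration stands for one shufflevector (moving the upper
// half of the live lanes down) followed by one vector add. Reading lane 0
// at the end stands for the final extractelement.
std::uint32_t horizontalAdd(std::vector<std::uint32_t> Vec) {
  const std::size_t N = Vec.size();
  assert(N != 0 && (N & (N - 1)) == 0 && "lane count must be a power of two");
  for (std::size_t Half = N / 2; Half >= 1; Half /= 2)
    for (std::size_t I = 0; I < Half; ++I)
      Vec[I] += Vec[I + Half]; // unsigned add wraps modulo 2^32
  return Vec[0];
}

// Number of shuffle+add steps the generic pattern needs for N lanes,
// i.e. log2(N) for a power-of-two N.
unsigned reductionSteps(std::size_t N) {
  unsigned Steps = 0;
  for (; N > 1; N /= 2)
    ++Steps;
  return Steps;
}
```

For a 4-lane vector this models two shuffle+add steps followed by one extract, which is the shape the generic IR-level cost would be based on.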

http://reviews.llvm.org/D6818
