[PATCH] [SLPVectorization] Vectorize flat addition in a single tree (+(+(+ v1 v2) v3) v4)
suyog
suyog.sarda at samsung.com
Mon Jan 5 04:24:23 PST 2015
Hi James,
Thanks for the review.
Yes, it's a very bad code design, and I will come up with a better design for tracking flags.
I had this feeling while writing the code itself. Thanks for pointing it out.
For some of the issues you raised, I am commenting inline.
Regards,
Suyog
REPOSITORY
rL LLVM
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3345
@@ +3344,3 @@
+
+ if (ReduxWidth < 4)
+ return false;
----------------
jmolloy wrote:
> Why?
Would it be beneficial to have a reduction width less than 4, say 2?
I just copied this check from matchAssociativeReduction; I assume the same reasoning applies here.
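To illustrate why a width below 4 may not pay off, here is a toy cost comparison. It is only a sketch under stated assumptions. Every load, add, shuffle and extractelement costs 1 unit, which is not LLVM's TargetTransformInfo cost model. The N leaves are contiguous loads that become one vector load. The reduction takes log2(N) shuffle+add rounds followed by one extractelement. The function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical unit-cost model (NOT LLVM's TargetTransformInfo):
// every load, add, shuffle and extractelement costs 1.

// Scalar tree: N loads plus (N - 1) adds.
unsigned scalarReductionCost(unsigned N) { return N + (N - 1); }

// Vector form: 1 wide load, then log2(N) rounds of (shuffle + add),
// then one extractelement of lane 0.
unsigned vectorReductionCost(unsigned N) {
  assert(N >= 2 && (N & (N - 1)) == 0 && "width must be a power of two");
  unsigned Rounds = 0;
  for (unsigned W = N; W > 1; W /= 2)
    ++Rounds; // one shuffle + add per halving step
  return 1 + 2 * Rounds + 1;
}
```

Under this model, a width of 2 costs 4 vector units against 3 scalar ones. A width of 4 costs 6 against 7, so 4 is roughly where the vector form starts to win. Real targets will differ.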
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3348
@@ +3347,3 @@
+
+ if (ReductionOpcode != Instruction::Add)
+ return false;
----------------
jmolloy wrote:
> Why?
If we allow this for floating-point types, the results may differ, because in general (a+b)+c != a+(b+c) in floating-point arithmetic (Chandler pointed this out in earlier patches as well). Vectorizing changes the order of the additions, so it may change the result of a floating-point sum. Hence, only integer add is allowed. We could allow integer multiplication as well, though.
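The following C++ sketch shows the reordering problem. It sums four values in two orders: the sequential order of the scalar code, ((v1+v2)+v3)+v4, and the pairwise order of a shuffle-based vector reduction, (v1+v3)+(v2+v4). The helper names are hypothetical. The sketch assumes IEEE-754 double arithmetic with round-to-nearest-even and no extended-precision intermediates.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sequential order, as written in the scalar source: ((v1 + v2) + v3) + v4.
template <typename T>
T sequentialSum(const std::array<T, 4> &v) {
  return ((v[0] + v[1]) + v[2]) + v[3];
}

// Pairwise order, modeled on a shuffle-based vector reduction:
// step 1 adds lanes {2,3} onto lanes {0,1}; step 2 adds the two partial sums.
template <typename T>
T pairwiseSum(const std::array<T, 4> &v) {
  T lo = v[0] + v[2];
  T hi = v[1] + v[3];
  return lo + hi;
}
```

For the doubles {1e16, 1.0, -1e16, 1.0}, the sequential order yields 1.0 because the first 1.0 is absorbed when it is added to 1e16. The pairwise order yields 2.0. For uint32_t, both orders always agree, because unsigned addition wraps modulo 2^32 and is therefore associative and commutative.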
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3598
@@ +3597,3 @@
+ // %6 = extractelement %5 <0>
+ if (IsHAdd) {
+ unsigned VecElem = VecTy->getVectorNumElements();
----------------
jmolloy wrote:
> As I've mentioned several times in different threads, I don't like this. Architectures such as AArch64 have dedicated reduction instructions (ADDV), and so their cost does not follow the IR pattern given above.
>
> The IR pattern above is matched to pairwise-adds by the X86 backend, so that cost isn't the same either.
The assembly currently generated after vectorization does not use ADDV, which is bad.
But if we need to vectorize a horizontal addition, is there any other way to express it at the IR level?
Once we express it at the IR level, we can lower it to ADDV at the DAG level in DAGCombine.
You suggested earlier that we add an IR intrinsic to indicate the pattern and then lower it to machine-specific instructions. Is there any other way than that?
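The IR-level pattern under discussion moves the upper half down with shufflevector, adds, repeats, and then extracts lane 0 with extractelement. The C++ below emulates that pattern lane by lane, as a sketch of its semantics only. It says nothing about how a backend lowers the pattern, whether to ADDV, to pairwise adds, or to something else. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lane-by-lane emulation of a shuffle-based horizontal add, modeled on:
//   %shuf = shufflevector %v, undef, <upper live half, undef...>
//   %sum  = add %v, %shuf
//   ... repeated log2(N) times ...
//   %r    = extractelement %sum, 0
uint32_t emulateShuffleReduction(std::vector<uint32_t> Lanes) {
  std::size_t N = Lanes.size();
  assert(N >= 1 && (N & (N - 1)) == 0 && "width must be a power of two");
  for (std::size_t Half = N / 2; Half >= 1; Half /= 2) {
    // Mask <Half, ..., 2*Half-1, undef...>: bring the upper live half down
    // and add it onto the lower live half.
    for (std::size_t I = 0; I < Half; ++I)
      Lanes[I] += Lanes[I + Half];
  }
  return Lanes[0]; // extractelement lane 0
}
```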
http://reviews.llvm.org/D6818