[PATCH] D15815: [SLP] Truncate expressions to minimum required bit width

Thu Jan 7 00:48:48 PST 2016

jmolloy added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3260
@@ +3259,3 @@
+
+  // If there are no external uses, the expression tree must be rooted by a
+  // store. We can't demote in-memory values, so there is nothing to do here.
----------------
mssimpso wrote:
> This work is a little more limited than what you've done in the loop vectorizer. Because we're working with expression trees, I thought it would be simpler to only truncate the roots of single-use chains. If the tree has no external users, it should be rooted by stores, which obviously can't be truncated. So we give up in that case. I've tried to explain my reasoning a little better in the comments, but I haven't yet thought much about the more general case.
> 
> In particular, the current patch doesn't do anything useful for the single-use chains ending in stores that have unnecessary extends and truncations in them that prevent us from getting past the cost model. Even though the extends and truncations in those chains may be removed by InstCombine after vectorization, they will currently still be counted by the model. I was thinking of adding this functionality as a follow-on, but I can add it to this patch if you like. What do you think?
> In particular, the current patch doesn't do anything useful for the single-use chains ending in stores that have unnecessary extends and truncations

OK, that's what I was missing.

>  I was thinking of adding this functionality as a follow-on, but I can add it to this patch if you like. What do you think?

I'm happy with it as a followon. I just didn't understand. Thanks!

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3303
@@ +3302,3 @@
+      // Get a representative value for the vectorizable bundle. All values in
+      // Entry.Scalars should be isomorphic.
+      auto *Scalar = Entry.Scalars[0];
----------------
mssimpso wrote:
> My first inclination was to try and reuse the DemandedBits analysis you added to VectorUtils. However, demanded bits alone didn't help the case I was most interested in: the index computation of GEPs, which is promoted to 64 bits, all of them being demanded. Since all the bits are demanded by the GEP, we have to look at the number of leading sign bits instead. But I've updated the patch to also consider the demanded bits of the roots since they are all we truncate.
Ah I see. Yes, I can see that both approaches together would probably get you the best result.

================
Comment at: test/Transforms/SLPVectorizer/AArch64/gather-reduce.ll:1
@@ -1,2 +2,1 @@
-; RUN: opt -S -slp-vectorizer -dce -instcombine < %s | FileCheck %s
 
----------------
I'm not sure about this: Why are tests being removed? Unless we're fundamentally regressing performance, I think new tests should be added instead of existing tests modified to check new behaviour.


http://reviews.llvm.org/D15815