[PATCH] D9804: Optimize scattered vector insert/extract pattern

Nadav Rotem via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 25 11:30:58 PDT 2015


nadav added a comment.

In http://reviews.llvm.org/D9804#252514, @hfinkel wrote:

> At a high level, this transformation seems overly restrictive, and will need cost-modeling work. A couple of thoughts:
>
> 1. I don't see why you're restricting this to extracts used by stores (or inserts fed by loads); if the goal is to save on [zs]ext instructions, then this seems profitable regardless of how these are used. Moreover, I don't understand why there's a hasOneUse() check on the [zs]ext instructions.
> 2. The [zs]ext instructions that you're trying to eliminate might be free, at least in combination with the extract or insert, rendering this a bad idea. Consider the (unfortunately common) case where the target does not actually support a vector extract at all, and so it is lowered by storing the vector on the stack and then doing a scalar load of the requested element. In this case, if the target supports the corresponding scalar extending load, the extension is free. Likewise, for those [zs]ext fed by loads, these might be free if the target supports the corresponding extending load. Worse, the vector [zs]ext you're forming might not be legal at all (this is the most-serious potential problem).


We are already doing these kind of optimizations in SelectionDAG. The SLPVectorizer is not the right place for this kind of transformation.


================
Comment at: lib/Transforms/IPO/PassManagerBuilder.cpp:290
@@ -289,1 +289,3 @@
       MPM.add(createSLPVectorizerPass());   // Vectorize parallel scalar chains.
+      MPM.add(createAggressiveDCEPass());   // Delete dead instructions
+    }
----------------
The SLP vectorizer should clean after itself. Is it not?

================
Comment at: lib/Transforms/IPO/PassManagerBuilder.cpp:360
@@ -357,2 +359,3 @@
       MPM.add(createSLPVectorizerPass());   // Vectorize parallel scalar chains.
+      MPM.add(createAggressiveDCEPass());   // Delete dead instructions
       if (OptLevel > 1 && ExtraVectorizerPasses) {
----------------
The SLP vectorizer should clean after itself. Is it not?

================
Comment at: lib/Transforms/IPO/PassManagerBuilder.cpp:504
@@ -500,1 +503,3 @@
       PM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.
+      PM.add(createAggressiveDCEPass()); // Delete dead instructions
+    }
----------------
Why do we need ADCE here?  the SLP vectorizer should clean up after itself. We already have DCE and CSE built into the SLP-vectorizer. 

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:68
@@ +67,3 @@
+SLPScatter("slp-vectorize-scatter", cl::init(false), cl::Hidden,
+           cl::desc("Attempt to vectorize extract vector sequence"));
+
----------------
Why do we need two flags for insert and extract?  Do you feel like this feature is experimental?

Did you run some performance measurements on the llvm test suite?  Are you seeing any wins?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3116
@@ -3104,1 +3115,3 @@
     for (auto BB : post_order(&F.getEntryBlock())) {
+      // Combine Insert Element Instructions
+      if (SLPGather)
----------------
This part looks fine.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3203
@@ +3202,3 @@
+  /// values from a load instruction inserting them into constant vector
+  /// elements.
+  unsigned collectInsertElements(BasicBlock *BB);
----------------
What does the function return?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3209
@@ +3208,3 @@
+
+  bool combineInsertElementChain(ArrayRef<InsertElementInst *> Chain,
+                                 BoUpSLP &R);
----------------
Please document the functions below.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4092
@@ +4091,3 @@
+
+  // 4- Create vector extend instruction and insert it after the last
+  // insertelement instruction.
----------------
What's going on here?  Why do you need to zext/sext?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4121
@@ +4120,3 @@
+
+  // 1 -Create a new zero/sign extend instruction if not yet
+  if (NewExt == nullptr) {
----------------
Same comment as above. Why do you need to zext/sext?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4151
@@ +4150,3 @@
+
+bool SLPVectorizer::combineInsertElements(
+    ArrayRef<InsertElementInst *> InsertElementsCandidates, BoUpSLP &R) {
----------------
Is there a restriction on the placement of the insert_element instructions?  Do they need to come from the same basic block?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4206
@@ +4205,3 @@
+  }
+
+  std::map<unsigned, unsigned> IndexOccurrence;
----------------
Please add more comments. I don't understand what's going on here.


Repository:
  rL LLVM

http://reviews.llvm.org/D9804





More information about the llvm-commits mailing list