[PATCH] D116688: [SLP]Excluded external uses from the reprdering estimation.

Wed Jan 5 18:32:31 PST 2022

vporpo added a comment.

I don't fully understand why you are completely removing the cost of external uses. This is modeling the additional extract instructions that will be needed if we decide to proceed with that specific operand order. This is useful when you have to break ties, for example if we to choose between instructions of the same opcode, one with external uses and the other without. In that case we would prefer to vectorize the instructions without uses. Perhaps the current implementation is causing issues because the external cost is always subtracted (line 1233) and is not just used as a tie breaker only when the costs are the same.
If I understand correctly the splat cost is orthogonal to the the external uses? Can't we have both?

Also could you split the MainAlt changes into a separate patch?

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:1143-1144
+    /// \returns The additional cost due to possible broadcasting of the
+    /// elements in the lane. It is more profitable to have power-of-2 unique
+    /// elements in the lane, it will be vectorized with higher probability.
+    int getSplatCost(unsigned Lane, unsigned OpIdx, unsigned Idx) const {
----------------
Please explain in the comments why it is more profitable.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:1144
+    /// elements in the lane. It is more profitable to have power-of-2 unique
+    /// elements in the lane, it will be vectorized with higher probability.
+    int getSplatCost(unsigned Lane, unsigned OpIdx, unsigned Idx) const {
----------------
Could you also add to the comment what `OpIdx` and `Idx` are.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:1146
+    int getSplatCost(unsigned Lane, unsigned OpIdx, unsigned Idx) const {
+      Value *V = getData(Idx, Lane).V;
+      if (!isa<Instruction>(V) || V == getData(OpIdx, Lane).V)
----------------
This needs a better name because it gets a bit confusing later on when you introduce `OpV`.  Perhaps call this `OpIdxV` and the other one `IdxV` ?

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:1150
+      SmallPtrSet<Value *, 4> Uniques;
+      for (unsigned I = 0, E = getNumLanes(); I < E; ++I) {
+        if (I == Lane)
----------------
Could you rename `I` to `Ln` ?

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:1159-1161
+      int UniquesCountWithV = UniquesCount;
+      if (!Uniques.contains(V))
+        ++UniquesCountWithV;
----------------
Perhaps replace these lines with:
`int UniquesCountWithV = Uniques.contains(V) ? UniquesCount  : UniquesCount + 1;`

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:1163-1165
+      int UniquesCountWithOpV = UniquesCount;
+      if (!Uniques.contains(OpV))
+        ++UniquesCountWithOpV;
----------------
same

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:1270
+    /// best operand (with the highest score) between the passes.
+    SmallDenseMap<std::pair<unsigned, unsigned>, unsigned, 8>
+        BestScoresPerLanes;
----------------
Could you add to the comment about the what each `unsigned` is in the pair? I think it is operand index and lane.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:1327
+          int Score = getLookAheadScore(OpLeft, OpRight, MainAltOps);
+          if (Score) {
+            int SplatScore = getSplatCost(Lane, OpIdx, Idx);
----------------
Why skip if score is 0 ?

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:1328
+          if (Score) {
+            int SplatScore = getSplatCost(Lane, OpIdx, Idx);
+            if (Score <= SplatScore)
----------------
Is it Score or Cost? Score usually suggests the higher the better (and Cost the lower the better).

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3047
+      // Do not include ordering for nodes used in the alt opcode vectorization,
+      // better to reorder them during bottom-to-top stage.
+      SmallVector<const TreeEntry *, 1> Worklist(1, TE.get());
----------------
Why is it better to do this bottom-up ?
Could this be a separate patch?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116688/new/

https://reviews.llvm.org/D116688