[PATCH] D141512: [SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars.

Fri Jan 13 10:23:32 PST 2023

vdmitrie added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8113
+  SmallPtrSet<Value *, 4> GatheredScalars(VL.begin(), VL.end());
+  auto CheckOrdering = [&](Instruction *LastEI) {
+    // Check if the user node of the TE comes after user node of EntryPtr,
----------------
please describe

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8306-8307
+    auto *PHI1 = cast<PHINode>(V1);
+    // Check that all incoming values are compatible/from same parent (if they
+    // are instructions).
+    for (int I = 0, E = PHI->getNumIncomingValues(); I < E; ++I) {
----------------
IMO, the comment is not much helpful unless "compatible" is described.
We check for specific properties: incoming values are either both constants (undef and/or poison is probably considered a constant as well) or instructions with same opcode having same parent block.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8321
+  };
+  auto MightBeIgnored = [=](Value *V) {
+    auto *I = dyn_cast<Instruction>(V);
----------------
please add some description

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8326
+      IgnoredVals.assign(UserIgnoreList->begin(), UserIgnoreList->end());
+    return I && !IsSplat && !ScalarToTreeEntry.count(I) &&
+           !isVectorLikeInstWithConstOps(I) &&
----------------
Please comment the code?

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8330
+  };
+  auto NeighborMightBeIgnored = [&](Value *V, int Idx) {
+    Value *V1 = VL[Idx];
----------------
please add some description

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8336
+      UsedInSameVTE = It->second == UsedValuesEntry.find(V)->second;
+    return V != V1 && MightBeIgnored(V1) && !UsedInSameVTE &&
+           getSameOpcode({V, V1}, *TLI).getOpcode() &&
----------------
Please comment the code?

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8361-8380
+  SmallVector<const TreeEntry *> TempEntries;
+  for (unsigned I = 0, Sz = Entries.size(); I < Sz; ++I) {
+    if (!UsedIdxs.test(I))
+      continue;
+    for (std::pair<unsigned, int> &Pair : EntryLanes)
+      if (Pair.first == I)
+        Pair.first = TempEntries.size();
----------------
Could you please add some comments describing the code?

================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll:78-87
+; SSE-NEXT:    [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0
+; SSE-NEXT:    [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer
+; SSE-NEXT:    [[TMP3:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0
+; SSE-NEXT:    [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> zeroinitializer
+; SSE-NEXT:    [[TMP5:%.*]] = add <4 x i32> [[TMP2]], [[TMP4]]
+; SSE-NEXT:    [[TMP6:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[B:%.*]], i32 1
+; SSE-NEXT:    [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[C]], i32 2
----------------
Why this test changed behavior?
My expectation would be the original scalar code is likely noticeably faster than vectorized for this case.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141512/new/

https://reviews.llvm.org/D141512