[PATCH] D139338: [IRSim] Check largest sections first when analyzing similarity

Tue Dec 6 10:45:58 PST 2022

paquette added inline comments.

================
Comment at: llvm/include/llvm/Analysis/IRSimilarityIdentifier.h:869
+  /// \p TargetCandLarge.  These IRSimilarityCandidates are already structurally
+  /// similar, ad fully encasulate the IRSimilarityCandidates in question.
+  /// These are used as a "bridge" from the \p SourceCand to the target.
----------------

================
Comment at: llvm/lib/Analysis/IRSimilarityIdentifier.cpp:1126
+    IRSimilarityCandidate &TargetCandLarge) {
+  assert(SourceCand.CanonNumToNumber.size() != 0 &&
+         "Canonical Relationship is non-empty");
----------------
is there a call to `empty()` or something you could use here?

================
Comment at: llvm/lib/Analysis/IRSimilarityIdentifier.cpp:1148
+  // structurally similar.
+  for (std::pair<Value *, unsigned> ValueNumPair : ValueToNumber) {
+    Value *CurrVal = ValueNumPair.first;
----------------
does this need to be a copy? or could you use a reference?

================
Comment at: llvm/lib/Analysis/IRSimilarityIdentifier.cpp:1178
+    if (!OSourceGVN.has_value()) {
+      TargetCandLarge.frontInstruction()->getParent()->getParent()->getParent()->dump();
+      errs() << "------\n";
----------------
remove debug dumping code?

================
Comment at: llvm/lib/Analysis/IRSimilarityIdentifier.cpp:1254
+  DenseMap<unsigned, DenseSet<IRSimilarityCandidate *>>::iterator IdxIt;
+  IdxIt = IndexToIncludedCand.find(CandA.getStartIdx());
+  Optional<std::pair<IRSimilarityCandidate *, IRSimilarityCandidate *>> Result;
----------------
Could this just be `auto IdxIt = ...`?

Also `IdxIt` could probably use a more descriptive name?

================
Comment at: llvm/lib/Analysis/IRSimilarityIdentifier.cpp:1259
+  for (IRSimilarityCandidate *MatchedCand : IdxIt->second) {
+    if (MatchedCand->getStartIdx() > CandA.getStartIdx() ||
+        (MatchedCand->getEndIdx() < CandA.getEndIdx()))
----------------
I think `CandA.getStartIdx()` and `CandA.getEndIdx()` can be hoisted out of the loop into variables.

================
Comment at: llvm/lib/Analysis/IRSimilarityIdentifier.cpp:1273
+  for (IRSimilarityCandidate *MatchedCand : IdxIt->second) {
+    if (MatchedCand->getStartIdx() > CandB.getStartIdx() ||
+        MatchedCand->getEndIdx() < CandB.getEndIdx())
----------------
The start index + end index for `CandB` can be hoisted out of this loop too.

================
Comment at: llvm/lib/Analysis/IRSimilarityIdentifier.cpp:1292
+  DenseMap<unsigned, IRSimilarityCandidate *>::iterator ItA, ItB;
+  ItA = IncludedGroupAndCandA.find(*IncludedGroupsA.begin());
+  ItB = IncludedGroupAndCandB.find(*IncludedGroupsA.begin());
----------------
These could probably both be declared with `auto` too?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139338/new/

https://reviews.llvm.org/D139338