[PATCH] D55851: Implement basic loop fusion pass

Thu Apr 11 12:00:44 PDT 2019

Meinersbur accepted this revision.
Meinersbur added a comment.

Generally, LGTM; with some smaller suggestions.

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:142-143
+
+  FusionCandidate(Loop *L, const DominatorTree *DT,
+                  const PostDominatorTree *PDT)
+      : Preheader(L->getLoopPreheader()), Header(L->getHeader()),
----------------
kbarton wrote:
> Meinersbur wrote:
> > [suggestion] IMHO it is good practice to not have much logic in object constructors. Reasons:
> > 1. When exceptions are thrown, the object is partially initialized
> > 2. With class polymorphism, calling virtual functions will call the non-overidden method.
> > 3. It is not possible to return anything, in this case, an error code.
> > 
> > 1. and 2. do not apply to this code. Point 3. is handled using the `Valid` flag.
> > 
> > Nonetheless, I think having lots of logic in constructor is a bad pattern. Instead, I suggest to have something return nullptr or `llvm::ErrorOr`.
> I understand your points here. I did not realize issue #2 (although I agree it doesn't apply here). 
> 
> However, if I understand your suggestion, I would need to add an initialize method (or something similar) to do the logic that is currently in the constructor and return a success/fail code. The problem is that before anything can be done with the object, you need to ensure that it has been initialized. This can be accomplished by adding a boolean flag (isInitialized) to the class and ensuring that all accessors check that before doing anything with it. However, I think that is more complicated then leaving the initialization logic in the constructor.
> 
> That said, I'm not particularly tied to either approach. I can change the implementation if you feel strongly. Or, if there is an alternative that I'm not seeing, I'm happy to do that instead. 
I was thinking the following pattern:
```
FusionCandidate* makeFusionCandiate(...) {
  BasicBlock *Preheader;
  BasicBlock *Header;
  BasicBlock *ExitingBlocks;
  BasicBlock *ExitBlock;
  BasicBlock *Latch;
  // The above are redundant since llvm::Loop provides accessors  
  // I suggest to not cache them in other objects (they may require being updated) unless it measurable improves performance

  Loop *L;
  SmallVector<Instruction *, 16> MemReads;
  SmallVector<Instruction *, 16> MemWrites;

  if (!condition) {
    LLVM_DEBUG(dbgs() << "The reason\n");
    InvalidationReason++;
    return nullptr;
  }

  return new FusionCandidate(L, MemReads, MemWrites);
}
```

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:142-143
+
+  FusionCandidate(Loop *L, const DominatorTree *DT,
+                  const PostDominatorTree *PDT)
+      : Preheader(L->getLoopPreheader()), Header(L->getHeader()),
----------------
kbarton wrote:
> jdoerfert wrote:
> > Meinersbur wrote:
> > > kbarton wrote:
> > > > Meinersbur wrote:
> > > > > [suggestion] IMHO it is good practice to not have much logic in object constructors. Reasons:
> > > > > 1. When exceptions are thrown, the object is partially initialized
> > > > > 2. With class polymorphism, calling virtual functions will call the non-overidden method.
> > > > > 3. It is not possible to return anything, in this case, an error code.
> > > > > 
> > > > > 1. and 2. do not apply to this code. Point 3. is handled using the `Valid` flag.
> > > > > 
> > > > > Nonetheless, I think having lots of logic in constructor is a bad pattern. Instead, I suggest to have something return nullptr or `llvm::ErrorOr`.
> > > > I understand your points here. I did not realize issue #2 (although I agree it doesn't apply here). 
> > > > 
> > > > However, if I understand your suggestion, I would need to add an initialize method (or something similar) to do the logic that is currently in the constructor and return a success/fail code. The problem is that before anything can be done with the object, you need to ensure that it has been initialized. This can be accomplished by adding a boolean flag (isInitialized) to the class and ensuring that all accessors check that before doing anything with it. However, I think that is more complicated then leaving the initialization logic in the constructor.
> > > > 
> > > > That said, I'm not particularly tied to either approach. I can change the implementation if you feel strongly. Or, if there is an alternative that I'm not seeing, I'm happy to do that instead. 
> > > I was thinking the following pattern:
> > > ```
> > > FusionCandidate* makeFusionCandiate(...) {
> > >   BasicBlock *Preheader;
> > >   BasicBlock *Header;
> > >   BasicBlock *ExitingBlocks;
> > >   BasicBlock *ExitBlock;
> > >   BasicBlock *Latch;
> > >   // The above are redundant since llvm::Loop provides accessors  
> > >   // I suggest to not cache them in other objects (they may require being updated) unless it measurable improves performance
> > > 
> > >   Loop *L;
> > >   SmallVector<Instruction *, 16> MemReads;
> > >   SmallVector<Instruction *, 16> MemWrites;
> > >   
> > >   if (!condition) {
> > >     LLVM_DEBUG(dbgs() << "The reason\n");
> > >     InvalidationReason++;
> > >     return nullptr;
> > >   }
> > > 
> > >   return new FusionCandidate(L, MemReads, MemWrites);
> > > }
> > > ```
> > It is not uncommon in LLVM to do it this way. I personally do not try to force a solution similar to `bool createXXX(...)` but I also do not necessarily object if there is an actual reason.
> OK, so I'm still not sure what the preference is here:
> 
>   # Keep it as is.
>   # Change to create an empty object and then require an Initialize method call on it.
>   # Create a wrapper, createFusionCandidate(), that does the construction and initialization.
>   # Some other alternative that I don't see.
> 
> If I'm going to change, I prefer the createFusionCandidate route (which I hadn't thought of earlier) which I think is the cleanest, although I don't know how to enforce it's usage. 
> 
It is just a suggestion and does not correspond to an LLVM coding policy. Keep as-is, it is not an requirement for me to accept the patch.

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:204
+
+  void dump(raw_ostream &OS) const {
+    OS << "\tPreheader: " << (Preheader ? Preheader->getName() : "nullptr")
----------------
Meinersbur wrote:
> [style] Use `LLVM_DUMP_METHOD` pattern for `dump()` methods (See `compiler.h`)
[style] `dump()` methods are usually guard by
```
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
```
as mentioned in the comment above `LLVM_DUMP_METHOD`

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:276
+// dominates FC1 and FC1 post-dominates FC0.
+using FusionCandidateSet = std::set<FusionCandidate, FusionCandidateCompare>;
+using FusionCandidateCollection = SmallVector<FusionCandidateSet, 4>;
----------------
kbarton wrote:
> Meinersbur wrote:
> > [remark] Is there a reason to use `std::set`? Usually, the LLVM hash-based sets (`DenseSet`, `SmallPtrSet`, etc) are more performant.
> This is a good question, and I should have added comments here to articulate why I eventually settled on std::set.
> 
> Ultimately, I was looking for a data structure that would keep the fusion candidates sorted automatically, and not invalidate the iterators as the set changes. In this implementation, either DenseSet or SmallPtrSet would be fine, but we have additional patches to extend fusion to move intervening code around and split up loops to create new opportunities. When this is done, the contents of the FusionCandidateSet will change, and having stable iterators simplifies the logic quite a bit (in my opinion). Similarly, always ensuring that the set is ordered properly is essential for this implementation, and that seems to be done reasonably well with std::set. 
> 
> I will add some comments in my next revision to indicate my line of reasoning here. 
[suggestion] IMHO it is a good style to not depend on a particular implementation of ADTs. Code that  advances iterators manually depending on inserted/removed elements can be confusing and may break easily in future changes.

However, as with the previous [suggestion], I am ok with keeping it as-is.

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:977-980
+    TreeUpdates.push_back(
+        {DominatorTree::Delete, FC0.ExitingBlocks, FC1.Preheader});
+    TreeUpdates.push_back(
+        {DominatorTree::Insert, FC0.ExitingBlocks, FC1.Header});
----------------
kbarton wrote:
> Meinersbur wrote:
> > [style] Use `emplace_back`?
> There is no emplace_back for TreeUpdates (unless I misunderstood your suggestion). 
Are you sure? There is `SmallVectorImpl<T>::emplace_back`.

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:292
+                                     const FusionCandidateSet &CandSet) {
+  for (auto IT = CandSet.begin(); IT != CandSet.end(); ++IT)
+    OS << *IT << "\n";
----------------
[suggestion] This could be a range-based for-loop.

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:323
+  LoopDepthTree(LoopInfo &LI) : Depth(1) {
+    if (LI.begin() != LI.end())
+      LoopsOnLevel.emplace_back(LoopVector(LI.rbegin(), LI.rend()));
----------------
[style] Did you consider `LI.empty()`?

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:935
+
+  bool emptyPreheader(const FusionCandidate &FC) const {
+    return FC.Preheader->size() == 1;
----------------
[suggestion] Consider using a method name that makes clear that is has no side-effects. `isEmptyPreheader`?
`emptyPreheader` sounds like it would remove instructions from the preheader until it is empty.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55851/new/

https://reviews.llvm.org/D55851