[PATCH] D55851: Implement basic loop fusion pass

Tue Apr 16 10:41:46 PDT 2019

kbarton marked 24 inline comments as done.
kbarton added a comment.

All comments have been addressed. I will be committing this soon.

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:142-143
+
+  FusionCandidate(Loop *L, const DominatorTree *DT,
+                  const PostDominatorTree *PDT)
+      : Preheader(L->getLoopPreheader()), Header(L->getHeader()),
----------------
Meinersbur wrote:
> kbarton wrote:
> > jdoerfert wrote:
> > > Meinersbur wrote:
> > > > kbarton wrote:
> > > > > Meinersbur wrote:
> > > > > > [suggestion] IMHO it is good practice to not have much logic in object constructors. Reasons:
> > > > > > 1. When exceptions are thrown, the object is partially initialized
> > > > > > 2. With class polymorphism, calling virtual functions will call the non-overidden method.
> > > > > > 3. It is not possible to return anything, in this case, an error code.
> > > > > > 
> > > > > > 1. and 2. do not apply to this code. Point 3. is handled using the `Valid` flag.
> > > > > > 
> > > > > > Nonetheless, I think having lots of logic in constructor is a bad pattern. Instead, I suggest to have something return nullptr or `llvm::ErrorOr`.
> > > > > I understand your points here. I did not realize issue #2 (although I agree it doesn't apply here). 
> > > > > 
> > > > > However, if I understand your suggestion, I would need to add an initialize method (or something similar) to do the logic that is currently in the constructor and return a success/fail code. The problem is that before anything can be done with the object, you need to ensure that it has been initialized. This can be accomplished by adding a boolean flag (isInitialized) to the class and ensuring that all accessors check that before doing anything with it. However, I think that is more complicated then leaving the initialization logic in the constructor.
> > > > > 
> > > > > That said, I'm not particularly tied to either approach. I can change the implementation if you feel strongly. Or, if there is an alternative that I'm not seeing, I'm happy to do that instead. 
> > > > I was thinking the following pattern:
> > > > ```
> > > > FusionCandidate* makeFusionCandiate(...) {
> > > >   BasicBlock *Preheader;
> > > >   BasicBlock *Header;
> > > >   BasicBlock *ExitingBlocks;
> > > >   BasicBlock *ExitBlock;
> > > >   BasicBlock *Latch;
> > > >   // The above are redundant since llvm::Loop provides accessors  
> > > >   // I suggest to not cache them in other objects (they may require being updated) unless it measurable improves performance
> > > > 
> > > >   Loop *L;
> > > >   SmallVector<Instruction *, 16> MemReads;
> > > >   SmallVector<Instruction *, 16> MemWrites;
> > > >   
> > > >   if (!condition) {
> > > >     LLVM_DEBUG(dbgs() << "The reason\n");
> > > >     InvalidationReason++;
> > > >     return nullptr;
> > > >   }
> > > > 
> > > >   return new FusionCandidate(L, MemReads, MemWrites);
> > > > }
> > > > ```
> > > It is not uncommon in LLVM to do it this way. I personally do not try to force a solution similar to `bool createXXX(...)` but I also do not necessarily object if there is an actual reason.
> > OK, so I'm still not sure what the preference is here:
> > 
> >   # Keep it as is.
> >   # Change to create an empty object and then require an Initialize method call on it.
> >   # Create a wrapper, createFusionCandidate(), that does the construction and initialization.
> >   # Some other alternative that I don't see.
> > 
> > If I'm going to change, I prefer the createFusionCandidate route (which I hadn't thought of earlier) which I think is the cleanest, although I don't know how to enforce it's usage. 
> > 
> It is just a suggestion and does not correspond to an LLVM coding policy. Keep as-is, it is not an requirement for me to accept the patch.
I will mark this conversation as done as I think we've agreed the existing implementation is OK.  We can refactor/redo in the future if this becomes too cumbersome. 

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:276
+// dominates FC1 and FC1 post-dominates FC0.
+using FusionCandidateSet = std::set<FusionCandidate, FusionCandidateCompare>;
+using FusionCandidateCollection = SmallVector<FusionCandidateSet, 4>;
----------------
Meinersbur wrote:
> kbarton wrote:
> > Meinersbur wrote:
> > > [remark] Is there a reason to use `std::set`? Usually, the LLVM hash-based sets (`DenseSet`, `SmallPtrSet`, etc) are more performant.
> > This is a good question, and I should have added comments here to articulate why I eventually settled on std::set.
> > 
> > Ultimately, I was looking for a data structure that would keep the fusion candidates sorted automatically, and not invalidate the iterators as the set changes. In this implementation, either DenseSet or SmallPtrSet would be fine, but we have additional patches to extend fusion to move intervening code around and split up loops to create new opportunities. When this is done, the contents of the FusionCandidateSet will change, and having stable iterators simplifies the logic quite a bit (in my opinion). Similarly, always ensuring that the set is ordered properly is essential for this implementation, and that seems to be done reasonably well with std::set. 
> > 
> > I will add some comments in my next revision to indicate my line of reasoning here. 
> [suggestion] IMHO it is a good style to not depend on a particular implementation of ADTs. Code that  advances iterators manually depending on inserted/removed elements can be confusing and may break easily in future changes.
> 
> However, as with the previous [suggestion], I am ok with keeping it as-is.
Thanks. I will mark this as done for now. 
This is also something that may need to get revisited as the code evolves. 

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:977-980
+    TreeUpdates.push_back(
+        {DominatorTree::Delete, FC0.ExitingBlocks, FC1.Preheader});
+    TreeUpdates.push_back(
+        {DominatorTree::Insert, FC0.ExitingBlocks, FC1.Header});
----------------
Meinersbur wrote:
> kbarton wrote:
> > Meinersbur wrote:
> > > [style] Use `emplace_back`?
> > There is no emplace_back for TreeUpdates (unless I misunderstood your suggestion). 
> Are you sure? There is `SmallVectorImpl<T>::emplace_back`.
You're right. The syntax I was using was wrong. I've fixed it and changed to using emplace_back. 

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:323
+  LoopDepthTree(LoopInfo &LI) : Depth(1) {
+    if (LI.begin() != LI.end())
+      LoopsOnLevel.emplace_back(LoopVector(LI.rbegin(), LI.rend()));
----------------
Meinersbur wrote:
> [style] Did you consider `LI.empty()`?
I don't think so.
I've changed it, as I think it makes it more readable. 

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55851/new/

https://reviews.llvm.org/D55851