[PATCH] D55851: Implement basic loop fusion pass

Wed Jan 23 08:42:18 PST 2019

dmgreen added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:858
+
+      // TODO: Can we actually use the dependence info analysis here?
+      return false;
----------------
kbarton wrote:
> dmgreen wrote:
> > I would imagine, although I'm not sure, that there would at least be a lot of bugs here. We are dealing with different loops, but we can say that they are very similar.
> > 
> > What does it currently give? Anything useful? The SCEV checks you are doing above is obviously quite similar to what DA would be trying to do, but with the added loop rewrite. It would be a shame to duplicate all the effort but may, I guess, be necessary if DA doesn't do well enough.
> At this point DA doesn't give anything useful - at least not for the test cases that I have tried. I have not had a chance to investigate why, or if there is a better/different way to do things where it can be useful (which is why I marked the TODO here). 
> 
> The one thing that DA is able to do, that SCEV currently cannot, is understand the restrict keyword and accurately identify no dependencies between the two loops in this case. Perhaps it would be better to try and teach SCEV about restrict, and then only rely on SCEV in the long run?
> 
I remember DA having problems on 64bit systems due to all the sexts that kept happening. Delinearising constant's too, maybe? There are certainly problems in it that would be worth trying to fix (or find a replacement to do it better).

The delinearisation that DA will attempt may also be helpful. It is nowadays all built on SCEV's too, so should in theory (baring the facts that we know here about dealing with different but similar loops) be a more powerful version of the SCEV code here. That power might be confusing it at the moment though.

I would expect that better DA is something llvm will need sooner or later as more loop optimisations are implemented. Perhaps this is something that we can improve in the future, relying on the SCEV code at the moment, but getting more help from DA as it is improved. The no-aliasing checks it can do are useful at least, it would seem.

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:1003
+                                                          FC1.Header);
+    TreeUpdates.push_back(
+        {DominatorTree::Delete, FC0.ExitingBlocks, FC1.Preheader});
----------------
kbarton wrote:
> dmgreen wrote:
> > It can sometimes be better to insert edges into the DT before deleting old ones. It keep the tree more intact (especially pdts), with less subtrees becoming unreachable and less recalculations needed. It means it can be simpler and quicker for the updates.
> OK, that's good to know, thanks.
> 
> Is this specifically for inserting/deleting edges between similar blocks, or is this for all inserts/deletes in the entire tree? In other words, should I rearrange this code, and the code below (lines 1048-1051) that updates the latch blocks to do all the insertions before any deletions?
I think it's probably fine, just something I've run into in the past with PDT's. If it's something we run into, we can fix it then. And if it does become an issue, it may be better to teach the DTU to do this itself.

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:529
+
+    if (FC.Preheader->size() != 1) {
+      LLVM_DEBUG(dbgs() << "Loop " << FC.L->getName()
----------------
Is there reason in the current version to prevent fusion if "FC0" has instructions in its preheaders?

================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:917
+          return false;
+        }
+
----------------
I'm not sure if this is quite strong enough. Consider something like this, where the sum would be used in the second block, but not as phi:
  for(i = 0; i < n; i++)
    sum += a[i];
  for(i = 0; i < n; i++)
    b[i] = a[i]/sum;
These can be fused, but use the wrong "sum" on each iteration of the loop. Unroll and jam side-steps all this by requiring lcssa nodes (and, you know, not requiring generalised loop fusion :) )

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55851/new/

https://reviews.llvm.org/D55851