[PATCH] D55851: Implement basic loop fusion pass

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Mar 24 07:40:08 PDT 2019


dmgreen added inline comments.


================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:129
+  BasicBlock *Header;
+  BasicBlock *ExitingBlocks;
+  BasicBlock *ExitBlock;
----------------
Perhaps name it ExitingBlock, if there is only one.


================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:929
+          // defined in that loop, and we likely don't want to disturb it.
+          PHINode *DefPHI = dyn_cast<PHINode>(Def);
+          if (DefPHI)
----------------
kbarton wrote:
> dmgreen wrote:
> > I think that you can just do something like
> >   if (Instruction *Def = dyn_cast<Instruction>(Op))
> >     if (FC0.L->contains(Def->getParent()))
> >       return false;
> > Providing there are no lcssa phis, this should rule out anything that depends on the first loop. 
> I just finished up testing this change, and then realized I think this is overly restrictive.
> I had originally restricted it to PHIs because I only wanted to include things that are (re)defined in the first loop. Basically, the first loop needs to complete in order to get the correct definition going into the second loop.
> 
> This alternate sequence will also include stores, not just PHIs. The store can be loop invariant, and if it is not hoisted out of the loop prior, then it would also prevent fusion.
This is the kind of thing I was thinking of:

```
define float @test(float* nocapture %a, i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp32 = icmp eq i32 %n, 0
  br i1 %cmp32, label %for.cond.cleanup7, label %for.body

for.body:                                         ; preds = %for.body, %entry
  %i.034 = phi i64 [ %inc, %for.body ], [ 0, %entry ]
  %sum1.033 = phi float [ %add, %for.body ], [ 0.000000e+00, %entry ]
  %idxprom = trunc i64 %i.034 to i32
  %arrayidx = getelementptr inbounds float, float* %a, i32 %idxprom
  %0 = load float, float* %arrayidx, align 4
  %add = fadd float %sum1.033, %0
  %inc = add nuw nsw i64 %i.034, 1
  %cmp = icmp ult i64 %inc, %conv
  br i1 %cmp, label %for.body, label %for.body8

for.body8:                                        ; preds = %for.body, %for.body8
  %i2.031 = phi i64 [ %inc14, %for.body8 ], [ 0, %for.body ]
  %idxprom9 = trunc i64 %i2.031 to i32
  %arrayidx10 = getelementptr inbounds float, float* %a, i32 %idxprom9
  %1 = load float, float* %arrayidx10, align 4
  %div = fdiv float %1, %add
  store float %div, float* %arrayidx10, align 4
  %inc14 = add nuw nsw i64 %i2.031, 1
  %cmp5 = icmp ult i64 %inc14, %conv
  br i1 %cmp5, label %for.body8, label %for.cond.cleanup7

for.cond.cleanup7:                                ; preds = %for.body8, %entry
  %sum1.0.lcssa36 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body8 ]
  ret float %sum1.0.lcssa36
}
```

The importand part being %add, that is defined in the first loop and used in the second. I believe that using normal SSA def-use chains, anything that is def'd in the first loop (and used in the second) will need to use the value from the last iteration of the loop, so fusion will be illegal.

There may be cases where a def has the same value on every iteration of the loop, but I imagine those would already have been hoisted/sunk, or be quite rare.


================
Comment at: llvm/lib/Transforms/Scalar/LoopFuse.cpp:1083
+    // Update DT/PDT
+    DTU.applyUpdates(TreeUpdates);
+
----------------
Whilst checking the example, I noticed the dom tree updater has become stricter about "inbalanced" tree operations. I think that if FC0.ExitingBlocks == FC0.Latch, we add a link to FC1.Header twice, which is maybe what it's complaining about (?)


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55851/new/

https://reviews.llvm.org/D55851





More information about the llvm-commits mailing list