[PATCH] D70066: [MacroFusion] Limit the max fused number as 2 to reduce the dependency

Tue Nov 26 19:56:35 PST 2019

steven.zhang marked an inline comment as done.
steven.zhang added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64MacroFusion.cpp:381
+  // Only back to back fusion are supported.
+  if (NumFused > 0)
+    return false;
----------------
fhahn wrote:
> steven.zhang wrote:
> > fhahn wrote:
> > > If we make the decision here, I think we should also just count the predecessors here, instead of doing so at the call site. If the sub target does not have a limit or does not require one, we should not iterate over the clusters unnecessarily. 
> > > 
> > > We could add a helper like 'hasLessThanNumFused` in MacroFusion.h which would do something like the function below, to stop when the limit is reached:
> > > 
> > > ```
> > > bool HasLessThanNumFused(const SUnit &SU, unsigned FuseLimit) {
> > >   unsigned Num = 0;
> > >   const SUnit *CurrentSU = &SU;
> > >   while ((CurrentSU = getPredClusterSU(*CurrentSU)) && Num < FuseLimt) Num ++;
> > >   return Num < FuseLimit;
> > > }
> > > ```
> > > 
> > > We would have to pass in the SUnit of course.
> > If I understand correctly, we have to change the interface of shouldScheduleAdjacent that contains the SUnit instead of the MachineInstr. That is really not clean, as the target should be aware of the SUnit, to count the pred. Can we do it another way ?
> > Let's assume that, the macro fusion must be back to back. That is, check the SU numbers using HasLessThanNumFused () in the infrastructure instead of the target hook and bail out if it reach the limit. Does it make sense ?
> > If I understand correctly, we have to change the interface of shouldScheduleAdjacent that contains the SUnit instead of the MachineInstr. That is really not clean, as the target should be aware of the SUnit, to count the pred. Can we do it another way ?
> 
> I am not sure if passing in the SU directly is a bit deal here. Do you have a specific concern?
> 
>  The macro fusion implementations deal specifically with scheduling issues, so IMO passing the SU in directly makes sense to me, so they can make better decisions (instead of passing bits of information computed from the SU). IIRC the main reason we pass in MAchineInstrs instead of SUs is that there was no need for information from the SUs initially. Also, I think there might be cases where better fusion heuristics might want to use additional info from the SU (although the current ones are mostly 'good enough')
> 
> I think it would be great to keep the API flexible here and to not  impose unnecessary checks/computation for some targets.
> 
> > Let's assume that, the macro fusion must be back to back. That is, check the SU numbers using HasLessThanNumFused () in the infrastructure instead of the target hook and bail out if it reach the limit. Does it make sense ?
> 
> With the current structure, all fusion related flags are controlled by the sub target, which is the right level of granularity IMO. I think it would be better to not add additional fusion related flags to TTI, if that's what you mean with infrastructure.
Oh, sorry about the confusion. The infrastructure I mean is inside the MacroFusion implementation, not TTI. i.e.
```
bool MacroFusion::schedueAdjacentImpl(...) {
   ...
   if (!HasLessThanNumFused(DepSU) || ShouldScheduleAdjacent(...))
      continue;
}
```
The target is unaware of this, so that, we don't need to provide a helper function or pass the SUnit.  And we are basing on all the macro-fusion is implemented by hw as back-to-back. And it will benefit all the targets if they add some pattern that could fuse together in case.

 However, I am also ok with your suggestion, as it also makes sense. 

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70066/new/

https://reviews.llvm.org/D70066