[PATCH] D70066: [MacroFusion] Limit the max fused number as 2 to reduce the dependency

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 25 04:41:37 PST 2019


fhahn added a comment.

In D70066#1758249 <https://reviews.llvm.org/D70066#1758249>, @steven.zhang wrote:

> In D70066#1756949 <https://reviews.llvm.org/D70066#1756949>, @fhahn wrote:
>
> > I am not sure if this example best illustrates the issue. Is your main concern here potentially longer dependency chains?
>
>
> My main concern is that, some node (such as Node A in my below example) lose some freedom to do the schedule.


That's what I thought, thanks for clarifying.

>> Alternatively, are there similar cases for other targets?
> 
> No, as far as I know currently. But no guarantee for the future, as hw is moving forward you know.

Sure, but then someone has to add new fusion patterns to the matcher, otherwise the case can not happen (as I said, I am not what targets beside AArch64 match) and it is not possible to add a test case.

I would recommend moving the logic to count the number of fused instructions into the target specific shouldScheduleAdjacent as suggested inline. On targets that have no problematic patterns, we can just assert that we never fuse an instruction with one that's already fused. So once we have problematic patterns there, we know because of the assertion. For targets like AArch64 with problematic patterns, we can add the bail out based on the number of fused instructions. It probably makes sense to add the limit as sub-target feature. It would also be good to put up patches for each target separately, as different people might be interested in taking a look.

Thanks for working on this! With the suggestions, I think this would be good to go in, independently of D69998 <https://reviews.llvm.org/D69998>



================
Comment at: llvm/lib/Target/AArch64/AArch64MacroFusion.cpp:381
+  // Only back to back fusion are supported.
+  if (NumFused > 0)
+    return false;
----------------
If we make the decision here, I think we should also just count the predecessors here, instead of doing so at the call site. If the sub target does not have a limit or does not require one, we should not iterate over the clusters unnecessarily. 

We could add a helper like 'hasLessThanNumFused` in MacroFusion.h which would do something like the function below, to stop when the limit is reached:

```
bool HasLessThanNumFused(const SUnit &SU, unsigned FuseLimit) {
  unsigned Num = 0;
  const SUnit *CurrentSU = &SU;
  while ((CurrentSU = getPredClusterSU(*CurrentSU)) && Num < FuseLimt) Num ++;
  return Num < FuseLimit;
}
```

We would have to pass in the SUnit of course.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70066/new/

https://reviews.llvm.org/D70066





More information about the llvm-commits mailing list