[PATCH] D69998: [MacroFusion] Create the missing artificial edges if there are more than 2 SU fused.

Sun Nov 10 22:21:01 PST 2019

steven.zhang added a comment.

In D69998#1740295 <https://reviews.llvm.org/D69998#1740295>, @arsenm wrote:

> In D69998#1740279 <https://reviews.llvm.org/D69998#1740279>, @steven.zhang wrote:
>
> > In D69998#1739692 <https://reviews.llvm.org/D69998#1739692>, @jsji wrote:
> >
> > > My understanding is that you are doing two fixes in this patch:
> > >
> > > 1. Extend API of `shouldScheduleAdjacent` to `avoid fusing an instruction more than once along the same dependency chain`.
> > > 2. Extend `fuseInstructionPair` to add artificial data dependencies for chained-fusion
> > >
> > > To me, yes, we should do #1 first, as https://reviews.llvm.org/D36704 was already trying to do so.
> >
> >
> > Yes, but still have problems.
> >
> > > #2 might not be necessary, as you mentioned, all existing targets only support `back-to-back` fusion.
> > >  If we want to extend it for `chained-fusion` (3 or more in same dependency chain),
> > >  then I believe we need more changes than adding data dependencies.
> > >  Also need additional tests for some target that support it (maybe RISC-V?)
> >
> > The current implementation has already been implemented to support more than 2 SU's fusion, However, it misses to create some dependency edges. The patch is trying to fix the bug of the macro fusion infrastructure. From my understanding,
> >  adding these missing edges is enough. I didn't see llvm support the macro fusion for RISC-V target. AMDGPU supports more than 2 SU's cluster for load.  @arsenm Would you please help me confirm if AMDGPU target supports more than 2 SU's fusion ?
>
>
> Load/store clustering should produce > 2 sections of clusters, but I don't remember the details of how the DAG mutation is implemented. Specifically for the MacroFusion mutation, I'm not sure. It may be useful to combine one def with multiple uses, but I'm not sure if that actually happens now.

Yeah, from my investigation, the MacroFusion implementation should support it. Do you know the AMDGPU hw supports more than 2 SU's macro fusion as the Load/Store cluster or just as other target, that is back-to-back. I guess it is also back-to-back, but I want to confirm it.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69998/new/

https://reviews.llvm.org/D69998