[PATCH] D45543: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64

Tue May 8 08:33:53 PDT 2018

aemerson added inline comments.

================
Comment at: lib/CodeGen/GlobalISel/CombinerHelper.cpp:70
 bool CombinerHelper::tryCombine(MachineInstr &MI) {
   return tryCombineCopy(MI);
 }
----------------
rtereshin wrote:
> dsanders wrote:
> > aditya_nandakumar wrote:
> > > rtereshin wrote:
> > > > This is clearly supposed to try all of the combines implemented in the helper:
> > > > ```
> > > >   /// If \p MI is extend that consumes the result of a load, try to combine it.
> > > >   /// Returns true if MI changed.
> > > >   bool tryCombineExtendingLoads(MachineInstr &MI);
> > > > 
> > > >   /// Try to transform \p MI by using all of the above
> > > >   /// combine functions. Returns true if changed.
> > > >   bool tryCombine(MachineInstr &MI);
> > > > };
> > > > ```
> > > I suspect he's put the calls to tryCombineExtendingLoads in AArch pass as other passes may not handle the extending load opcodes correctly in the legalizer. If/when they're ready, then it makes sense to move them into the generic helper.
> > They'll combine correctly but the legalizer will immediately decompose them again on most targets at the moment so it's just wasted effort slowing the compile-times. This brings up something I've been wondering how to deal with over the last few days. If we continue down the path we're going, I currently don't see how we're going to manage large collections of combines and several targets.
> > 
> > Suppose the Foo and Bar targets supports the following combines:
> >   tryCombine
> >     tryCombineGroup1
> >       tryA
> >       tryB
> >       tryC
> >     tryCombineGroup2
> >       tryD
> >       tryE
> >       tryF
> > and each calls tryCombine. Now suppose Bar realizes that tryE is doing more harm than good.
> > 
> > We could make tryE check that the target isn't Bar. The consequence of that is that we have a monolithic implementation covering all targets. It means a target specific change can introduce all-target bugs, all targets must carry everyones complexity, every change needs testing (including performance) for every target, etc.
> > 
> > We could make Bar call tryCombineGroup1, tryD, and tryF instead. The consequence of that is that useful changes to tryCombine, and tryCombineGroup2 won't apply to Bar. Everyone will have to review combines and choose to enable them and as such won't benefit from improvements. They also won't suffer from losses either which is a good thing but I would hope that those are less common.
> > 
> > We could split tryCombineGroup2 to get (this is just one example):
> >   tryCombine
> >     tryCombineGroup1
> >       tryA
> >       tryB
> >       tryC
> >     tryCombineGroup2
> >       tryD
> >       tryF
> >     tryCombineGroup3
> >       tryE
> > this of course assumes that ordering between E and F doesn't matter. Of course, if we do that enough then we eventually reach:
> >     tryA
> >     tryB
> >     tryC
> >     tryD
> >     tryF
> >     tryE
> > bringing with it the same problems as the previous option.
> > 
> > The answer I keep returning to at the moment is that even if the majority of our implementation is C++, we need something declarative to organize and control the combines for each target. We don't necessarily need to go to my original intent of re-using the ISel matcher for combines and thereby doing the majority of the matching in tablegenerated code but we do need a declarative way to control whether a combine is enabled/disabled (which is trivial) and the order they're applied (not so trivial).
> I like the idea with groups. I think it will make it easier for target-writers (and not only human target-writers ;-) ) to compose a reasonable pipeline for their needs relatively easily. Also, the grouping doesn't have to be 1-level, it could be more, if well-structured and useful.
> 
> It may worth contemplating though what principle should be used to group combines together. I think it's going to be tempting to group them by root-level opcode: all G_ADD combines, all arithmetic combines (2-nd level group), etc, but this kind of grouping has a chance to prove less useful than architecture / micro-architecture targeted grouping. Like "combines for super-scalar architectures with a lot of instruction-level parallelism", and "architectures with highly efficient SIMD units" etc, something driven by what actual targets use and ignore.
> 
> As for the declarative approach, honestly, putting as much as possible into Tablegen doesn't strike me like a wise approach. Tablegen'erated implementations are hard to read and search, interfaces between them and the rest of the compiler seem to be obscure and fragile, and it's very hard to make changes to any of it. If it's possible to derive the implementation from the information **already** provided by target writers, extending a Tablegen-backend  at least goes into "let's auto-generate the entire compiler" direction, which is valuable. For instance, if we could derive the optimal pipeline of combines from scheduling and instruction info already put together by the targets. If it's needed, however, to come up with a new set of Tablegen-classes to explicitly define that pipeline manually in a *.td-file, and write an entirely new Tablegen-backend to process it, that doesn't look like a valuable thing to do.
> 
> We could be as declarative as we want to be in C++ and that may be much easier to work with.
> 
> Also, I think it may be valuable to make whatever design we eventually come up with easily compatible with a hypothetical (at the moment) tool that would be able to generate an optimal combine-pipeline for a target automatically, provided a performance feedback mechanism. Let it start with something reasonable, compile a corpus of programs, evaluate the code quality via that feedback mechanism, mutate the pipeline a bit, and try again.
> 
> That sort of thing calls for a separate binary tool having a wide access to LLVM infrastructure, including all the combines, not for a Table-gen backend. I gave up on making Testgen a part of the Tablegen process very quickly, for instance.
All good points from you and Daniel. I don't know if tablegen is the best tool for the combiners, as I worry we'll paint ourselves into a corner once we go down that path.

I think it would be worth exploring the design of combiner groups, and higher level schemes expressed in a declarative way as Daniel wants. The schemes could encapsulate the choices a target makes about their combiner configuration, with a default one that most targets would use at least at the beginning. Where customisability comes into it is with something like predicates/masks that enable/disable the activation of specific combines in the groups, so a scheme is a set of groups or existing scheme with overlaid modifications. For altering the orders of combines, an overlay could express the reversal of two given combines within a specific group.

Repository:
  rL LLVM

https://reviews.llvm.org/D45543