[PATCH] D45543: [globalisel] Add combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64

Roman Tereshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 1 10:56:00 PDT 2018


rtereshin added inline comments.


================
Comment at: lib/CodeGen/GlobalISel/CombinerHelper.cpp:54
+    unsigned PtrReg = DefMI->getOperand(1).getReg();
+    MachineMemOperand &MMO = **DefMI->memoperands_begin();
+    DEBUG(dbgs() << ".. Combine MI: " << MI;);
----------------
dsanders wrote:
> rtereshin wrote:
> > Is it possible to have a memory operand missing here?
> > Also, if not, does MachineVerifier enforce it?
> > Is it possible to have a memory operand missing here?
> 
> No, several parts of GlobalISel require it
> 
> > Also, if not, does MachineVerifier enforce it?
> 
> Yes, it reports 'Generic instruction accessing memory must have one mem operand' if that's not the case
Cool, thanks!


================
Comment at: lib/CodeGen/GlobalISel/CombinerHelper.cpp:63
+                           DstReg, PtrReg, MMO);
+    MI.eraseFromParent();
+    return true;
----------------
dsanders wrote:
> rtereshin wrote:
> > `getOpcodeDef` is able to look through copies, therefore I'd expect this combiner to match the following sequence:
> > ```
> > %v1:_(s16) = G_LOAD %ptr(p0), (load 2)
> > %v2:_(s16) = COPY %v1(s16)
> > %v3:_(s32) = G_ZEXT %v2(s16)
> > ```
> > and produce the following output:
> > ```
> > %v1:_(s16) = G_LOAD %ptr(p0), (load 2)
> > %v2:_(s16) = COPY %v1(s16)
> > %v3:_(s32) = G_ZEXTLOAD %ptr(p0), (load 2)
> > ```
> > Do you think it's a good idea to add tests like this and control that this, in fact, actually happens and happens correctly?
> That makes sense to me. We won't want to go overboard with that kind of thing though, it's enough to check it in a couple combines
Agreed.


================
Comment at: lib/CodeGen/GlobalISel/CombinerHelper.cpp:70
 bool CombinerHelper::tryCombine(MachineInstr &MI) {
   return tryCombineCopy(MI);
 }
----------------
dsanders wrote:
> aditya_nandakumar wrote:
> > rtereshin wrote:
> > > This is clearly supposed to try all of the combines implemented in the helper:
> > > ```
> > >   /// If \p MI is extend that consumes the result of a load, try to combine it.
> > >   /// Returns true if MI changed.
> > >   bool tryCombineExtendingLoads(MachineInstr &MI);
> > > 
> > >   /// Try to transform \p MI by using all of the above
> > >   /// combine functions. Returns true if changed.
> > >   bool tryCombine(MachineInstr &MI);
> > > };
> > > ```
> > I suspect he's put the calls to tryCombineExtendingLoads in the AArch64 pass because other targets may not handle the extending-load opcodes correctly in the legalizer. If/when they're ready, it makes sense to move them into the generic helper.
> They'll combine correctly, but at the moment the legalizer will immediately decompose them again on most targets, so it's just wasted effort that slows compile times. This brings up something I've been wondering how to deal with over the last few days: if we continue down the path we're on, I don't currently see how we're going to manage large collections of combines across several targets.
> 
> Suppose the Foo and Bar targets support the following combines:
>   tryCombine
>     tryCombineGroup1
>       tryA
>       tryB
>       tryC
>     tryCombineGroup2
>       tryD
>       tryE
>       tryF
> and both targets call tryCombine. Now suppose Bar realizes that tryE is doing more harm than good.
> 
> We could make tryE check that the target isn't Bar. The consequence of that is that we have a monolithic implementation covering all targets. It means a target-specific change can introduce all-target bugs, all targets must carry everyone's complexity, every change needs testing (including performance) for every target, etc.
> 
> We could make Bar call tryCombineGroup1, tryD, and tryF instead. The consequence of that is that useful changes to tryCombine and tryCombineGroup2 won't apply to Bar. Every target would have to review combines and opt into them individually, and as such wouldn't automatically benefit from improvements. They also wouldn't suffer from losses, which is a good thing, but I would hope those are less common.
> 
> We could split tryCombineGroup2 to get (this is just one example):
>   tryCombine
>     tryCombineGroup1
>       tryA
>       tryB
>       tryC
>     tryCombineGroup2
>       tryD
>       tryF
>     tryCombineGroup3
>       tryE
> this of course assumes that ordering between E and F doesn't matter. Of course, if we do that enough then we eventually reach:
>     tryA
>     tryB
>     tryC
>     tryD
>     tryF
>     tryE
> bringing with it the same problems as the previous option.
> 
> The answer I keep returning to at the moment is that even if the majority of our implementation is C++, we need something declarative to organize and control the combines for each target. We don't necessarily need to go as far as my original intent of re-using the ISel matcher for combines and thereby doing the majority of the matching in TableGen-generated code, but we do need a declarative way to control whether a combine is enabled/disabled (which is trivial) and the order in which they're applied (not so trivial).
I like the idea of groups. I think it will make it easier for target writers (and not only human target writers ;-) ) to compose a reasonable pipeline for their needs relatively easily. Also, the grouping doesn't have to be one level deep; it could be more, if well structured and useful.

It may be worth contemplating, though, what principle should be used to group combines together. I think it's going to be tempting to group them by root-level opcode: all G_ADD combines, all arithmetic combines (a 2nd-level group), etc., but this kind of grouping may prove less useful than architecture- / micro-architecture-targeted grouping, like "combines for super-scalar architectures with a lot of instruction-level parallelism", "combines for architectures with highly efficient SIMD units", etc.; something driven by what actual targets use and ignore.

As for the declarative approach, honestly, putting as much as possible into TableGen doesn't strike me as a wise approach. TableGen'erated implementations are hard to read and search, the interfaces between them and the rest of the compiler tend to be obscure and fragile, and it's very hard to make changes to any of it. If it's possible to derive the implementation from information **already** provided by target writers, extending a TableGen backend at least goes in the "let's auto-generate the entire compiler" direction, which is valuable: for instance, if we could derive the optimal pipeline of combines from the scheduling and instruction info already put together by the targets. If, however, we need to come up with a new set of TableGen classes to explicitly define that pipeline manually in a *.td file, and write an entirely new TableGen backend to process it, that doesn't look like a valuable thing to do.

We could be as declarative as we want to be in C++ and that may be much easier to work with.

Also, I think it may be valuable to make whatever design we eventually come up with easily compatible with a hypothetical (at the moment) tool that would be able to generate an optimal combine-pipeline for a target automatically, provided a performance feedback mechanism. Let it start with something reasonable, compile a corpus of programs, evaluate the code quality via that feedback mechanism, mutate the pipeline a bit, and try again.

That sort of thing calls for a separate binary tool with wide access to LLVM infrastructure, including all the combines, not for a TableGen backend. I gave up on making Testgen a part of the TableGen process very quickly, for instance.


================
Comment at: lib/Target/AArch64/AArch64PreLegalizerCombiner.cpp:49
+    return Helper.tryCombineExtendingLoads(MI);
+  }
+
----------------
dsanders wrote:
> rtereshin wrote:
> > `CombinerHelper::tryCombineCopy` contains a bug (it doesn't check that the register classes and register banks of its source and destination are compatible). As soon as that's fixed* **and** `CombinerHelper::tryCombine` properly tries all of the combines implemented in `CombinerHelper`, as it's supposed to, this could be replaced with a single call to `CombinerHelper::tryCombine`; not before, though.
> > 
> > *I'm planning on adding a patch with that fix soon, but it will be dependent on https://reviews.llvm.org/D45732 as the latter makes the former simpler and shorter.
> > 
> > + @aditya_nandakumar 
> > and CombinerHelper::tryCombine properly tries all of the combines implemented in CombinerHelper as it's supposed to
> 
> At the moment, that will look like a good idea but having a single monolithic tryCombine() isn't going to last long as more combines get implemented and more targets use combines. (see above)
Something still needs to be done about `tryCombine`. If it doesn't represent any practically useful group used by any target (not necessarily in a final implementation; during experimentation counts too), let's remove it. If it does, let's maintain it so that it does precisely what it promises to do.


Repository:
  rL LLVM

https://reviews.llvm.org/D45543




