[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

Mon Aug 13 08:32:16 PDT 2012

> Sergei, are you working on some resource priority queue at
> MI level?

Yes.

Sergei

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.

> -----Original Message-----
> From: Ivan Llopard [mailto:ivanllopard at gmail.com]
> Sent: Monday, August 13, 2012 5:07 AM
> To: Andrew Trick
> Cc: LLVM Developers Mailing List; Sergei Larin;
> pekka.jaaskelainen at tut.fi; jordy.potman at recoresystems.com;
> thomas.stellard at amd.com
> Subject: Re: [LLVMdev] [RFC] Bundling support in the PostRA Scheduler
> 
> Hi all,
> 
> Thanks for your feedback :-)
> 
> @Andrew: In fact, I've reused most of the postRA list scheduler code
> and the resource priority queue. Every time it needs to move forward,
> either because of a resource hazard (HazardRec) or an invalid combination of
> instructions in the current packet (DFA), it closes the current bundle
> and advances to the next cycle. The non-interlocked nature of our
> processor forces the bundling logic to live with the scheduling logic.
> We cannot build bundles without the scoreboard.
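
A minimal sketch of the loop described above, written in Python rather than LLVM's C++ so that it runs standalone. Every name in it (Instr, schedule_and_bundle, the slots table standing in for the DFA, the ready_at map standing in for the scoreboard) is hypothetical and not an LLVM API. It assumes latencies of at least 1 and an acyclic dependence graph.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    unit: str                 # functional-unit class, e.g. "alu" or "mem"
    latency: int = 1          # cycles until the result may be read
    deps: list = field(default_factory=list)  # names of producer instructions


def schedule_and_bundle(instrs, slots):
    """Top-down list scheduling that forms exactly one bundle per cycle.

    slots maps a unit class to how many of it one bundle may hold (a toy
    stand-in for the DFA); ready_at plays the role of the scoreboard.
    """
    names = {i.name for i in instrs}
    for i in instrs:
        # Reject inputs that could never be scheduled (would loop forever).
        if slots.get(i.unit, 0) < 1 or not set(i.deps) <= names:
            raise ValueError(f"cannot schedule {i.name}")

    ready_at = {}             # instr name -> first cycle its result is readable
    pending = list(instrs)
    bundles, cycle = [], 0
    while pending:
        bundle, used = [], {}
        for ins in list(pending):
            # Hazard check: every producer must have completed by this cycle.
            # Producers not yet issued are absent from ready_at.
            operands_ready = all(
                ready_at.get(d, float("inf")) <= cycle for d in ins.deps
            )
            # Packet check: is a slot of the right class still free?
            slot_free = used.get(ins.unit, 0) < slots[ins.unit]
            if operands_ready and slot_free:
                bundle.append(ins.name)
                used[ins.unit] = used.get(ins.unit, 0) + 1
                ready_at[ins.name] = cycle + ins.latency
                pending.remove(ins)
        # Nothing else can issue: close the bundle (possibly empty, i.e. an
        # explicit NOP cycle) and advance to the next cycle.
        bundles.append(bundle)
        cycle += 1
    return bundles
```

With one ALU slot and one memory slot per bundle, a 2-cycle load "ld" feeding "add" feeding "st", plus an independent "mul", this returns [['ld', 'mul'], [], ['add'], ['st']]. The empty bundle is the cycle the scoreboard forces while the load completes; slot availability alone would not reveal it.
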
> 
> I also tried to build bundles in a preRA pass in order to reduce
> register pressure (so that the RA can take full advantage of the VLIW
> architecture). I've run into some problems, such as the
> rematerialization one that we discussed some time ago
> (http://llvm.1065342.n5.nabble.com/Instruction-bundles-before-RA-Rematerialization-td45900.html)
> and the liveness recomputation while moving MIs into packets, for which
> we have contributed a patch. Other problems are related to our
> specific BE implementation, which doesn't allow us to get good
> performance with the preRA bundler. PreRA bundling fixes the
> starting point for the postRA bundler, and that starting point may
> or may not be optimal. Without bundles before RA, the
> register pressure will be higher, but the postRA bundler gets the
> freedom it needs to build better bundles. There seems to be a trade-off
> between register pressure and bundling capabilities. We have chosen the
> latter.
> 
> @Sergei: It's good to see that you are working on it too :-). ATM, we
> don't do any transformation that may affect bundles. The postRA
> scheduler schedules and packetizes all MIs, then we run the
> packetFinalization pass. Bundle decomposition is somewhat complex on
> non-interlocked processors, where the DFA alone is not enough to
> rebuild the bundles.
> 
> @Pekka: We don't care about anti-deps as long as the dependent MIs can
> fit into the same bundle. There are anti-dep breakers in LLVM, and the
> requested one runs together with the postRA scheduler.
> 
> The changes I'd like to propose are mainly:
> - Adapting the current resource priority queue to work with MIs.
> - Either adding a new postRA SchedBundler or modifying the existing one.
> 
> But I think I should wait until Sergei sends his MI-based scheduler
> upstream. Sergei, are you working on some resource priority queue at
> MI level?
> 
> Ivan
> 
> On 06/08/2012 19:12, Andrew Trick wrote:
> > On Jul 31, 2012, at 8:37 AM, Ivan Llopard <ivanllopard at gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I'm working on a custom top-down postRA scheduler for our VLIW
> >> processor which builds bundles at the same time. I've borrowed most
> >> of the implementation from the resource priority queue implemented
> >> for the existing VLIW scheduler, but applied it to the context of MI
> >> scheduling.
> >> Basically, instructions that are likely to be bundled must be
> >> scheduled first (i.e. get higher priority).
> >> This work should integrate very well with the current infrastructure,
> >> and I'd like to contribute a patch that adds bundling capabilities
> >> to the current postRA scheduler if the LLVM community is interested
> >> :-) (Might Hexagon need it as well?). It would also be a great
> >> opportunity for us to get feedback from the community about this.
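
One way to read "instructions that are likely to be bundled get higher priority" is as a sort key on the ready list. The Python sketch below is only an illustration under that reading: the dict fields ("unit", "height"), the slots table, and the function names are hypothetical, and it is not the cost function of LLVM's resource priority queue.

```python
def priority(instr, used, slots):
    """Sort key: instructions that still fit the open bundle rank first;
    ties are broken by critical-path height (a hypothetical field)."""
    fits = used.get(instr["unit"], 0) < slots.get(instr["unit"], 0)
    return (fits, instr["height"])


def pick(ready, used, slots):
    """Return the highest-priority ready instruction, or None if the
    ready list is empty."""
    return max(ready, key=lambda i: priority(i, used, slots), default=None)
```

With a single ALU slot already taken in the open bundle, a ready memory instruction is picked ahead of a ready ALU instruction even if the latter sits higher on the critical path, because only the former can still join the bundle.
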
> >>
> >> We have a non-interlocked processor which relies on the postRA
> >> scheduler to emit cycle-accurate bundles (valid bundles that do not
> >> incur structural hazards). Constructing bundles outside the scope
> >> of postRA scheduling requires structural hazard information to
> >> work properly on processors without pipeline interlocks.
> >> For example, we may find that an instruction fits into the
> >> current packet (following a scheme of linear code scanning, just
> >> like the current DFAPacketizer does) while in fact it cannot
> >> because of structural hazards. The two concerns are strongly
> >> connected, and both are necessary to build valid packets.
> >> These concerns are mainly based on our non-interlocked processor,
> >> where cycle-accurate bundle emission is necessary. Other
> >> approaches/ideas are very welcome.
> >> Do you have any plans for adding a more robust bundler to the
> >> current infrastructure?
> >>
> >> Ivan
> > Hi Ivan,
> >
> > Your description sounds fine to me. I assume you are totally
> > decoupled from what LLVM currently calls the "postRA" scheduling pass.
> > Hopefully you don't need anything in PostRASchedulerList.cpp.
> >
> > Running your bundler as a preEmit pass is the cleanest approach. But
> > if need be, we can support preRA bundling at the time the
> > MachineScheduler currently runs (if enabled). TargetPassConfig allows
> > you to substitute your own pass in place of MachineScheduler. Passes
> > that run after MachineScheduler are intended to support instruction
> > bundles. This feature is not extensively tested, so anyone taking this
> > approach would need to work with backend maintainers to get things
> > fixed.
> >
> > -Andy