[LLVMdev] [RFC] Bundling support in the PostRA Scheduler

Mon Aug 13 03:07:13 PDT 2012

Hi all,

Thanks for your feedback :-)

@Andrew: In fact, I've reused most of the postRA list scheduler code and 
the resource priority queue. Every time it needs to move forward, either 
because of a resource hazard (HazardRec) or an invalid combination of 
instructions in the current packet (DFA), it closes the current bundle 
and advances to the next cycle. The non-interlocked nature of our 
processor forces the bundling logic to live with the scheduling logic. 
We cannot build bundles without the scoreboard.
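
A minimal sketch of that close-and-advance loop, assuming a toy machine model. None of this is LLVM API: `Instr`, `bundleInOrder` and the per-register `readyAt` scoreboard are hypothetical names, the "DFA" is reduced to one slot per functional unit, and instructions are scanned in program order instead of being picked from a priority queue.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction (not an LLVM MachineInstr).
struct Instr {
  int unit;     // functional unit required (stand-in for the DFA check)
  int dst;      // register written, or -1 for none
  int src;      // register read, or -1 for none
  int latency;  // cycles until dst becomes available
};

// Packs instructions in program order, one bundle per cycle.
// Returns the cycle (bundle index) assigned to each instruction.
// Only read-after-write hazards are tracked in this sketch.
std::vector<int> bundleInOrder(const std::vector<Instr> &prog,
                               int numUnits, int numRegs) {
  std::vector<int> readyAt(numRegs, 0);         // scoreboard
  std::vector<bool> unitBusy(numUnits, false);  // current packet state
  std::vector<int> cycleOf;
  int cycle = 0;
  for (const Instr &I : prog) {
    assert(I.unit >= 0 && I.unit < numUnits);
    assert(I.dst < numRegs && I.src < numRegs);
    for (;;) {
      // Scoreboard check: is the source operand available this cycle?
      bool hazard = I.src >= 0 && readyAt[I.src] > cycle;
      // Packet check: is the required unit still free in this bundle?
      bool conflict = unitBusy[I.unit];
      if (!hazard && !conflict)
        break;
      // Close the current bundle and advance to the next cycle.
      ++cycle;
      unitBusy.assign(numUnits, false);
    }
    unitBusy[I.unit] = true;
    if (I.dst >= 0)
      readyAt[I.dst] = cycle + I.latency;
    cycleOf.push_back(cycle);
  }
  return cycleOf;
}
```

In this model a cycle that receives no instruction corresponds to an empty bundle, which a non-interlocked target would still have to emit. A packet check alone would not see the two-cycle latency that keeps a consumer out of the cycle right after its producer, which is why the scoreboard sits in the same loop.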

I also tried to build bundles in a preRA pass in order to reduce the 
register pressure (so that the RA can take full advantage of the VLIW 
architecture). I ran into some problems, such as the 
re-materialization one that we discussed some time ago 
(http://llvm.1065342.n5.nabble.com/Instruction-bundles-before-RA-Rematerialization-td45900.html) 
and the liveness re-computation while moving MI's into packets, for 
which we have contributed a patch. Other problems are related to our 
specific BE implementation, which doesn't allow us to get good 
performance with the preRA bundler. PreRA bundling fixes the starting 
point for the postRA bundler, and that starting point may or may not be 
optimal. Without bundles before RA, the register pressure will be 
higher, but the postRA bundler gets the freedom it needs to build better 
bundles. There seems to be a trade-off between register pressure and 
bundling capabilities. We have chosen the latter.

@Sergei: It's good to see that you are working on it too :-). ATM, we 
don't do any transformation which may affect bundles. The postRA 
scheduler schedules and packetizes all MI's, and then we run the 
packetFinalization pass. Bundle decomposition is somewhat complex on 
non-interlocked processors, where the DFA is not enough to rebuild them.

@Pekka: We don't care about anti-deps as long as the dependent MI's can 
fit into the same bundle. There are anti-dep breakers in LLVM and the 
requested one runs together with the postRA scheduler.
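
To illustrate the anti-dep point, here is a small sketch under one explicit assumption: within a bundle, all register reads happen before any write. That is common on VLIW targets but is an assumption here, not something verified for any particular processor. `Op`, `Dep`, `classify` and `mayShareBundle` are hypothetical names, not LLVM API.

```cpp
#include <cassert>

// Hypothetical single-def, single-use operation; -1 means "no register".
struct Op {
  int def;  // register written
  int use;  // register read
};

enum class Dep { None, True, Anti, Output };

// Classifies the dependence of `second` on `first` (program order).
Dep classify(const Op &first, const Op &second) {
  if (first.def >= 0 && first.def == second.use)
    return Dep::True;    // read-after-write
  if (first.def >= 0 && first.def == second.def)
    return Dep::Output;  // write-after-write
  if (first.use >= 0 && first.use == second.def)
    return Dep::Anti;    // write-after-read
  return Dep::None;
}

// Under the reads-before-writes assumption, an anti-dependence does not
// prevent two operations from sharing a bundle: the reader still sees the
// old value. True and output dependences do prevent it in this sketch.
bool mayShareBundle(const Op &first, const Op &second) {
  Dep d = classify(first, second);
  return d == Dep::None || d == Dep::Anti;
}
```

For example, an operation that reads r1 followed by one that writes r1 is classified as `Dep::Anti` and may share a bundle, while the reverse order is a true dependence and may not.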

The changes I'd like to propose mainly consist of:
- Adapting the current resource priority queue to work with MI's.
- Either adding a new postRA SchedBundler or modifying the existing one.

But I think I should wait until Sergei sends his MI-based scheduler 
upstream. Sergei, are you working on a resource priority queue at the 
MI level?
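
For concreteness, the kind of priority meant by "resource priority queue at the MI level" can be sketched as follows. This is a simplified stand-in, not the existing LLVM resource priority queue: `Candidate` and `pickNext` are hypothetical names, and the only two criteria are whether the candidate still fits in the current packet and, as an assumed tie-breaker, its height in the dependence graph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ready-list entry.
struct Candidate {
  int id;
  int unit;    // functional unit required
  int height;  // longest latency path to the end of the block
};

// Returns the index in `ready` of the candidate to schedule next, or -1
// if the list is empty. Candidates that still fit in the current packet
// win; ties are broken by greater height.
int pickNext(const std::vector<Candidate> &ready,
             const std::vector<bool> &unitBusy) {
  int best = -1;
  bool bestFits = false;
  for (std::size_t i = 0; i < ready.size(); ++i) {
    const Candidate &c = ready[i];
    bool fits = !unitBusy[c.unit];
    bool better = best < 0 || (fits && !bestFits) ||
                  (fits == bestFits && c.height > ready[best].height);
    if (better) {
      best = static_cast<int>(i);
      bestFits = fits;
    }
  }
  return best;
}
```

With unit 0 already taken in the current packet, a candidate on unit 1 is picked ahead of a taller candidate on unit 0, which matches the idea that instructions likely to be bundled get higher priority.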

Ivan

On 06/08/2012 19:12, Andrew Trick wrote:
> On Jul 31, 2012, at 8:37 AM, Ivan Llopard <ivanllopard at gmail.com> wrote:
>
>> Hi,
>>
>> I'm working on a custom top-down post RA scheduler which builds bundles
>> at the same time for our VLIW processor. I've borrowed most of the
>> implementation from the resource priority queue implemented for the
>> existing VLIW scheduler but applied to the context of MI scheduling.
>> Basically, instructions that are likely to be bundled must be scheduled
>> first (i.e. get higher priority).
>> This work should integrate very well with the current infrastructure and
>> I'd like to contribute with a patch to add bundling capabilities to the
>> current post RA scheduler if the LLVM community is interested :-) (May
>> Hexagon need it as well?). It would also be a great opportunity for us
>> to get feedback from the community about this.
>>
>> We have a non-interlocked processor which relies on the post ra
>> scheduler to emit cycle-accurate bundles (valid bundles without
>> incurring structural hazards). The construction of bundles outside
>> the scope of post RA scheduling will require structural hazard
>> information to work properly for processors without pipeline interlocks.
>> For example, we can discover that an instruction can fit into the
>> current packet (following a schema of linear code scanning, just like
>> the current DFAPacketizer does) while in fact it cannot because of
>> structural hazards. The two terms are strongly connected and necessary
>> to build valid packets.
>> The concerns are mainly based on our non-interlocked processor, where
>> cycle-accurate bundle emission is necessary. Other approaches/ideas are
>> very welcome.
>> Do you have any plan for adding a more robust bundler into the current
>> infrastructure?
>>
>> Ivan
> Hi Ivan,
>
> Your description sounds fine to me. I assume you are totally decoupled from what LLVM currently calls the "postRA" scheduling pass. Hopefully you don't need anything in PostRASchedulerList.cpp.
>
> Running your bundler as a preEmit pass is the cleanest approach. But if need be, we can support preRA bundling at the time the MachineScheduler currently runs (if enabled). TargetPassConfig allows you to substitute your own pass in place of MachineScheduler. Passes that run after MachineScheduler are intended to support instruction bundles. This feature is not extensively tested, so anyone taking this approach would need to work with backend maintainers to get things fixed.
>
> -Andy