[LLVMdev] RFC: Machine Instruction Bundle

Mon Dec 5 07:17:25 PST 2011

Hi Evan,

Thanks for sending out the proposal. This pass will, of course, be very 
important for the Hexagon backend. I have to study the proposal in more 
detail but overall, it looks good to me.

 > I would also like to see a target independent interface for 
pre-scheduling optimizations that form instruction sequences (e.g. 
macro-fusion).

My team will be happy to help with the tasks and specifically with the 
packet-formation pass. We already have a pass that forms packets for 
Hexagon and we can generalize the pass and make it machine-independent.

-Anshu

--
Qualcomm Innovation Center, Inc is a member of Code Aurora Forum

On 12/2/2011 2:40 PM, Evan Cheng wrote:
> *Machine Instruction Bundle in LLVM*
>
> Hi all,
>
> There have been quite a bit of discussions about adding machine 
> instruction bundle to support VLIW targets. I have been pondering what 
> the right representation should be and what kind of impact it might 
> have on the LLVM code generator. I believe I have a fairly good plan 
> now and would like to share with the LLVM community.
>
> *Design Criteria*
>
> 1. The bundle representation must be light weight. We cannot afford to 
> add significant memory or compile time overhead.
> 2. It must be flexible enough to represent more than VLIW bundles. It 
> should be useful to represent arbitrary sequence of instructions that 
> must be scheduled as a unit. e.g. ARM Thumb2 IT block, Intel compare + 
> branch macro-fusion, or random instruction sequences that are 
> currently modeled as pseudo instructions that are expanded late.
> 3. Minimize the amount of changes required in the LLVM code generator, 
> especially in target independent passes. It must minimize code 
> duplication (i.e. we don't want code snippets that search for bundle 
> start / end like all the code in the backend that skip over DBG_VALUE).
> 4. The representation should make it easy for new code to be oblivious 
> of bundles. That is, MI passes should not have to check whether 
> something is a bundle.
>
> Given the above, we can rule out a new class (e.g. MachineInstrBundle) 
> right away. We don't want MachineBasic block to keep a list of 
> MachineInstrBundles since it will require massive amount of code 
> change. So what are the choices?
>
> *Bundle Representation*
>
> 1. A nested MachineInstr: This is the most natural (meaning it looks 
> most like the real HW bundle) representation. It has the nice property 
> that most passes do not have to check if a MI is a bundle.The concern 
> here this can add significant memory overhead if this means adding a 
> ilist or SmallVector field to keep bundled MIs.
> 2. Add a bit to MachineInstr: The bit means the next MI in the list is 
> part of the same bundle. This is very light weight. However it 
> requires many passes to check wether a MI is part of a bundle.
>
> The solution is a combination of both #1 and #2. Conceptually we want 
> a representation that looks like this:
>
> --------------
> |  Bundle    | -------
> --------------        \
>          |           ----------------
>          |           |      MI      |
>          |           ----------------
>          |                   |
>          |           ----------------
>          |           |      MI      |
>          |           ----------------
>          |                   |
>          |           ----------------
>          |           |      MI      |
>          |           ----------------
>          |
> --------------
> |  Bundle    | ------
> --------------       \
>          |           ----------------
>          |           |      MI      |
>          |           ----------------
>          |                   |
>          |           ----------------
>          |           |      MI      |
>          |           ----------------
>          |                   |
>          |                  ...
>          |
> --------------
> |  Bundle    | ------
> --------------       \
>          |
>         ...
>
>
> This is #1, a series of nested MI's. However, we are going to store 
> the instructions in the same way as it's done right now, i.e. a 
> list<MachineInstr> on MachineBasicBlocks.  Using #2, we will add a bit 
> to MI that indicates whether it is part of a bundle.
>
>                      ----------------
>                      |      MI    * |   (* bit indicates next MI is 
> "glued" to this MI, i.e. in the same bundle)
>                      ----------------
>                              |
>                      ----------------
>                      |      MI    * |
>                      ----------------
>                              |
>                      ----------------
>                      |      MI      |    (no bit, this is the end of 
> the bundle)
>                      --------------
>                              |
>                      ----------------
>                      |      MI    * |   (* a new bundle)
>                      ----------------
>                              |
>                      ----------------
>                      |      MI      |
>                      ----------------
>                              |
>                             ...
>
> We are going to hide the complexity in the MachineBasicBlock::iterator 
> instead. That is, the iterator will be changed to visit only the *top 
> level* instructions (i.e. first instruction in each bundle). We will 
> add another iterator that allows client to visit all of the MIs for 
> those passes that want to look into bundles.
>
> We can use the same representation for arbitrary sequence of 
> instructions that cannot be broken up. e.g. Thumb2 IT blocks.
>
>                      ----------------
>                      |      MI      |   (just a MI)
>                      ----------------
>                             |
>                      ----------------
>                      |      MI    * |   (* Start of Thumb2 IT block)
>                      ----------------
>                             |
>                      ----------------
>                      |      MI    * |
>                      ----------------
>                             |
>                      ----------------
>                      |      MI      |   (last MI in the block)
>                      ----------------
>                             |
>                      ----------------
>                      |      MI      |
>                      ----------------
>                             |
>                            ...
>
> This representation can support VLIW (where top level MI's are all 
> start of bundles) or non-VLIW (where there can be mix of MIs and 
> bundles). It is also very cheap since the "Flags" field has plenty of 
> free bits available.
>
> *Properties of Bundle*
>
> If MI passes can consider each bundle as a single unit, then how are 
> they going to examine properties (i.e. flags and operands) of a MI 
> bundle? Conceptually a the properties of a bundle is the union of the 
> properties of all the MIs inside the bundle. So a bundle reads all the 
> inputs that the individual MIs read and it defines all the outputs of 
> the individual MIs. However, this is not correct when there are 
> intra-bundle dependencies. e.g.
>
> -------------------------
> | r0 = op1 r1, r2       |
> | r3 = op2 r0<kill>, #c |
> -------------------------
>
> r0 should not be considered as a source on the bundle since it's 
> defined inside the bundle and its live range does not extend beyond 
> it. Instead, r0 is a clobber (i.e. dead def).
>
> -------------------------
> | r0 = op1 r1, r2       |
> | r3 = op2 r0, #c       |
> -------------------------
>  ...
>      = op3 r0,
>
> r0 is a def, not a use.
>
> What does this mean? It means in order for passes to operate on a 
> bundle at a time, it must be able to visit all the defs and uses of a 
> bundle. We have established that computing the defs and uses of a 
> bundle is not as trivial as taking the union. This is certainly not 
> something we want to re-compute every time! This requires a slight 
> change to the bundle representation.
>
>                      ----------------
>                      |    Bundle  * |   (A MI with special opcode 
> "Bundle")
>                      ----------------
>                              |
>                      ----------------
>                      |      MI    * |
>                      ----------------
>                              |
>                      ----------------
>                      |      MI    * |
>                      ----------------
>                              |
>                      ----------------
>                      |      MI      |    (no bit, this is the end of 
> the bundle)
>                      --------------
>                              |
>                      ----------------
>                      |    Bundle  * |   (a new bundle)
>                      ----------------
>                              |
>                      ----------------
>                      |      MI    * |
>                      ----------------
>                              |
>                      ----------------
>                      |      MI      |
>                      ----------------
>                              |
>                             ...
>
> The pseudo bundle instructions should be used to capture properties of 
> the bundle. When a bundle is finalized the packetizer must add source 
> and def operands to the pseudo bundle instruction. More on this later.
>
> Other properties, such as mayLoad, mayStore, are static properties 
> associated with opcodes. They cannot be copied. We will add APIs to 
> examine properties of MIs which will do the *right thing* for bundles 
> (i.e. look into MIs in bundles).
>
> *Packetizing*
>
> The current MI flow looks like this:
>
> 1. DAG to MI lowering (and pre-RA schedule)
> 2. MI optimizations (LICM, CSE, etc.)
> 3. Register allocation super pass
>    3a. De-ssa (2-address, phi slim)
>    3b. Coalescing
>    3c. Actual register allocation
> 4. Post-RA optimizations
> 5. PEI
> 6. Post-RA scheduling
>
> In the hopefully not very distant future it should look like this:
>
> 1. DAG to MI lowering (no scheduling!)
> 2. MI optimizations (LICM, CSE, etc.)
> 3. Register allocation super pass
>    3a. De-ssa (2-address, phi slim)
>    3b. Coalescing
>    3c. *Pre-RA scheduling*
>    3d. Actual register allocation
> 4. Post-RA optimizations
> 5. PEI
> 6. *Re-schedule restores, copies*
>
> The current proposal is for "packetization" to be done as part of the 
> "RA super pass". Early MI optimization passes such as LICM do not 
> benefit from operating on bundles. Furthermore, the packetizer should 
> not have to know how to deal with copies which may later be coalesced, 
> phi nodes, or other copy like pseudo instructions.
>
> Packetization should be done in two phases. The first part decides 
> what MIs should be bundled together and it add the "bits" which glued 
> MIs together. This can be done either before pre-RA scheduling. The 
> second part of the packetization should only be done after register 
> allocation is completed. There are two very important reason for this.
>
> 1. Packet finalization *must* add source and def operands to the 
> "Bundle" pseudo MI. This allows all later passes to handle they 
> transparently. However, we do not want to do this before register 
> allocation is complete. Otherwise it introduces new defs and uses of 
> virtual registers and that mess up MachineRegisterInfo def-use chains.
>
> e.g. Now vr0 has two defs!
> defs: vr0<dead>, vr3, uses: vr1, vr2
> ----------------------------
> | vr0 = op1 vr1, vr2       |
> | vr3 = op2 vr0<kill>, #c  |
> ----------------------------
>
> 2. During register allocation, more identity copies will be eliminated 
> while loads, stores, copies, re-materialized instructions will be 
> introduced. It makes sense for the second part of packetization to try 
> to fill these new instructions into empty slots (for VLIW like targets).
>
> So the overall flow should look like this:
>
> 1. DAG to MI lowering (no scheduling!)
> 2. MI optimizations (LICM, CSE, etc.)
> 3. Register allocation super pass
>    3a. De-ssa (2-address, phi slim)
>    3b. Coalescing
>    3c. *Pre-scheduling packetization (optional)*
>    3d. Pre-RA scheduling (or *integrated packetization*)
>    3e. *Post-scheduling packetization (optional)*
>    3f. Actual register allocation
>    3g. *Packet finalization*
> 4. Post-RA optimizations
> 5. PEI
> 6. Re-schedule restores, copies
>
> *Lowering Bundles to MCInst*
>
> There is no need to add the equivalent of MI bundle to MCInst. A MI 
> bundle should be concatenated into a single MCInst by storing opcodes 
> as integer operands. e.g.
>
> -------------------------
> | r0 = op1 r1, r2       |
> | r3 = op2 r0, #c       |
> -------------------------
>
> =>
>
> MCInst: op1 r0, r1, r2, op2, r3, r0, #c
> or
> MCInst: op1 op2 r0, r1, r2, r3, r0, #c
>
> *What's Next?*
>
> I am hoping to find some time to implement the followings in the near 
> future:
> 1. Add BUNDLE opcode
> 2. MachineInstr class changes: new bit, changes to methods such as 
> eraseFromParent(), isIdenticalTo().
> 3. Change MachineInstr::iterator to skip over bundled MIs. Rename old 
> iterator.
> 4. Add MachineInstr API to check for instruction properties and switch 
> existing code over.
> 5. Add API to form a bundle. It would compute the proper def's and 
> use's and add MachineOperands to the bundle MI.
> 6. Switch Thumb2 IT block to using MI bundles.
> 7. Add interface for targets to register their own packetization passes.
>
> I would dearly welcome help on any of these tasks especially on 4, 5, 
> 6. I also would not cry if someone beats me to #6 (or actually any of 
> the tasks. :-)
>
> In the longer term, I would like to see a target independent 
> packetization pass (I believe one is being reviewed). I would also 
> like to see a target independent interface for pre-scheduling 
> optimizations that form instruction sequences (e.g. macro-fusion). 
> Patches welcome!
>
> Evan
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111205/e4b298fc/attachment.html>