[LLVMdev] RFC: Machine Instruction Bundle
Anshuman Dasgupta
adasgupt at codeaurora.org
Mon Dec 5 07:17:25 PST 2011
Hi Evan,
Thanks for sending out the proposal. This pass will, of course, be very
important for the Hexagon backend. I have to study the proposal in more
detail but overall, it looks good to me.
> I would also like to see a target independent interface for
pre-scheduling optimizations that form instruction sequences (e.g.
macro-fusion).
My team will be happy to help with the tasks and specifically with the
packet-formation pass. We already have a pass that forms packets for
Hexagon and we can generalize the pass and make it machine-independent.
-Anshu
--
Qualcomm Innovation Center, Inc is a member of Code Aurora Forum
On 12/2/2011 2:40 PM, Evan Cheng wrote:
> *Machine Instruction Bundle in LLVM*
>
> Hi all,
>
> There have been quite a bit of discussions about adding machine
> instruction bundle to support VLIW targets. I have been pondering what
> the right representation should be and what kind of impact it might
> have on the LLVM code generator. I believe I have a fairly good plan
> now and would like to share with the LLVM community.
>
> *Design Criteria*
>
> 1. The bundle representation must be light weight. We cannot afford to
> add significant memory or compile time overhead.
> 2. It must be flexible enough to represent more than VLIW bundles. It
> should be useful to represent arbitrary sequence of instructions that
> must be scheduled as a unit. e.g. ARM Thumb2 IT block, Intel compare +
> branch macro-fusion, or random instruction sequences that are
> currently modeled as pseudo instructions that are expanded late.
> 3. Minimize the amount of changes required in the LLVM code generator,
> especially in target independent passes. It must minimize code
> duplication (i.e. we don't want code snippets that search for bundle
> start / end like all the code in the backend that skip over DBG_VALUE).
> 4. The representation should make it easy for new code to be oblivious
> of bundles. That is, MI passes should not have to check whether
> something is a bundle.
>
> Given the above, we can rule out a new class (e.g. MachineInstrBundle)
> right away. We don't want MachineBasic block to keep a list of
> MachineInstrBundles since it will require massive amount of code
> change. So what are the choices?
>
> *Bundle Representation*
>
> 1. A nested MachineInstr: This is the most natural (meaning it looks
> most like the real HW bundle) representation. It has the nice property
> that most passes do not have to check if a MI is a bundle.The concern
> here this can add significant memory overhead if this means adding a
> ilist or SmallVector field to keep bundled MIs.
> 2. Add a bit to MachineInstr: The bit means the next MI in the list is
> part of the same bundle. This is very light weight. However it
> requires many passes to check wether a MI is part of a bundle.
>
> The solution is a combination of both #1 and #2. Conceptually we want
> a representation that looks like this:
>
> --------------
> | Bundle | -------
> -------------- \
> | ----------------
> | | MI |
> | ----------------
> | |
> | ----------------
> | | MI |
> | ----------------
> | |
> | ----------------
> | | MI |
> | ----------------
> |
> --------------
> | Bundle | ------
> -------------- \
> | ----------------
> | | MI |
> | ----------------
> | |
> | ----------------
> | | MI |
> | ----------------
> | |
> | ...
> |
> --------------
> | Bundle | ------
> -------------- \
> |
> ...
>
>
> This is #1, a series of nested MI's. However, we are going to store
> the instructions in the same way as it's done right now, i.e. a
> list<MachineInstr> on MachineBasicBlocks. Using #2, we will add a bit
> to MI that indicates whether it is part of a bundle.
>
> ----------------
> | MI * | (* bit indicates next MI is
> "glued" to this MI, i.e. in the same bundle)
> ----------------
> |
> ----------------
> | MI * |
> ----------------
> |
> ----------------
> | MI | (no bit, this is the end of
> the bundle)
> --------------
> |
> ----------------
> | MI * | (* a new bundle)
> ----------------
> |
> ----------------
> | MI |
> ----------------
> |
> ...
>
> We are going to hide the complexity in the MachineBasicBlock::iterator
> instead. That is, the iterator will be changed to visit only the *top
> level* instructions (i.e. first instruction in each bundle). We will
> add another iterator that allows client to visit all of the MIs for
> those passes that want to look into bundles.
>
> We can use the same representation for arbitrary sequence of
> instructions that cannot be broken up. e.g. Thumb2 IT blocks.
>
> ----------------
> | MI | (just a MI)
> ----------------
> |
> ----------------
> | MI * | (* Start of Thumb2 IT block)
> ----------------
> |
> ----------------
> | MI * |
> ----------------
> |
> ----------------
> | MI | (last MI in the block)
> ----------------
> |
> ----------------
> | MI |
> ----------------
> |
> ...
>
> This representation can support VLIW (where top level MI's are all
> start of bundles) or non-VLIW (where there can be mix of MIs and
> bundles). It is also very cheap since the "Flags" field has plenty of
> free bits available.
>
> *Properties of Bundle*
>
> If MI passes can consider each bundle as a single unit, then how are
> they going to examine properties (i.e. flags and operands) of a MI
> bundle? Conceptually a the properties of a bundle is the union of the
> properties of all the MIs inside the bundle. So a bundle reads all the
> inputs that the individual MIs read and it defines all the outputs of
> the individual MIs. However, this is not correct when there are
> intra-bundle dependencies. e.g.
>
> -------------------------
> | r0 = op1 r1, r2 |
> | r3 = op2 r0<kill>, #c |
> -------------------------
>
> r0 should not be considered as a source on the bundle since it's
> defined inside the bundle and its live range does not extend beyond
> it. Instead, r0 is a clobber (i.e. dead def).
>
> -------------------------
> | r0 = op1 r1, r2 |
> | r3 = op2 r0, #c |
> -------------------------
> ...
> = op3 r0,
>
> r0 is a def, not a use.
>
> What does this mean? It means in order for passes to operate on a
> bundle at a time, it must be able to visit all the defs and uses of a
> bundle. We have established that computing the defs and uses of a
> bundle is not as trivial as taking the union. This is certainly not
> something we want to re-compute every time! This requires a slight
> change to the bundle representation.
>
> ----------------
> | Bundle * | (A MI with special opcode
> "Bundle")
> ----------------
> |
> ----------------
> | MI * |
> ----------------
> |
> ----------------
> | MI * |
> ----------------
> |
> ----------------
> | MI | (no bit, this is the end of
> the bundle)
> --------------
> |
> ----------------
> | Bundle * | (a new bundle)
> ----------------
> |
> ----------------
> | MI * |
> ----------------
> |
> ----------------
> | MI |
> ----------------
> |
> ...
>
> The pseudo bundle instructions should be used to capture properties of
> the bundle. When a bundle is finalized the packetizer must add source
> and def operands to the pseudo bundle instruction. More on this later.
>
> Other properties, such as mayLoad, mayStore, are static properties
> associated with opcodes. They cannot be copied. We will add APIs to
> examine properties of MIs which will do the *right thing* for bundles
> (i.e. look into MIs in bundles).
>
> *Packetizing*
>
> The current MI flow looks like this:
>
> 1. DAG to MI lowering (and pre-RA schedule)
> 2. MI optimizations (LICM, CSE, etc.)
> 3. Register allocation super pass
> 3a. De-ssa (2-address, phi slim)
> 3b. Coalescing
> 3c. Actual register allocation
> 4. Post-RA optimizations
> 5. PEI
> 6. Post-RA scheduling
>
> In the hopefully not very distant future it should look like this:
>
> 1. DAG to MI lowering (no scheduling!)
> 2. MI optimizations (LICM, CSE, etc.)
> 3. Register allocation super pass
> 3a. De-ssa (2-address, phi slim)
> 3b. Coalescing
> 3c. *Pre-RA scheduling*
> 3d. Actual register allocation
> 4. Post-RA optimizations
> 5. PEI
> 6. *Re-schedule restores, copies*
>
> The current proposal is for "packetization" to be done as part of the
> "RA super pass". Early MI optimization passes such as LICM do not
> benefit from operating on bundles. Furthermore, the packetizer should
> not have to know how to deal with copies which may later be coalesced,
> phi nodes, or other copy like pseudo instructions.
>
> Packetization should be done in two phases. The first part decides
> what MIs should be bundled together and it add the "bits" which glued
> MIs together. This can be done either before pre-RA scheduling. The
> second part of the packetization should only be done after register
> allocation is completed. There are two very important reason for this.
>
> 1. Packet finalization *must* add source and def operands to the
> "Bundle" pseudo MI. This allows all later passes to handle they
> transparently. However, we do not want to do this before register
> allocation is complete. Otherwise it introduces new defs and uses of
> virtual registers and that mess up MachineRegisterInfo def-use chains.
>
> e.g. Now vr0 has two defs!
> defs: vr0<dead>, vr3, uses: vr1, vr2
> ----------------------------
> | vr0 = op1 vr1, vr2 |
> | vr3 = op2 vr0<kill>, #c |
> ----------------------------
>
> 2. During register allocation, more identity copies will be eliminated
> while loads, stores, copies, re-materialized instructions will be
> introduced. It makes sense for the second part of packetization to try
> to fill these new instructions into empty slots (for VLIW like targets).
>
> So the overall flow should look like this:
>
> 1. DAG to MI lowering (no scheduling!)
> 2. MI optimizations (LICM, CSE, etc.)
> 3. Register allocation super pass
> 3a. De-ssa (2-address, phi slim)
> 3b. Coalescing
> 3c. *Pre-scheduling packetization (optional)*
> 3d. Pre-RA scheduling (or *integrated packetization*)
> 3e. *Post-scheduling packetization (optional)*
> 3f. Actual register allocation
> 3g. *Packet finalization*
> 4. Post-RA optimizations
> 5. PEI
> 6. Re-schedule restores, copies
>
> *Lowering Bundles to MCInst*
>
> There is no need to add the equivalent of MI bundle to MCInst. A MI
> bundle should be concatenated into a single MCInst by storing opcodes
> as integer operands. e.g.
>
> -------------------------
> | r0 = op1 r1, r2 |
> | r3 = op2 r0, #c |
> -------------------------
>
> =>
>
> MCInst: op1 r0, r1, r2, op2, r3, r0, #c
> or
> MCInst: op1 op2 r0, r1, r2, r3, r0, #c
>
> *What's Next?*
>
> I am hoping to find some time to implement the followings in the near
> future:
> 1. Add BUNDLE opcode
> 2. MachineInstr class changes: new bit, changes to methods such as
> eraseFromParent(), isIdenticalTo().
> 3. Change MachineInstr::iterator to skip over bundled MIs. Rename old
> iterator.
> 4. Add MachineInstr API to check for instruction properties and switch
> existing code over.
> 5. Add API to form a bundle. It would compute the proper def's and
> use's and add MachineOperands to the bundle MI.
> 6. Switch Thumb2 IT block to using MI bundles.
> 7. Add interface for targets to register their own packetization passes.
>
> I would dearly welcome help on any of these tasks especially on 4, 5,
> 6. I also would not cry if someone beats me to #6 (or actually any of
> the tasks. :-)
>
> In the longer term, I would like to see a target independent
> packetization pass (I believe one is being reviewed). I would also
> like to see a target independent interface for pre-scheduling
> optimizations that form instruction sequences (e.g. macro-fusion).
> Patches welcome!
>
> Evan
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111205/e4b298fc/attachment.html>
More information about the llvm-dev
mailing list