[LLVMdev] RFC: Machine Instruction Bundle

Sat Dec 3 12:18:26 PST 2011

It's important to understand that this is a proposal for a code generator IR change; it is not specific to particular passes such as the register allocator or the scheduler. Passes that want to understand more about the internals of instruction bundles are free to design their own data structures. It is imperative that we do not add multiple constructs for various types of bundles, as that would add significant overhead for the rest of the code generator.

I have discussed the proposal with the current code owners of the register allocator and the instruction scheduler. It will work with the existing infrastructure (and with the modules being planned for the near future).

Evan

On Dec 3, 2011, at 7:12 AM, Pekka Jääskeläinen wrote:

> Hi,
> 
> I'm glad to see some action with regard to static instruction
> scheduling and VLIW support in LLVM. I have some questions and
> remarks which might not be relevant as I'm not totally familiar
> with the current code generation framework of LLVM nor your plan.
> 
> On 12/02/2011 10:40 PM, Evan Cheng wrote:
>> 2. It must be flexible enough to represent more than VLIW bundles. It should be
>> useful to represent arbitrary sequences of instructions that must be scheduled as
>> a unit, e.g. ARM Thumb2 IT block, Intel compare + branch macro-fusion, or random
>> instruction sequences that are currently modeled as pseudo instructions that are
>> expanded late.
> 
> The concept of a "VLIW bundle" is to mark a set of instructions that
> should/could be executed in *parallel*. A static parallel instruction
> schedule for a single instruction cycle, that is.
> 
> In other words, with a VLIW target a bundle might not be just "an atomic,
> possibly sequentially executed chunk of instructions" or "a set of
> instructions that can be executed in parallel but also sequentially".
> In some architectures, the sequential execution might break the schedule
> due to visible function unit pipeline latencies and no hardware interlocking.
> 
> Is it wise to mix the two concepts of "parallel instructions" and the looser
> "instructions that should be executed together"? The "parallel semantics"
> implies changes to how the scheduling is done (the earliest/latest cycle where
> an instruction can be scheduled) and also, e.g., the register allocation's live
> ranges (if allocating regs on a "packetized" = parallel code)?
> 
> Moreover, the definition of a VLIW parallel bundle implies that there can be
> no "intra-bundle dependencies"; otherwise those instructions could not be
> executed in parallel in reality.
> 
> For example, looking at your example of a bundle with "intra-bundle
> dependencies":
> 
> -------------------------
> | r0 = op1 r1, r2       |
> | r3 = op2 r0<kill>, #c |
> -------------------------
> 
> In case of a static VLIW target the semantics of this instruction is that these
> two RISC instructions "are executed in parallel, period". Thus, the first
> instruction cannot depend on the latter (or the other way around); op2 reads
> the old value of r0, not the one written in the same bundle.
> 
> It depends on the architecture's data hazard detection support, register file 
> bypasses, etc. whether the r0 update of the 1st instruction is available to
> the second instruction in the bundle or whether the new r0 value can be read
> only by the succeeding instruction bundles. If it is available, the execution
> is sequential in reality as op1 must produce the value before op2 can
> execute.
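
The two readings of the example bundle can be compared with a toy simulator. This is a minimal sketch in C++, not LLVM code: the four-register machine, the `Inst` layout, and the functions `runSequential` and `runParallel` are all hypothetical, and both op1 and op2 are assumed to be plain additions purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy machine with four 32-bit registers r0..r3.
using RegFile = std::array<int32_t, 4>;

struct Inst {
  int dst;      // destination register index
  int srcA;     // first source register index
  int srcB;     // second source register index, or -1 to use imm instead
  int32_t imm;  // immediate operand, used only when srcB == -1
};

// Assumption: every operation is an addition of its two operands.
static int32_t evaluate(const Inst &I, const RegFile &Regs) {
  assert(I.dst >= 0 && I.dst < 4 && I.srcA >= 0 && I.srcA < 4 && I.srcB < 4);
  int32_t B = I.srcB >= 0 ? Regs[I.srcB] : I.imm;
  return Regs[I.srcA] + B;
}

// "Chunk of instructions" semantics: each instruction sees the results of
// the instructions before it in the bundle.
RegFile runSequential(const std::vector<Inst> &Bundle, RegFile Regs) {
  for (const Inst &I : Bundle)
    Regs[I.dst] = evaluate(I, Regs);
  return Regs;
}

// "Parallel bundle" semantics: every instruction reads the register values
// from before the bundle, and all writes become visible only afterwards.
RegFile runParallel(const std::vector<Inst> &Bundle, RegFile Regs) {
  const RegFile Old = Regs;
  for (const Inst &I : Bundle)
    Regs[I.dst] = evaluate(I, Old);
  return Regs;
}
```

With the initial registers `{0, 1, 2, 0}` and the bundle `{{0, 1, 2, 0}, {3, 0, -1, 10}}` (r0 = r1 + r2; r3 = r0 + 10), `runSequential` leaves 13 in r3 because op2 sees the new r0, while `runParallel` leaves 10 in r3 because op2 reads the old r0.
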
> 
> Itanium machines are an example of "parallel bundle architectures"
> (as are, of course, other "more traditional" VLIWs such as the TI C64x [2]):
> 
> "EPIC allows compilers to define independent instruction sequences, which allows 
> hardware to ignore dependency checks between these instructions.  This same 
> hardware functionality in OOO RISC designs is very costly and complex."
> [1]
> 
> As an example of the "not truly parallel instruction bundles", on the other
> hand, we have played a bit with the Cell SPU, which is a quite static architecture
> but still has hardware data hazard detection and hardware interlocking. It
> would differentiate between your case and the one where the order is different
> because it follows the sequential instruction order in its hardware data
> dependence resolving logic and stalls the pipeline (thus does not really
> execute the instructions in parallel) if the sequential order has data hazards.
> 
> For how to actually represent the (parallel) instruction bundles I do not have
> a strong opinion, as long as the semantic differences between a "parallel
> bundle" and "just a chunk of instructions that should be executed together" are
> made clear and adhered to everywhere in the code generation.
> 
> [1] http://www.dig64.org/about/Itanium2_white_paper_public.pdf
> [2] http://www.ti.com/lit/ug/spru395b/spru395b.pdf
> 
> Best regards,
> -- 
> Pekka from the TCE project
> http://tce.cs.tut.fi