[LLVMdev] VLIW Scheduling

Wed Sep 14 10:09:30 PDT 2005

On Wed, 14 Sep 2005, Tzu-Chien Chiu wrote:
> VLIW (Very Long Instruction Word) is a long instruction format (called
> "group" hereafter) contains several instructions. These instructions
> are not dependent on each other and could be issued in a single cycle.
>
> At this moment there is no correspondent class for VLIW. MachineInstr
> object can only represent one instruction. Usually the number of
> instructions in a group is fixed. The grouping could be implicitly
> maintained by inserting dummy nops to the instruction sequence. For
> example, there are at most two  instructions in a VLIW, if items in
> MachineBasicBlock::Insts is:
>
>  MUL
>  ADD
>  ADD
>  NOP
>  MUL
>  NOP
>
> It implies the scheduling:
>
>  MUL - ADD
>  ADD - NOP
>  MUL - NOP
>
> Each pairs could be issued in parallel.

Yup, this is currently the approach we're taking with IA64, for example. 
It has bundles which are similar in spirit to what a VLIW has (though not 
identical).

> However, "one single MachineInstr" is the "unit" of many passes, e.g.
> LiveVariables and LiveIntervals. Without modifications, some code-gen
> result may be inefficient (or incorrect).
>
> For example:
>
>  mul %reg1025, %reg1024, 1
>  add %reg1026, %reg1024, 2
>
> Suppose the life interval of %reg1024 ends at ADD (no use after ADD).
> These two instructions are not dependent and could be scheduled
> together:
>
>  mul %reg1024, %reg1025, 1 -  add %reg1026, %reg1025, 2
>
> The life intervals of  vi%reg1024 and %reg1025 are actually _not_
> interfered with each other,
>
>  %reg1024 => r1
>  %reg1025 => r1
>  %reg1026 => r2
>
>  mul  r1, r1, 1 -  add r2, r1, 2

I see.

> Note the physical register r1 could be re-used. r1 is the source
> register of both instructions, its valued is fetched and used as the
> inputs to MUL and ADD, before the MUL updates r1. It's not an
> anti-dependency.

Right.

> But without modification, the live interval analysis and the register
> allocator will consider %reg1024 and %reg1025, and allocate different
> physical registers to each of them. What's the minimal changes I have
> to make to "cheat" the LiveIntervals and LiveVariables?
> What else I may have to change for VLIW scheduling?

I think the best way to handle this is to make "macro instructions".  The 
current pipeline we have is this:

1. Instruction Selection   (LLVM code to a target DAG)
2. Pre-pass Scheduling     (target DAG to machineinstrs)
3. Register allocation
4. ...

Given this setup, I think it would make the most sense for you, to have a 
target-specific phase #2.5, that takes the individual operations and makes 
the VLIW bundles you need for proper register allocation.  In the case 
above, this would turn into:

  mul_add  r1, r1, 1  ,   r2, r1, 2

The problem with this is that you'll obviously have an N^2 explosion of 
instructions.  If this isn't an option for you, you could set up a few 
classes of macro ops, so you could have something like this:

  vliw_rri_rri  "mul", r1, r1, 1   ,  "add", r2, r1, 2

where "mul" and "add" are immediates that are rendered as the appropriate 
opcodes by the asmprinter.  If you did this, you'd only need one 
instruction for each "pattern" (in this case, a VLIW packet where the 
first takes a register/register/immediate and the second takes the same).

This should expose to the RA the property that all inputs are read before 
the outputs are produced.

If you have your own custom isel/scheduler, it might make sense to have 
the scheduler build these packets for you.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/