[LLVMdev] RFC: Machine Instruction Bundle

Mon Dec 5 13:50:03 PST 2011

On Dec 2, 2011, at 12:40 PM, Evan Cheng wrote:

> There have been quite a bit of discussions about adding machine instruction bundle to support VLIW targets. I have been pondering what the right representation should be and what kind of impact it might have on the LLVM code generator. I believe I have a fairly good plan now and would like to share with the LLVM community.

Let me add some information about how the register allocator handles this extension.

First of all, I agree it is important that this can be used for more than VLIW targets. Besides the uses you mention, I would like to bundle DBG_VALUE instructions so we can avoid silly code like this:

  MachineBasicBlock::iterator InsertPos = mi;
  while (InsertPos != MBB->begin() && llvm::prior(InsertPos)->isDebugValue())
    --InsertPos;
  MachineBasicBlock::iterator From = KillMI;
  MachineBasicBlock::iterator To = llvm::next(From);
  while (llvm::prior(From)->isDebugValue())
    --From;

I also think bundles can be used for implementing parallel copies.

Value Semantics
===============

The register allocator and probably most of the target-independent code generator won't care whether instructions in a bundle are executed sequentially or in parallel. We do care about value semantics, though: that is, which value each instruction operand in a bundle reads.

We definitely want to support the parallel execution semantics where all instructions in a bundle read values defined outside the bundle.  This is a swap implemented as a parallel copy bundle:

{
  R2 = R3;
  R3 = R2;
}
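To make the difference concrete, here is a toy C++ model (hypothetical types, not LLVM code) that runs the swap bundle above under both readings. Under parallel semantics every copy reads the register file as it was on entry to the bundle; under sequential semantics the second copy would see the first copy's write, and the swap would degenerate into two copies of R3.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model, not LLVM code: a register file maps register names to values,
// and each copy in a bundle is a {Dst, Src} pair.
using RegFile = std::map<std::string, int>;
using Copy = std::pair<std::string, std::string>;

// Parallel semantics: every copy reads the register file as it was on entry
// to the bundle; the writes only become visible after the bundle.
RegFile runParallel(const RegFile &In, const std::vector<Copy> &Bundle) {
  RegFile Out = In;
  for (const Copy &C : Bundle)
    Out[C.first] = In.at(C.second); // read the pre-bundle value
  return Out;
}

// Sequential semantics, for contrast: each copy sees all earlier writes.
RegFile runSequential(const RegFile &In, const std::vector<Copy> &Bundle) {
  RegFile Out = In;
  for (const Copy &C : Bundle)
    Out[C.first] = Out.at(C.second); // read the current value
  return Out;
}
```

With R2 = 2 and R3 = 3, the parallel reading leaves R2 = 3 and R3 = 2, while the sequential reading leaves both registers equal to 3.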

However, even VLIW targets like Hexagon can read values defined inside the same bundle:

{
  P0 = cmp.eq(R2,#4)
  if (!P0) R5 = #5
  if (P0.new) R3 = memw(R4)
}

This Hexagon bundle reads both the P0 predicate register defined inside the bundle (P0.new) and the value defined outside the bundle (!P0). We need to support this.

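I propose that we add a new MachineOperand flag, IsInternalRead (written <internal>), to represent this. The flag will mean "This operand is reading a value defined within the same instruction/bundle, not a value from outside the instruction/bundle." The register allocator will treat the <internal> flag almost exactly like it currently treats <undef>: neither kind of operand makes its register live into the instruction/bundle. To target-specific code, however, the two are very different: an <undef> operand may read any value, while an <internal> operand reads the value defined within the bundle.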
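A rough sketch of what "treat <internal> like <undef>" could mean for liveness, using a made-up Operand struct rather than the real MachineOperand accessors: when computing which registers a bundle reads on entry, both kinds of use are skipped.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical operand model loosely mirroring the MachineOperand flags
// discussed here; the field names are illustrative, not LLVM's accessors.
// With aggregate initialization, omitted flags default to false.
struct Operand {
  std::string Reg;
  bool IsDef;
  bool IsUndef;        // reads an arbitrary (undefined) value
  bool IsInternalRead; // reads a value defined inside the same bundle
};

// A bundle is the operand lists of its instructions, in order.
using Bundle = std::vector<std::vector<Operand>>;

// Registers whose incoming values the bundle reads. Like <undef> uses,
// <internal> uses do not make a register live into the bundle.
std::set<std::string> liveInReads(const Bundle &B) {
  std::set<std::string> Reads;
  for (const std::vector<Operand> &MI : B)
    for (const Operand &MO : MI)
      if (!MO.IsDef && !MO.IsUndef && !MO.IsInternalRead)
        Reads.insert(MO.Reg);
  return Reads;
}
```

For the Hexagon bundle, this returns P0, R2 and R4: P0 is still live into the bundle because of the `!P0` read, even though the `P0.new` read is internal. Drop the `!P0` instruction and P0 is no longer live in.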

The register allocator and other target-independent passes don't care if there are multiple defs of the same register inside a bundle. Values are only tracked at the bundle granularity. The semantics of multiple defs is target-defined.

To summarize, all instructions in a bundle read values defined outside the bundle, unless explicitly marked as bundle-internal reads.  Multiple defs inside a bundle are indistinguishable except to the target.
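These rules can be sketched as a tiny evaluator. It is a toy model built on hypothetical assumptions: each instruction is a function from the values of its uses to the values it defines, an <internal> read names a register defined by an earlier instruction in the same bundle, and when several instructions define the same register the last one wins (the real multiple-def semantics are target-defined). The load in the Hexagon example is modeled with a small map standing in for memory.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy model, not LLVM code.
using RegFile = std::map<std::string, int>;

struct Read {
  std::string Reg;
  bool IsInternalRead; // true: read the value defined earlier in the bundle
};

struct Inst {
  std::vector<Read> Uses;
  // Maps the values of Uses (same order) to the registers this instruction
  // defines; a predicated-off instruction returns an empty map.
  std::function<RegFile(const std::vector<int> &)> Eval;
};

RegFile runBundle(const RegFile &In, const std::vector<Inst> &Bundle) {
  RegFile Internal; // values defined so far inside the bundle
  RegFile Out = In;
  for (const Inst &I : Bundle) {
    std::vector<int> Vals;
    for (const Read &R : I.Uses)
      // Plain reads see the value from outside the bundle; <internal>
      // reads see a value an earlier instruction in the bundle defined
      // (at() throws if there is none).
      Vals.push_back(R.IsInternalRead ? Internal.at(R.Reg) : In.at(R.Reg));
    for (const auto &Def : I.Eval(Vals)) {
      Internal[Def.first] = Def.second;
      Out[Def.first] = Def.second; // assumption: the last def wins
    }
  }
  return Out;
}
```

Running the Hexagon bundle with P0 = 0 and R2 = 4 on entry sets P0 to 1, writes R5 (because `!P0` still sees the old P0) and performs the load (because `P0.new` sees the new one).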

Rewriting
=========

The register allocator super-pass needs to rewrite register operands.  Virtual-to-virtual rewriting happens during coalescing and live range splitting.  Virtual-to-physical rewriting happens only once at the end.

When rewriting virtual registers, a minimal understanding of value semantics is required. In particular, it is possible to split a live range right down the middle of an instruction:

  %vr0 = add %vr0, 1

May be rewritten as:

  %vr2 = add %vr1, 1

This is assuming the add doesn't have two-address constraints, of course.

When rewriting bundle operands, the <internal> flag will be sufficient to determine the correct virtual register. For example:

{
  %vr0 = cmp.eq(R2,#4)
  if (!%vr0) R5 = #5
  if (%vr0<internal>) R3 = memw(R4)
}

could be rewritten after live range splitting as:

{
  %vr2 = cmp.eq(R2,#4)
  if (!%vr1) R5 = #5
  if (%vr2<internal>) R3 = memw(R4)
}
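Because values are tracked at bundle granularity, rewriting after a split only needs each operand's def/use status and its <internal> flag. The sketch below, again with a hypothetical Operand struct, treats an instruction or a bundle as one flat operand list and assumes the split introduces InReg for the value live into the bundle and OutReg for the value the bundle defines.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical operand model, not LLVM's MachineOperand API.
struct Operand {
  std::string Reg;
  bool IsDef;
  bool IsInternalRead;
};

// Rewrite the operands of OldReg in one instruction, or in a bundle
// flattened into a single operand list, after a live range split there.
// InReg names the value live into the bundle and OutReg the value the
// bundle defines. Defs and <internal> reads refer to the bundle's own
// value; every other read sees the incoming value.
void rewriteSplit(std::vector<Operand> &Ops, const std::string &OldReg,
                  const std::string &InReg, const std::string &OutReg) {
  for (Operand &MO : Ops)
    if (MO.Reg == OldReg)
      MO.Reg = (MO.IsDef || MO.IsInternalRead) ? OutReg : InReg;
}
```

Applied to `%vr0 = add %vr0, 1` with InReg = %vr1 and OutReg = %vr2, it yields `%vr2 = add %vr1, 1`; applied to the flattened cmp.eq bundle, it produces the rewritten bundle above.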

Constraining spill code insertion
=================================

It is important to note that bundling instructions doesn't constrain the register allocation problem.

For example, this bundle would be impossible to register allocate under sequential value semantics:

{
  call foo
  %vr0  = addps %vr1, %vr2
  call bar
}

The calls clobber the xmm registers, so it is impossible to register allocate this code without breaking up the bundle and inserting spill code between the calls.

With our definition of bundle value semantics, the addps reads %vr1 and %vr2 from outside the bundle, and the call clobbers are irrelevant to the register allocator. The fact that the first call clobbers the xmm registers before the addps executes is the target's problem.

This is very similar to how inline assembly is treated.
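The difference between the two readings can be checked with a small model that places each read and def at a position inside the bundle and asks which values are live across a clobbering call. It is hypothetical: it assumes every use reads a value defined before the bundle and every def is live out. Under sequential semantics the incoming %vr1 and %vr2 must survive the first call and %vr0 the second; under bundle semantics reads happen at bundle entry and defs at exit, so nothing spans an internal clobber.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model, not LLVM code.
struct Inst {
  std::vector<std::string> Uses; // values defined before the bundle
  std::vector<std::string> Defs; // values live out of the bundle
  bool Clobbers;                 // e.g. a call clobbering the xmm registers
};

// Values whose live range inside the bundle spans a clobbering instruction,
// i.e. values that cannot stay in a clobbered register.
std::set<std::string> valuesAcrossClobbers(const std::vector<Inst> &B,
                                           bool Sequential) {
  const int N = static_cast<int>(B.size());
  std::set<std::string> Result;
  for (int I = 0; I < N; ++I) {
    // Sequential: reads and defs happen at the instruction's own slot.
    // Bundle semantics: reads happen at entry (-1), defs at exit (N).
    const int UsePos = Sequential ? I : -1;
    const int DefPos = Sequential ? I : N;
    for (int J = 0; J < N; ++J) {
      if (!B[J].Clobbers)
        continue;
      if (J < UsePos) // incoming value must live from entry to its read
        Result.insert(B[I].Uses.begin(), B[I].Uses.end());
      if (J > DefPos) // live-out def must live from its def to the exit
        Result.insert(B[I].Defs.begin(), B[I].Defs.end());
    }
  }
  return Result;
}
```

For the call/addps/call bundle, the sequential reading reports %vr0, %vr1 and %vr2, which is why spill code would be needed; the bundle reading reports nothing.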

TL;DR
=====

With a new <internal> flag on MachineOperand, the register allocator can effectively treat a bundle as a single instruction. All MachineOperands inside a bundle are treated as if they belonged to that single instruction. This even works when rewriting operands.

/jakob