[llvm-dev] Help making 'narrow instruct microcode' Backend

Tue Nov 24 09:40:02 PST 2015

>
>> On Nov 12, 2015, at 8:43 AM, James R. Byerly via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> I've been experimenting with llvm/clang as a user for a while now, but
>> now
>> I'm interested in writing my own backend. I'm also developing the target
>> architecture (maybe to go in an fpga eventually) and I'm intentionally
>> making it extremely simple. I think of it as a narrow microcode, because
>> (for example) performing an add requires a sequence of instructions
>> like:
>> set aluin1 = r1
>> set aluin2 = r2
>> aluop add
>> set r3 = aluout
>
> This sounds very similar to some sorts of existing VLIW architectures,
> which may look something like this:
>
> // START BUNDLE
> {
> // input phase
> s0 = <some input>
> s1 = <some input>
> [â¦]
> sN = <some input>
> // phase 0
> t0 = <some operation using s registers>
> t1 = <some operation using s registers>
> [â¦]
> tN = <some operation using s registers>
> // phase 1
> // stuff using t registers
> [â¦]
> // phase N
> // stuff using t registers
> // output phase
> rX = t0
> rY = t1
> rZ = t2
> }
> // END BUNDLE
>
> Of course, the bundles can be as complex or simple as you like; they could
> have just one phase and one operation, for example. Either way, this sort
> of thing where it defines inputs, then defines an instruction, then
> defines outputs, sounds very much like a problem for VLIW bundles. VLIW
> tends to be very simple/âclose to the metalâ insomuch as the bits in
> the instruction encoding directly map to muxes between the phases, i.e.
> selecting which register is passed from where to where.
>
> âescha

My target has a lot in common with VLIW architectures, except for the 'VL'
part. I'd like to essentually have similar low-level semantics without
bundling instructions in order to take advantage of the llvm scheduling
and register allocation capabilities (and also make the target simpler,
not having to perform any but the simplest operations).

Currently, I have it partially working by defining an ALU as a v4i32
register with 4 subregisters (in1, in2, out, flags), and then in my
SelectionDAGISel::Select, I replace a node like 'ISD::ADD' like so:
r3 = add r1, r2
into
AluReg = temporary from ALURegClass
Copy To AluReg, (IMPLICIT_DEF)
AluIn1 = ALUIN1 (Copy From AluReg), r1
AluIn2 = ALUIN2 AluIn1, r2
AluResult = ADD AluIn2
res = temporary from GPRRegClass
Copy To res (ALUOUT AluResult)
r3 = res

Where every value/register named Alu* is the same v4i32 alu register in
various stages of update. I found the copy of implicit def through an alu
seems to help llvm reason about the subreg liveness (using just implicit
def, llvm seems to think the regs live forever, and using just a temp
register, llvm complains it is used before it is defined), and the copy of
the result through a gpr helps with register pressure (presumably because
GPRs are more plentiful than alu registers).

However, I still have an issue with register allocation. If, in my
subclass of TargetLowering, I set the scheduling preference to
Sched::RegPressure, things work ok most of the time, but I get extremely
poor scheduling because every alu operation uses aluA and computes
everything in order exactly as described above. If I use any other
scheduling preference, I get much better scheduled code, but llvm moves
the ALUIN1r/ALUIN2r instruction pair far from the ADD/ALUOUTr instruction
pair and thus creates an excessive number of long live ranges. It then
tries to spill the ALU registers, which isn't possible, and everything
fails.

Any advice on getting any 'real' scheduling to work without introducing
unneeded spilling (unneeded because it IS possible to keep the 4
instructions grouped and thus never have an alu register live outside an
actual alu operation, and up to N groups can be interleaved when I have N
alu registers) would be greatly appreciated. I know I can bundle the
instructions, but that (afaik) eliminates the possibility of the
instruction scheduler interleaving multiple alu instructions, or mixing
alu instructions with non-alu instructions.