[LLVMdev] Simpler subreg ops in machine code IR

Tue Jun 15 14:48:24 PDT 2010

I am considering adding a new target-independent, codegen-only COPY instruction to our MachineInstr representation. It would be used to replace INSERT_SUBREG, EXTRACT_SUBREG, and virtual register copies after instruction selection. SelectionDAG still needs {INSERT,EXTRACT}_SUBREG, but they would no longer appear as MachineInstrs.

The COPY instruction handles subreg operations with less redundancy:

	%reg1045<def> = EXTRACT_SUBREG %reg1044<kill>, 4
	%reg1045<def> = COPY %reg1044:sub_32bit<kill>

	%reg1045<def> = INSERT_SUBREG %reg1045, %reg1044<kill>, 4
	%reg1045:sub_32bit<def> = COPY %reg1044<kill>

	%reg1050:ssub_0<def> = EXTRACT_SUBREG %reg1060:dsub_1<kill>, ssub_0
	%reg1050:ssub_0<def> = COPY %reg1060:ssub_2<kill>

It will also replace the TargetInstrInfo::copyRegToReg hook when copying virtual registers:

	%reg1050 = COPY %reg1044<kill>

It will be lowered with a TII.copyRegToReg() call in LowerSubregsInstructionPass (which may need renaming).
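
To make the first two rewrites above concrete, here is a minimal sketch of the mapping from the subreg pseudo-instructions to the COPY form. Everything in it (Operand, Instr, toCopy) is a hypothetical stand-in invented for illustration, not the real MachineInstr API, and it assumes the input operands carry no subreg index of their own, so the third example, which needs subreg index composition, is not modeled.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for MachineOperand / MachineInstr.
enum class Opcode { COPY, EXTRACT_SUBREG, INSERT_SUBREG };

struct Operand {
  unsigned Reg;        // virtual register number, e.g. 1045
  std::string SubIdx;  // subreg index name; empty means the whole register
};

struct Instr {
  Opcode Op;
  Operand Dst;
  Operand Src;      // the extracted-from or inserted value
  std::string Idx;  // trailing subreg index operand; unused for COPY
};

// Rewrite a subreg pseudo-instruction as a COPY whose operands carry the
// subreg index directly. Assumes MI.Dst and MI.Src have empty SubIdx.
Instr toCopy(const Instr &MI) {
  Instr Copy{Opcode::COPY, MI.Dst, MI.Src, ""};
  switch (MI.Op) {
  case Opcode::EXTRACT_SUBREG:
    Copy.Src.SubIdx = MI.Idx;  // dst = COPY src:idx
    break;
  case Opcode::INSERT_SUBREG:
    Copy.Dst.SubIdx = MI.Idx;  // dst:idx = COPY src
    break;
  case Opcode::COPY:
    break;                     // already in the new form
  }
  return Copy;
}
```

In this model the redundant tied operand of INSERT_SUBREG simply disappears, because the partial definition is expressed by the subreg index on the destination.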

Why?

1. The new function CoalescerPair::isMoveInstr() can correctly determine if a MachineInstr is a (partial) register copy with source and destination registers and subreg indices. I think that is the only place it is done correctly currently. Weird stuff like subreg indices on EXTRACT_SUBREG operands is pretty hard to figure out. The COPY instruction is simpler.

2. Many more copies are created than are eventually output - the coalescer removes most of the copies inserted by phi elimination and the two-address pass. The TII.copyRegToReg() hook is relatively expensive because it has to do register class comparisons to pick the right copy instruction. Similarly, the TII.isMoveInstr() hook is more expensive than just checking for a COPY instruction. By calling copyRegToReg() late, and by avoiding isMoveInstr() entirely, the overhead is avoided.

3. The register class arguments to TII.copyRegToReg() can be eliminated - it will only ever be called for physical registers. This means that the implementation can be simpler, and sometimes better code can be generated. A register may be allocated more conveniently than the register class specifies. It also means that most of the annoying getMinimalPhysRegClass() calls go away.

4. The COPY instruction does not impose register class constraints on its operands, whereas native copies do. This is important when we implement live interval splitting in the register allocator. After splitting an interval the register class can be recomputed. Without the copy constraints a larger register class may be valid, and spilling might be avoided.

5. copyRegToReg() can insert multiple instructions, avoiding fun pseudo-instructions like VMOVQQ and VMOVQQQQ.
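
Points 2 and 3 can be sketched in a toy model: recognizing a copy becomes a plain opcode comparison, and the late lowering step picks the native opcode from the physical registers alone. The enums and functions below (PhysReg, isCopy, pickCopyOpcode, lowerCopy) are hypothetical and invented for this illustration; the opcode names are only borrowed from x86 as labels, and cross-bank copies are deliberately left unmodeled. It needs C++17 for std::optional.

```cpp
#include <optional>

// Toy opcode and register sets; not the real target description.
enum class Opcode { COPY, MOV32rr, MOVAPSrr, ADD32rr };
enum class PhysReg { EAX, ECX, XMM0, XMM1 };

struct Instr {
  Opcode Op;
  PhysReg Dst;
  PhysReg Src;
};

bool isXMM(PhysReg R) { return R == PhysReg::XMM0 || R == PhysReg::XMM1; }

// With a generic COPY, detecting a copy is a single opcode comparison
// instead of a target hook that inspects every native move opcode.
bool isCopy(const Instr &MI) { return MI.Op == Opcode::COPY; }

// Choose a native copy opcode from the physical registers alone; no
// register class argument is needed. Returns nothing for cross-bank
// copies, which this sketch does not model.
std::optional<Opcode> pickCopyOpcode(PhysReg Dst, PhysReg Src) {
  if (isXMM(Dst) && isXMM(Src))
    return Opcode::MOVAPSrr;
  if (!isXMM(Dst) && !isXMM(Src))
    return Opcode::MOV32rr;
  return std::nullopt;
}

// Late lowering: rewrite a COPY in place once both operands are physical
// registers. Returns true if the instruction was changed.
bool lowerCopy(Instr &MI) {
  if (!isCopy(MI))
    return false;
  std::optional<Opcode> Opc = pickCopyOpcode(MI.Dst, MI.Src);
  if (!Opc)
    return false;
  MI.Op = *Opc;
  return true;
}
```

Because only the copies that survive coalescing ever reach lowerCopy() in this model, the opcode selection cost is paid once per emitted copy rather than once per created copy.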

Why Not?

1. copyRegToReg() won't be able to use register classes to pick a copy opcode. For instance, an XMM register will no longer be copied by MOVSS or MOVSD. Given just the physical register, MOVAPS will be used. Is that a problem?

2. What else?