[LLVMdev] Register based vector insert/extract
Chris Lattner
sabre at nondot.org
Mon Apr 23 14:22:27 PDT 2007
On Mon, 23 Apr 2007, Christopher Lamb wrote:
>>> The issue I'm having is that there is no extract/insert
>>> instruction in the ISA, it's simply based on using subregister
>>> operands in subsequent/preliminary instructions. At the point of
>>> custom lowering register allocation has not yet been done, so I
>>> don't have a way to communicate the dependency.
Ok.
>> If I have a register v4r0 with subregisters {r0, r1, r2, r3} and a
>> DAG that looks like
>>
>> load v4si <- extract_element 2 <- add -> load i32
>>
>> I'd like to be able to generate
>>
>> load v4r0
>> load r10
>> add r11, r10, r2 <== subregister 2 of v4r0
Nice ISA. That is entirely too logical. :)
We have a similar problem on X86. In particular, an integer truncate
(e.g. i16 -> i8) or extend wants to make use of subregisters. Consider
code like this:
t1 = load i16
t2 = truncate i16 t1 to i8
t3 = add i8 t2, 42
What we would really want to generate is something like this at the
machine instr level:
r1024 = X86_LOADi16 ... ;; r1024 is i16
r1026 = ADDi8 r1024[subreg #0], 42
More specifically, we want to be able to define, for each register class,
a set of subregister classes. In the X86 world, the 64-bit register
classes could have subregclass0 = i8 parts, subregclass1 = i16 parts,
subregclass2 = i32 parts. Each <physreg, subreg#> pair should map to
another physreg (e.g. <RAX,1> -> AX).
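A minimal sketch of the proposed <physreg, subreg#> table, assuming a hypothetical enum of X86 registers and the subregister-class numbering from the paragraph above (0 = i8, 1 = i16, 2 = i32). None of these names are LLVM APIs; they only illustrate the mapping.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical physical register enum; not LLVM's X86 register numbering.
enum PhysReg { NoReg, RAX, EAX, AX, AL, RCX, ECX, CX, CL };

// Subregister-class indices as proposed above (hypothetical names).
enum SubRegIdx { SubI8 = 0, SubI16 = 1, SubI32 = 2 };

// Each <physreg, subreg#> pair maps to another physreg; NoReg if undefined.
PhysReg getSubReg(PhysReg Reg, SubRegIdx Idx) {
  static const std::map<std::pair<PhysReg, SubRegIdx>, PhysReg> Table = {
      {{RAX, SubI8}, AL}, {{RAX, SubI16}, AX}, {{RAX, SubI32}, EAX},
      {{RCX, SubI8}, CL}, {{RCX, SubI16}, CX}, {{RCX, SubI32}, ECX},
  };
  auto It = Table.find({Reg, Idx});
  return It == Table.end() ? NoReg : It->second;
}
```

So getSubReg(RAX, SubI16) yields AX, matching the <RAX,1> -> AX example.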
The idea of this is that the register allocator allocates registers like
normal, but when it does the rewriting pass, when it replaces vregs with
pregs (e.g. r1024 with CX in this example), it rewrites r1024[subreg0]
with CL instead of CX. This would give us this code:
CX = X86_LOADi16 ...
DL = ADDi8 CL, 42
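The rewriting step can be sketched roughly as follows, assuming a hypothetical operand representation (a vreg number plus an optional subregister index) and a hard-coded allocation r1024 -> CX, r1026 -> DL matching the example. It only illustrates the idea and is not LLVM's actual rewriter code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical operand: a virtual register, optionally narrowed to a subregister.
struct Operand {
  unsigned VReg;
  int SubIdx; // -1 means "whole register"
};

// Allocation results from the example above (hypothetical).
const std::map<unsigned, std::string> VRegToPhys = {{1024, "CX"}, {1026, "DL"}};

// Hypothetical subregister table for the i16 class: subreg #0 is the low i8 part.
const std::map<std::pair<std::string, int>, std::string> SubRegOf = {
    {{"CX", 0}, "CL"}, {{"DX", 0}, "DL"}};

// Replace a vreg operand by its physreg, then narrow it if a subreg is named.
std::string rewrite(const Operand &Op) {
  std::string Phys = VRegToPhys.at(Op.VReg);
  if (Op.SubIdx < 0)
    return Phys;
  return SubRegOf.at({Phys, Op.SubIdx});
}
```

Rewriting the operand r1024[subreg0] yields CL while a plain r1024 operand still yields CX, which is what turns the ADDi8 into `DL = ADDi8 CL, 42`.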
In your case, you'd define your vector register class with 4 subregs, one
for each piece.
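For the vector case the same table idea applies with four subregisters per vector register. A minimal sketch, assuming (hypothetically) that vector register v4rN overlaps scalar registers r(4N) through r(4N+3), as v4r0 overlaps r0-r3 in the example; the real layout depends on the ISA.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical layout: v4rN is made of scalar registers r(4N) .. r(4N+3),
// so lane K of v4rN is scalar register r(4N+K).
std::string laneSubReg(unsigned VecRegNo, unsigned Lane) {
  if (Lane > 3)
    throw std::out_of_range("v4 registers have exactly 4 lanes");
  return "r" + std::to_string(4 * VecRegNo + Lane);
}
```

With this, `extract_element 2` of a value allocated to v4r0 becomes the operand r2, giving the `add r11, r10, r2` form from the example.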
Unfortunately, none of this exists yet :(. To handle truncates and
extends on X86, we currently emulate this by generating machineinstrs
like:
r1024 = X86_LOADi16 ...
r1025 = TRUNCATE_i16_to_i8 r1024
r1026 = ADDi8 r1025, 42
In the asmprinter, we print TRUNCATE_i16_to_i8 as a commented out noop if
the register allocator happens to allocate 1024 and 1025 to the same
register. If not, it uses an asmprinter hack to print this as a copy
instruction. This is horrible, and doesn't produce good code. OTOH,
before Evan improved this, we always copied into AX and out of AL for each
i16->i8 truncate, which was much worse :)
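The asmprinter workaround described above amounts to roughly the following decision. This is a sketch with hypothetical register names and a hypothetical `Low8` table; the actual X86 asmprinter code differs.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical: low 8-bit part of each 16-bit register that has one.
const std::map<std::string, std::string> Low8 = {
    {"AX", "AL"}, {"BX", "BL"}, {"CX", "CL"}, {"DX", "DL"}};

// Print "TRUNCATE_i16_to_i8 Dst8, Src16" after register allocation.
std::string printTruncate(const std::string &Dst8, const std::string &Src16) {
  const std::string &SrcLow = Low8.at(Src16);
  if (Dst8 == SrcLow)
    // Both values landed in the same register: print a commented-out noop.
    return "# TRUNCATE " + Dst8 + ", " + Src16;
  // Otherwise emit a real 8-bit copy from the low part of the source.
  return "mov " + Dst8 + ", " + SrcLow;
}
```

Only the first case is free; the copy in the second case is the extra code that real subregister support would avoid.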
> I see that Evan has added getSubRegisters()/getSuperRegisters() to
> MRegisterInfo. This is what's needed in order to implement the
> register allocation constraint, but there's no way yet to pass the
> constraint through the operands from the DAG. There would need to be
> some way to specify that the SDOperand is referencing a subvalue of
> the produced value (perhaps a subclass of SDOperand?). This would
> allow the register allocator to try to use the sub/super register
> sets to perform the insert/extract.
Right. Evan is currently focusing on getting the late stages of the code
generator (e.g. livevars) to be able to understand arbitrary machine
instrs in the face of physreg subregs. This lays the groundwork for
handling vreg subregs, but won't solve it directly.
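One way to picture the missing DAG-side piece: let an operand name a node result plus an optional subvalue index, and fold an `extract_element` with a constant index into its user's operand instead of emitting an instruction. The `Node` and `Use` types and `foldExtracts` below are hypothetical and are not the real SDOperand interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Node;

// Hypothetical operand: the node it uses, plus an optional subvalue index.
struct Use {
  Node *N = nullptr;
  int SubIdx = -1; // -1 means "the whole value"
};

// Hypothetical DAG node: opcode, operands, and a constant lane for extract_element.
struct Node {
  std::string Opcode;
  std::vector<Use> Ops;
  int Lane = -1;
};

// Turn each operand that is a constant-lane extract_element into a direct
// subvalue reference to the vector, so no extract instruction is needed.
void foldExtracts(Node &User) {
  for (Use &U : User.Ops)
    if (U.N->Opcode == "extract_element" && U.N->Lane >= 0) {
      U.SubIdx = U.N->Lane;   // remember which lane is read
      U.N = U.N->Ops[0].N;    // point straight at the vector value
    }
}
```

After folding, the register allocator would see the add reading lane 2 of the vector vreg, which the rewriter could then map to r2.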
> Is any of this kind of work planned? The addition of those
> MRegisterInfo functions has me curious...
This is on our mid-term plan, which means we'll probably tackle it over
the next year or so, but we don't have any concrete plans in the immediate
future. If you are interested, this should be a pretty reasonable project
that will give you a chance to become more familiar with various pieces of
the early code generator. :)
-Chris
--
http://nondot.org/sabre/
http://llvm.org/