[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

Thu Jul 26 10:04:34 PDT 2012

On Jul 26, 2012, at 9:43 AM, dag at cray.com wrote:

> Jakob Stoklund Olesen <jolesen at apple.com> writes:
> 
>> As far as I can tell, all sub-register operations involving sub_ss and
>> sub_sd can simply be replaced with COPY_TO_REGCLASS:
>> 
>>  def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
>>            (VMOVSDrr VR128:$src1, (EXTRACT_SUBREG (v4i32 VR128:$src2),
>>                                                   sub_sd))>;
>> 
>> Becomes:
>> 
>>  def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)),
>>            (VMOVSDrr VR128:$src1, (COPY_TO_REGCLASS VR128:$src2, FR64))>;
> 
> A few questions:
> 
> Will COPY_TO_REGCLASS actually generate a copy instruction or can
> TableGen/isel fold it away?

Both EXTRACT_SUBREG and COPY_TO_REGCLASS are emitted as COPY instructions by InstrEmitter: the former as a sub-register copy, the latter as a full register copy. The register coalescer handles both.

It would actually be possible to have EmitCopyToRegClassNode() try to call MRI->constrainRegClass() first, just like AddRegisterOperand() does. That could avoid the copy in some cases, and you would simply get a VR128 register as the second VMOVSDrr operand. I am not proposing we do that for now. Let the register coalescer deal with that.
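The idea of constraining first and copying only as a fallback can be sketched with a toy model. This is not LLVM code: the class table, the GR64 entry, and the helpers is_sub_class, common_sub_class and emit_copy_to_regclass are all hypothetical names for illustration, and the sub-class test (subset of registers, spill size at least as large) is an assumption of the model.

```python
# Toy model only: class name -> (physical registers, spill size in bits).
XMM = frozenset(f"XMM{i}" for i in range(16))
GPR = frozenset({"RAX", "RCX", "RDX"})
CLASSES = {"FR64": (XMM, 64), "VR128": (XMM, 128), "GR64": (GPR, 64)}


def is_sub_class(sub, sup):
    """Assumed rule: `sub` has a subset of `sup`'s registers and spills at least as wide."""
    sub_regs, sub_spill = CLASSES[sub]
    sup_regs, sup_spill = CLASSES[sup]
    return sub_regs <= sup_regs and sub_spill >= sup_spill


def common_sub_class(a, b):
    """Largest modeled class usable as both `a` and `b`, or None if there is none."""
    fits = [c for c in CLASSES if is_sub_class(c, a) and is_sub_class(c, b)]
    return max(fits, key=lambda c: len(CLASSES[c][0]), default=None)


def emit_copy_to_regclass(vregs, src, required, instrs):
    """Try to narrow `src`'s class in place; emit a COPY only if that fails."""
    narrowed = common_sub_class(vregs[src], required)
    if narrowed is not None:
        vregs[src] = narrowed      # constraint succeeded, no copy needed
        return src
    dst = f"%{len(vregs)}"         # fresh virtual register in the required class
    vregs[dst] = required
    instrs.append(f"{dst}:{required} = COPY {src}")
    return dst


vregs, instrs = {"%0": "VR128"}, []
operand = emit_copy_to_regclass(vregs, "%0", "FR64", instrs)
print(operand, vregs[operand], instrs)
```

In this model a VR128 virtual register asked to be FR64 stays VR128 and no COPY is emitted, which corresponds to the case described above where the second VMOVSDrr operand would simply be a VR128 register.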

> What happens if the result of the above pattern using COPY_TO_REGCLASS
> is spilled?  Will we get a 64-bit store or a 128-bit store?

This behavior isn't affected by the change. FR64 registers are spilled with 64-bit stores, and VR128 registers are spilled with 128-bit stores.

When the register coalescer removes a copy between VR128 and FR64 registers, it chooses the larger spill size for the result. This is the same for sub-register copies and full register copies.

The important point here is that VR128 is a sub-class of FR64, so getCommonSubClass(VR128, FR64) -> VR128. This is the Liskov substitution principle for register classes: a VR128 register can be used anywhere an FR64 register is required.
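The sub-class relation and its effect on spill size can be illustrated with a small model. It is a sketch under stated assumptions, not LLVM's implementation: RegClass, has_sub_class and get_common_sub_class are hypothetical Python stand-ins, and the model assumes a sub-class must contain a subset of the registers and have a spill size at least as large.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegClass:
    name: str
    regs: frozenset   # physical registers in the class
    spill_bits: int   # spill slot size in bits

    def has_sub_class(self, other):
        """Assumed rule: `other` may be used wherever `self` is required."""
        return other.regs <= self.regs and other.spill_bits >= self.spill_bits


XMM = frozenset(f"XMM{i}" for i in range(16))
FR64 = RegClass("FR64", XMM, 64)
VR128 = RegClass("VR128", XMM, 128)


def get_common_sub_class(a, b, all_classes):
    """Largest class in `all_classes` that is a sub-class of both `a` and `b`."""
    fits = [c for c in all_classes if a.has_sub_class(c) and b.has_sub_class(c)]
    return max(fits, key=lambda c: len(c.regs), default=None)


# Coalescing a copy between a VR128 and an FR64 virtual register:
merged = get_common_sub_class(VR128, FR64, [FR64, VR128])
print(merged.name, merged.spill_bits)
```

Both classes hold the same XMM registers in this model, so only the spill size distinguishes them. The merged register ends up in the class with the wider spill slot, which is why coalescing never loses the upper bits on a spill.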

/jakob