[llvm-commits] Patch for X86 to use subregs

Sun Jul 29 22:20:40 PDT 2007

On Jul 29, 2007, at 9:37 PM, Christopher Lamb wrote:

>
> On Jul 29, 2007, at 6:20 PM, Evan Cheng wrote:
>
>> Sent from my iPhone
>>
>> On Jul 28, 2007, at 4:36 PM, Christopher Lamb  
>> <christopher.lamb at gmail.com> wrote:
>>
>>>
>>> On Jul 28, 2007, at 2:26 PM, Evan Cheng wrote:
>>>
>>>> On Jul 28, 2007, at 11:52 AM, Christopher Lamb  
>>>> <christopher.lamb at gmail.com> wrote:
>>>>
>>>>>
>>>>> On Jul 28, 2007, at 1:48 AM, Evan Cheng wrote:
>>>>>
>>>>>> Very cool! I need to read it more carefully.
>>>>>
>>>>>> But I see you are lowering zext to a single insert_subreg. Is  
>>>>>> that right? It won't zero out the top part, no?
>>>>>
>>>>> It's only lowering (zext i32 to i64) to an insert_subreg on  
>>>>> x86-64 where all writes to 32-bit registers implicitly zero- 
>>>>> extend into the upper 32-bits.
>>>>>
>>>>
>>>> I know. But thy mismatch semantically. A insert_subreg to the  
>>>> lower part should not change the upper half. I think this is  
>>>> only legal for anyext.
>>>
>>> On x86-64 the semantics of a 2 operand i32 insert_subreg is that  
>>> the input super-value is implicitly zero. So in this sense the  
>>> insert isn't changing the upper half, it's just that the upper  
>>> half is being set to zero implicitly rather than explicitly. If  
>>> you'll notice the insert_subreg is a two operand (implicit super  
>>> value) not a three operand version. If the insert were the three  
>>> operand version, and the super value as coming from an implicit  
>>> def I'd agree with you, but it's not.
>>
>> Ok, let's step back for a second. There are a couple of issues  
>> that should be addressed. Plz help me understand. :)
>>
>> 1: Semantics of insert_subreg should be the same across all  
>> targets, right?
>
> I'm not certain that this should be so. x86-64 clearly has a target  
> specific semantics of a 32-bit into 64-bit insert.

No, that won't do. insert_subreg and extract_subreg are by definition  
target independent. They must have the same semantics. You are  
forcing x86-64 32-bit zero-extending move to fit insert_subreg when  
they are really not the same thing.

>> 2: two operant variant of insert_subreg should mean the superreg  
>> is undef. If you insert a value into a low part, the rest of the  
>> superreg is still undef.
>
> I think the meaning of insert_subreg instruction (both 2 and 3  
> operand versions) must have semantics specific to the target. For  
> example, on x86-64 there is no valid 3 operand insert_subreg for a  
> 32-bit value into 64-bits, because the 32-bit result is always  
> going to be zero extended and overwrite the upper 32-bits.

It just means there is no way to implement a insert_subreg with a  
single instruction under x86-64. But that is perfectly ok. Apart from  
anyext, x86-64 just isn't going to benefit from it. It's also  
impossible to read or modify the higher 32-bits.

>> 3: why is there a two operant variant in the first place? Why not  
>> use undef for the superreg operant?
>
> To note, the two operand variant is of the MachineInstr. The DAG  
> form would be to represent the superregister as coming from an  
> undef node, but this gets isel'd to the two operand MachineInstr of  
> insert_subreg.
>
> The reason is that undef is typically selected to an implicit def  
> of a register. This causes an unnecessary move to be generated  
> later on. This move can be optimized away later with more  
> difficulty during subreg lowering by checking whether the input  
> register is defined by an implicit def pseudo instruction, but  
> instead I decided to perform the optimization during ISel on the  
> DAG form during instruction selection.
>
> With what you're suggesting
> reg1024 = ...
> reg1026 = insert_subreg undef, reg1024, 1
> reg1027 = insert_subreg reg1026, reg1025, 1
> use reg1027
>
> would be isel'd to then subreg lowered to:
>
> R6 = ...
> implicit def R01 <= this implicit def is unecessary

That's a pseudo instruction, it doesn't cost anything.

> R23 = R01 <= this copy is unnecessary

It can be coalesced to:
R23 = undef

> R2 = R6
> R45 = R23
> R5 = R6
> use R45

Using undef explicit is the right way to go. There is a good reason  
it's there. Having the two operand version of insert_subreg that  
implicitly use an undef value doesn't fit into the overall llvm  
philosophy.

>> 4: what's the benefit of isel a zext to insert_subreg and then  
>> xform it to a 32-bit move?
>
> The xform to a 32-bit move is only the conservative behavior. The  
> zext can be implicit if regalloc can coalesce subreg_inserts.
>
>> Why not just isel the zext to the move? It's not legal to coalesce  
>> it away anyway.
>
> Actually it is legal to coalesce it. On x86-64 any write to a 32- 
> bit register zero extends the value to 64-bits. For the  
> insert_subreg under discussion the inserted value is a 32-bit  
> result, that has in-fact already be zero extended implicitly.

It's not legal to coalesce away the 32-bit zero extending move.

Suppose RAX contains some value with top 32-bits non-zero.
mov EAX, EAX (zero extend top bits)
use RAX (expecting top bits to be zero)

Coalesced away the move is a miscompilation.

Evan

>>>
>>> Also the current behavior is to use a 32-bit mov instruction for  
>>> both zeroext and for anyext, I don't see how this is any different.
>>>
>>>>> --
>>>>> Chris
>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On Jul 28, 2007, at 12:17 AM, Christopher Lamb  
>>>>>> <christopher.lamb at gmail.com> wrote:
>>>>>>
>>>>>>> This patch changes the X86 back end to use the new subreg  
>>>>>>> operations for appropriate truncate and extend operations.  
>>>>>>> This should allow regression testing of the subreg feature  
>>>>>>> going forward, as it's now used in a public target.
>>>>>>>
>>>>>>> The patch passed DejaGnu and all of SingleSource on my x86  
>>>>>>> machine, but there are changes for x86-64 as well which I  
>>>>>>> haven't been able to test. Output assembly for x86-64 appears  
>>>>>>> sane, but I'd appreciate someone giving the patch a try on  
>>>>>>> their x86-64 system. Other 32-bit x86 testing is also  
>>>>>>> appreciated.
>>>>>>>
>>>>>>> Thanks
>>>>>>> --
>>>>>>> Christopher Lamb
>>>>>>>
>>>>>>> <x86_subregs.patch>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>> --
>>>>> Christopher Lamb
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>> --
>>> Christopher Lamb
>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
> --
> Christopher Lamb
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20070729/73920381/attachment.html>