[llvm-commits] Patch for X86 to use subregs

Evan Cheng evan.cheng at apple.com
Mon Jul 30 11:06:06 PDT 2007


On Jul 30, 2007, at 12:02 AM, Christopher Lamb wrote:

>
> On Jul 29, 2007, at 10:20 PM, Evan Cheng wrote:
>
>>
>> On Jul 29, 2007, at 9:37 PM, Christopher Lamb wrote:
>>
>>>
>>> On Jul 29, 2007, at 6:20 PM, Evan Cheng wrote:
>>>
>>>> Sent from my iPhone
>>>>
>>>> On Jul 28, 2007, at 4:36 PM, Christopher Lamb  
>>>> <christopher.lamb at gmail.com> wrote:
>>>>
>>>>>
>>>>> On Jul 28, 2007, at 2:26 PM, Evan Cheng wrote:
>>>>>
>>>>>> On Jul 28, 2007, at 11:52 AM, Christopher Lamb  
>>>>>> <christopher.lamb at gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Jul 28, 2007, at 1:48 AM, Evan Cheng wrote:
>>>>>>>
>>>>>>>> Very cool! I need to read it more carefully.
>>>>>>>
>>>>>>>> But I see you are lowering zext to a single insert_subreg.  
>>>>>>>> Is that right? It won't zero out the top part, no?
>>>>>>>
>>>>>>> It's only lowering (zext i32 to i64) to an insert_subreg on  
>>>>>>> x86-64 where all writes to 32-bit registers implicitly zero- 
>>>>>>> extend into the upper 32-bits.
>>>>>>>
>>>>>>
>>>>>> I know. But thy mismatch semantically. A insert_subreg to the  
>>>>>> lower part should not change the upper half. I think this is  
>>>>>> only legal for anyext.
>>>>>
>>>>> On x86-64 the semantics of a 2 operand i32 insert_subreg is  
>>>>> that the input super-value is implicitly zero. So in this sense  
>>>>> the insert isn't changing the upper half, it's just that the  
>>>>> upper half is being set to zero implicitly rather than  
>>>>> explicitly. If you'll notice the insert_subreg is a two operand  
>>>>> (implicit super value) not a three operand version. If the  
>>>>> insert were the three operand version, and the super value as  
>>>>> coming from an implicit def I'd agree with you, but it's not.
>>>>
>>>> Ok, let's step back for a second. There are a couple of issues  
>>>> that should be addressed. Plz help me understand. :)
>>>>
>>>> 1: Semantics of insert_subreg should be the same across all  
>>>> targets, right?
>>>
>>> I'm not certain that this should be so. x86-64 clearly has a  
>>> target specific semantics of a 32-bit into 64-bit insert.
>>
>> No, that won't do. insert_subreg and extract_subreg are by  
>> definition target independent. They must have the same semantics.  
>> You are forcing x86-64 32-bit zero-extending move to fit  
>> insert_subreg when they are really not the same thing.
>
> If target independence is a requirement, then I agree that using  
> insert_subreg for x86-64 zero-ext isn't currently feasible.
>

Right.

>>>> 2: two operant variant of insert_subreg should mean the superreg  
>>>> is undef. If you insert a value into a low part, the rest of the  
>>>> superreg is still undef.
>>>
>>> I think the meaning of insert_subreg instruction (both 2 and 3  
>>> operand versions) must have semantics specific to the target. For  
>>> example, on x86-64 there is no valid 3 operand insert_subreg for  
>>> a 32-bit value into 64-bits, because the 32-bit result is always  
>>> going to be zero extended and overwrite the upper 32-bits.
>>
>> It just means there is no way to implement a insert_subreg with a  
>> single instruction under x86-64. But that is perfectly ok. Apart  
>> from anyext, x86-64 just isn't going to benefit from it. It's also  
>> impossible to read or modify the higher 32-bits.
>
> Currently the move that's generated isn't handled by coalescing  
> because the source and destination belong to different register  
> classes. The insert_subreg is meant to be a means to move values  
> implicitly between register classes that have a subreg  
> relationship. So if insert_subreg semantics must be target  
> independent, then I think you isel the zero-extending move to be:
>
> (i64 (INSERT_SUBREG (i64 0), GR32:$src, 3))

But that's wrong. Remember the superreg argument is an read / mod /  
write operand. That is, the first operand is a use, the def is the  
LHS but we are forcing the allocator to target the same physical  
register.

v1 = some existing value
v1 = insert_subreg v1, GR32:$src, 3

But zext is zeroing out the top part. i.e. zext is equal to

mov v1, 0
v1 = insert_subreg v1, GR32:$src, 3

>
> The thing is that the general coalescing will be able to determine  
> that the copy from undef is unneeded for (INSERT_SUBREG (i64  
> undef), GR32:$src, 3), but it would take a target specific hook to  
> know that the constant zero is unneeded on x86-64. A target  
> specific hook for this might be useful, but I think that this is in  
> the realm of future work now.

Sorry, I am not following. zext on x86-64, i.e. the 32-bit move,   
cannot be coalesced away. No need for target specific hook.

>>>> 3: why is there a two operant variant in the first place? Why  
>>>> not use undef for the superreg operant?
>>>
>>> To note, the two operand variant is of the MachineInstr. The DAG  
>>> form would be to represent the superregister as coming from an  
>>> undef node, but this gets isel'd to the two operand MachineInstr  
>>> of insert_subreg.
>>>
>>> The reason is that undef is typically selected to an implicit def  
>>> of a register. This causes an unnecessary move to be generated  
>>> later on. This move can be optimized away later with more  
>>> difficulty during subreg lowering by checking whether the input  
>>> register is defined by an implicit def pseudo instruction, but  
>>> instead I decided to perform the optimization during ISel on the  
>>> DAG form during instruction selection.
>>>
>>> With what you're suggesting
>>> reg1024 = ...
>>> reg1026 = insert_subreg undef, reg1024, 1
>>> reg1027 = insert_subreg reg1026, reg1025, 1
>>> use reg1027
>>>
>>> would be isel'd to then subreg lowered to:
>>>
>>> R6 = ...
>>> implicit def R01 <= this implicit def is unecessary
>>
>> That's a pseudo instruction, it doesn't cost anything.
>>
>>> R23 = R01 <= this copy is unnecessary
>>
>> It can be coalesced to:
>> R23 = undef
>>
>>> R2 = R6
>>> R45 = R23
>>> R5 = R6
>>> use R45
>>
>> Using undef explicit is the right way to go. There is a good  
>> reason it's there. Having the two operand version of insert_subreg  
>> that implicitly use an undef value doesn't fit into the overall  
>> llvm philosophy.
>
> Right now the coalescing that you are describing is happening  
> during isel. Are you simply saying that you'd rather have the  
> coalescing happen during subreg lowering? I can accept that, but  
> would you share your reasons?

There really isn't a very good argument for having the 2 different  
versions of insert_subreg. undef use must be explicitly modeled. I  
really don't see what you mean by coalescing during isel. isel  
doesn't have the concept of coalescing. Also don't forget everything  
must remain ssa until register allocation.


>>>> 4: what's the benefit of isel a zext to insert_subreg and then  
>>>> xform it to a 32-bit move?
>>>
>>> The xform to a 32-bit move is only the conservative behavior. The  
>>> zext can be implicit if regalloc can coalesce subreg_inserts.
>>>
>>>> Why not just isel the zext to the move? It's not legal to  
>>>> coalesce it away anyway.
>>>
>>> Actually it is legal to coalesce it. On x86-64 any write to a 32- 
>>> bit register zero extends the value to 64-bits. For the  
>>> insert_subreg under discussion the inserted value is a 32-bit  
>>> result, that has in-fact already be zero extended implicitly.
>>
>> It's not legal to coalesce away the 32-bit zero extending move.
>>
>> Suppose RAX contains some value with top 32-bits non-zero.
>> mov EAX, EAX (zero extend top bits)
>> use RAX (expecting top bits to be zero)
>>
>> Coalesced away the move is a miscompilation.
>
> Indeed, but what you have described is not a valid insert_subreg  
> either. Insert_subreg would take EAX as its input operand and would  
> only be coalesced into an instruction that defines EAX explicitly  
> (i.e. an instruction that defines RAX defines EAX implicitly, not  
> explicitly so no coalescing). I think that this coalescing rule is  
> generally required for correctness when coalescing insert_subreg  
> under any architecture.

What I've been saying all along. zero_extend on x86-64 isn't the same  
as a insert_sub, don't try to model it that way.

Evan

>
>
>>>>>
>>>>> Also the current behavior is to use a 32-bit mov instruction  
>>>>> for both zeroext and for anyext, I don't see how this is any  
>>>>> different.
>>>>>
>>>>>>>> On Jul 28, 2007, at 12:17 AM, Christopher Lamb  
>>>>>>>> <christopher.lamb at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> This patch changes the X86 back end to use the new subreg  
>>>>>>>>> operations for appropriate truncate and extend operations.  
>>>>>>>>> This should allow regression testing of the subreg feature  
>>>>>>>>> going forward, as it's now used in a public target.
>>>>>>>>>
>>>>>>>>> The patch passed DejaGnu and all of SingleSource on my x86  
>>>>>>>>> machine, but there are changes for x86-64 as well which I  
>>>>>>>>> haven't been able to test. Output assembly for x86-64  
>>>>>>>>> appears sane, but I'd appreciate someone giving the patch a  
>>>>>>>>> try on their x86-64 system. Other 32-bit x86 testing is  
>>>>>>>>> also appreciated.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> --
>>>>>>>>> Christopher Lamb
>>>>>>>>>
>>>>>>>>> <x86_subregs.patch>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> llvm-commits mailing list
>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>> _______________________________________________
>>>>>>>> llvm-commits mailing list
>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>
>>>>>>> --
>>>>>>> Christopher Lamb
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>> --
>>>>> Christopher Lamb
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>> --
>>> Christopher Lamb
>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
> --
> Christopher Lamb
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20070730/e10de602/attachment.html>


More information about the llvm-commits mailing list