[llvm-commits] [llvm] r99345 - in /llvm/trunk/lib/Target/X86: CMakeLists.txt SSEDomainFix.cpp X86.h X86InstrInfo.cpp X86InstrInfo.h X86TargetMachine.cpp X86TargetMachine.h

Wed Mar 24 09:54:14 PDT 2010

On Mar 24, 2010, at 9:27 AM, Chris Lattner wrote:

> 
> On Mar 24, 2010, at 9:00 AM, Jakob Stoklund Olesen wrote:
> 
>> 
>> On Mar 24, 2010, at 8:26 AM, Anton Korobeynikov wrote:
>>>> I need something a bit more fancy for SSE since there are more polymorphic instructions than just a single move, and often it is necessary to twiddle the defining instruction to make a late user happy.
>>> I have a crazy idea - maybe it will be possible to integrate such a pass
>>> into the scheduler somehow? However, this means that we'll need to do
>>> isel+scheduling at once, or something like this.
>> 
>> That is an interesting idea; it is essentially a scheduling problem. The isel during scheduling is not so bad - it is only a matter of switching between instructions with identical inputs and outputs.
>> 
>> But right now we are not scheduling for latency on X86, and we may not want to do that ever for out-of-order machines.
>> 
>> We would also have to make sure that later stages of codegen don't mess things up. CopyRegToReg would need to get more clever. TwoAddressInstrPass could also do bad stuff.
>> 
>> On Blackfin, the separate execution domains are explicit with disjoint register classes (D and P). That is not handled at all currently, and it needs to be taken care of before register allocation.
>> 
>> On the other hand, a late pass is really easy, and I can do a bit of inference across basic blocks too.
> 
> Both of these seem like instruction selection problems to me.  The problem is that after isel, we have MVTs on instruction sdnodes instead of register classes.  This problem seems exactly the same as selecting fpstack vs sse instructions for scalar floating point.

Yep.

If it is handled at isel time, we could also handle different instruction patterns in different domains. Right now, I can't even replace shufps with pshufd because shufps is two-address and pshufd is three-address.

I think that cross-block inference is necessary to get good results. How are the plans for full-function isel coming along? :-)

Note that the cost of crossing domains varies a lot.

On SSE/Nehalem it is a 2-cycle latency-only penalty on an out-of-order CPU.
On ARM it is a 20-cycle penalty on an in-order CPU.
On Blackfin a move instruction (or spill/restore) is required.
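
To make the trade-off concrete, here is a rough sketch of the cost model on a single dependency chain. It is not the SSEDomainFix code; the names Domain, minCrossings and penaltyCycles are made up for illustration, and it assumes each instruction consumes only the previous instruction's result. Polymorphic instructions (Any) can be placed in either domain for free, and the per-crossing cost is a parameter (2 for the Nehalem case, 20 for the ARM case above).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: an instruction is either pinned to one execution
// domain, or polymorphic (Any) with an equivalent form in both domains.
enum class Domain { Int, Float, Any };

// Minimum number of domain crossings along a straight-line dependency
// chain, where each instruction uses the result of the previous one.
// Simple dynamic program: track the best cost with the most recent
// instruction placed in the Int domain and in the Float domain.
int minCrossings(const std::vector<Domain>& chain) {
    const int INF = 1 << 20;  // marks a placement that is not allowed
    int costInt = 0, costFloat = 0;
    bool first = true;
    for (Domain d : chain) {
        int allowInt = (d != Domain::Float) ? 0 : INF;
        int allowFloat = (d != Domain::Int) ? 0 : INF;
        int newInt, newFloat;
        if (first) {
            newInt = allowInt;
            newFloat = allowFloat;
            first = false;
        } else {
            // Staying in a domain is free; switching costs one crossing.
            newInt = std::min(costInt, costFloat + 1) + allowInt;
            newFloat = std::min(costFloat, costInt + 1) + allowFloat;
        }
        costInt = std::min(newInt, INF);
        costFloat = std::min(newFloat, INF);
    }
    return std::min(costInt, costFloat);
}

// Total latency penalty for the chain, given a per-crossing cost in cycles.
int penaltyCycles(const std::vector<Domain>& chain, int perCrossingCycles) {
    assert(perCrossingCycles >= 0);
    return minCrossings(chain) * perCrossingCycles;
}
```

For a chain like {Float, Any, Float}, the polymorphic instruction is placed in the Float domain and no penalty is paid; for {Float, Int, Float} both crossings are unavoidable. A real pass would work on the def-use graph rather than a chain, and on Blackfin the "penalty" is an inserted move rather than a latency figure.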

/jakob