[llvm-commits] [llvm] r91672 - in /llvm/trunk: lib/Target/X86/X86.td lib/Target/X86/X86InstrInfo.cpp lib/Target/X86/X86InstrInfo.td lib/Target/X86/X86InstrSSE.td lib/Target/X86/X86Subtarget.cpp lib/Target/X86/X86Subtarget.h test/CodeGen/X86/break-sse-dep.ll

Tue Dec 29 16:36:17 PST 2009

On Dec 21, 2009, at 11:13 AM, Chris Lattner <clattner at apple.com> wrote:

>
> On Dec 21, 2009, at 11:05 AM, Evan Cheng wrote:
>
>>> Unless there is a reason to have this, I'd prefer to not have it  
>>> clutter up the td files.  I can't imagine a reasonable (non- 
>>> scalarizing) implementation of SSE that wouldn't have this issue.
>>
>> Really? It's a big surprise to Dan and I (and the engineer who  
>> noticed this) that unfolding the load actually breaks the register  
>> dependency. It's not documented anywhere in the public Intel manual.
>
> It was also surprising to me, but it makes perfect sense in  
> retrospect.  If a scalar load zeros the top of the register, it  
> "obviously" has no dependence on the top bits coming in.

No.  The underlying problem is more subtle than that.  The hardware  
phenomenon and Evan's fix apply only to *unary* operators, such as  
cvtss2sd, and not to binary operators like addsd, mulss, etc.  It  
really is surprising, and it really does make sense to have a  
subtarget flag here.
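
To make the unary/binary distinction concrete, here is a toy
dependency model in C++.  It is only a sketch: the names (Deps,
OldXmm0, Mem, and the four functions) are made up for illustration and
are not LLVM code.  It assumes the architectural behavior that the
memory form of cvtss2sd leaves the upper bits of the destination
untouched while a movss load zeros them, and it says nothing about
what any particular chip does internally.

```cpp
#include <cassert>
#include <set>

// Toy model, not LLVM code: a register value is described by the set
// of earlier values that must be ready before it can be computed.
using Deps = std::set<int>;

constexpr int OldXmm0 = 0; // whatever was last written to xmm0
constexpr int Mem = 1;     // the value loaded from memory

// cvtss2sd (mem), %xmm0
// Unary, load folded: only the low 64 bits are written and the upper
// bits of xmm0 are preserved, so the result merges in the old xmm0.
Deps cvtFolded() { return {OldXmm0, Mem}; }

// movss (mem), %xmm0 ; cvtss2sd %xmm0, %xmm0
// Unary, load unfolded: the movss load zeros the upper bits, so the
// conversion only ever sees the freshly loaded value.
Deps cvtUnfolded() {
  Deps loaded = {Mem};
  return loaded;
}

// addsd (mem), %xmm0
// Binary, load folded: the old xmm0 is a genuine input operand.
Deps addFolded() { return {OldXmm0, Mem}; }

// movsd (mem), %xmm1 ; addsd %xmm1, %xmm0
// Binary, load unfolded: the old xmm0 is still a genuine input.
Deps addUnfolded() {
  Deps result = {Mem};
  result.insert(OldXmm0);
  return result;
}
```

In this model, unfolding the load removes OldXmm0 from the unary case
only; for addsd it stays either way, since the old xmm0 is a real
input rather than a leftover.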

>
>>> For example, you didn't add this flag to any of the AMD chips.
>>
>> That's intentional. I don't have an AMD machine to try it on and I  
>> did not want to introduce a regression.
>
> I'm almost certain they have the same issue.

It would be very surprising, actually.  This should wait for someone  
with an AMD processor who can actually test this.

>
>>
>> I don't really care that much whether it's a subtarget feature. If  
>> no one pipes up soon, I'll remove it.

Please leave the flag in place.

Dan
