[llvm-commits] [llvm] r91672 - in /llvm/trunk: lib/Target/X86/X86.td lib/Target/X86/X86InstrInfo.cpp lib/Target/X86/X86InstrInfo.td lib/Target/X86/X86InstrSSE.td lib/Target/X86/X86Subtarget.cpp lib/Target/X86/X86Subtarget.h test/CodeGen/X86/break-sse-dep.ll

Tue Dec 29 17:51:02 PST 2009

On Dec 29, 2009, at 4:36 PM, Dan Gohman wrote:

> 
> 
> On Dec 21, 2009, at 11:13 AM, Chris Lattner <clattner at apple.com> wrote:
> 
>> 
>> On Dec 21, 2009, at 11:05 AM, Evan Cheng wrote:
>> 
>>>> Unless there is a reason to have this, I'd prefer to not have it clutter up the td files.  I can't imagine a reasonable (non-scalarizing) implementation of SSE that wouldn't have this issue.
>>> 
>>> Really? It's a big surprise to Dan and me (and the engineer who noticed this) that unfolding the load actually breaks the register dependency. It's not documented anywhere in the public Intel manual.
>> 
>> It was also surprising to me, but it makes perfect sense in retrospect.  If a scalar load zeros the top of the register, it "obviously" has no dependence on the top bits coming in.
> 
> No.  The underlying problem is more subtle than that.  The hardware phenomenon, and Evan's fix, only applies to *unary* operators, such as cvtss2sd, and not binary operators like addsd, mulss, etc.  It really is surprising, and it really does make sense to have a subtarget flag here.
> 
>> 
>>>> For example, you didn't add this flag to any of the AMD chips.
>>> 
>>> That's intentional. I don't have an AMD machine to try it on and I did not want to introduce a regression.
>> 
>> I'm almost certain they have the same issue.
> 
> It would be very surprising, actually.  This should wait for someone with an AMD processor who can actually test this.

David already verified that AMD chips have exactly the same problem.

-Chris