[llvm-commits] [llvm] r91672 - in /llvm/trunk: lib/Target/X86/X86.td lib/Target/X86/X86InstrInfo.cpp lib/Target/X86/X86InstrInfo.td lib/Target/X86/X86InstrSSE.td lib/Target/X86/X86Subtarget.cpp lib/Target/X86/X86Subtarget.h test/CodeGen/X86/break-sse-dep.ll
Chris Lattner
clattner at apple.com
Tue Dec 29 17:51:02 PST 2009
On Dec 29, 2009, at 4:36 PM, Dan Gohman wrote:
>
>
> On Dec 21, 2009, at 11:13 AM, Chris Lattner <clattner at apple.com> wrote:
>
>>
>> On Dec 21, 2009, at 11:05 AM, Evan Cheng wrote:
>>
>>>> Unless there is a reason to have this, I'd prefer to not have it clutter up the td files. I can't imagine a reasonable (non-scalarizing) implementation of SSE that wouldn't have this issue.
>>>
>>> Really? It was a big surprise to Dan and me (and the engineer who noticed this) that unfolding the load actually breaks the register dependency. It's not documented anywhere in the public Intel manual.
>>
>> It was also surprising to me, but it makes perfect sense in retrospect. If a scalar load zeros the top of the register, it "obviously" has no dependence on the top bits coming in.
>
> No. The underlying problem is more subtle than that. The hardware phenomenon, and Evan's fix, only applies to *unary* operators, such as cvtss2sd, and not binary operators like addsd, mulss, etc. It really is surprising, and it really does make sense to have a subtarget flag here.
>
>>
>>>> For example, you didn't add this flag to any of the AMD chips.
>>>
>>> That's intentional. I don't have an AMD machine to try it on and I did not want to introduce a regression.
>>
>> I'm almost certain they have the same issue.
>
> It would be very surprising, actually. This should wait for someone with an AMD processor who can actually test this.
David already verified that AMD chips have exactly the same problem.
-Chris
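
For readers following the thread, here is a minimal sketch of the dependency being discussed. It is a toy model, not LLVM code and not a measurement: each register is split into a low and a high half, and each half records the set of inputs it depends on. The names `movss_load`, `cvtss2sd`, `deps`, `"mem"` and `"prev_op"` are made up for illustration. It assumes only the architectural behavior mentioned above: a scalar load zeros the upper bits, while cvtss2sd writes the low element and leaves the upper half of its destination in place.

```python
# Toy dependency model (hypothetical names; illustration only).
# A register is {"low": set_of_inputs, "high": set_of_inputs}.

def movss_load(mem):
    # Scalar load: low element comes from memory, upper bits are zeroed,
    # so the result does not depend on the register's previous contents.
    return {"low": {mem}, "high": set()}

def cvtss2sd(dst, src_low_deps):
    # Unary convert: low half comes from the source, upper half of the
    # destination passes through unchanged (this is the extra dependency).
    return {"low": set(src_low_deps), "high": set(dst["high"])}

def deps(reg):
    # Everything the full register value depends on.
    return reg["low"] | reg["high"]

# The destination register was last written by some earlier instruction.
old = {"low": {"prev_op"}, "high": {"prev_op"}}

# Folded form: cvtss2sd reads memory directly and merges into the old register.
folded = cvtss2sd(old, {"mem"})

# Unfolded form: load first (zeroing the upper bits), then convert in place.
tmp = movss_load("mem")
unfolded = cvtss2sd(tmp, tmp["low"])

print(sorted(deps(folded)))    # depends on the old register contents too
print(sorted(deps(unfolded)))  # depends on the loaded value only
```

In this model the folded form still depends on whatever last wrote the destination register, while the unfolded form depends only on the loaded value. For a binary operator such as addsd the destination is a real input, so the model says nothing about whether unfolding would help there; that is a hardware question the thread leaves to measurement.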