[llvm-commits] [llvm] r91672 - in /llvm/trunk: lib/Target/X86/X86.td lib/Target/X86/X86InstrInfo.cpp lib/Target/X86/X86InstrInfo.td lib/Target/X86/X86InstrSSE.td lib/Target/X86/X86Subtarget.cpp lib/Target/X86/X86Subtarget.h test/CodeGen/X86/break-sse-dep.ll
gohman at apple.com
Tue Dec 29 16:36:17 PST 2009
On Dec 21, 2009, at 11:13 AM, Chris Lattner <clattner at apple.com> wrote:
> On Dec 21, 2009, at 11:05 AM, Evan Cheng wrote:
>>> Unless there is a reason to have this, I'd prefer to not have it
>>> clutter up the td files. I can't imagine a reasonable (non-
>>> scalarizing) implementation of SSE that wouldn't have this issue.
>> Really? It's a big surprise to Dan and me (and the engineer who
>> noticed this) that unfolding the load actually breaks the register
>> dependency. It's not documented anywhere in the public Intel manual.
> It was also surprising to me, but it makes perfect sense in
> retrospect. If a scalar load zeros the top of the register, it
> "obviously" has no dependence on the top bits coming in.
No. The underlying problem is more subtle than that. The hardware
phenomenon, and Evan's fix, only applies to *unary* operators, such as
cvtss2sd, and not binary operators like addsd, mulss, etc. It really
is surprising, and it really does make sense to have a subtarget flag.
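
To make the unary/binary distinction concrete, here is a toy register
dependency model. It is only a sketch, not LLVM code: the function name,
the tuple layout, and the read/write sets are illustrative assumptions.
The read sets assume that cvtss2sd leaves the upper bits of its
destination unchanged (so it implicitly reads the old destination) and
that the load form of movss zeroes them (so it does not).

```python
# Toy model, not LLVM code. Each instruction is (text, registers_read, register_written).
# Assumption: cvtss2sd merges into its destination, so it reads the old destination;
# the load form of movss zeroes the upper bits, so it reads only memory.

def depends_on_old(seq, reg="xmm0"):
    """Return True if seq reads the value reg held before the sequence started."""
    fresh = False  # True once reg has been overwritten without reading its old value
    for _text, reads, writes in seq:
        if reg in reads and not fresh:
            return True
        if writes == reg:
            fresh = True
    return False

# Unary op with the load folded: the destination's old value is an implicit input.
folded_unary = [("cvtss2sd (mem), %xmm0", {"mem", "xmm0"}, "xmm0")]

# Unary op with the load unfolded: movss fully redefines xmm0 first.
unfolded_unary = [
    ("movss (mem), %xmm0", {"mem"}, "xmm0"),
    ("cvtss2sd %xmm0, %xmm0", {"xmm0"}, "xmm0"),
]

# Binary op: xmm0 is a genuine source operand either way.
folded_binary = [("addsd (mem), %xmm0", {"mem", "xmm0"}, "xmm0")]
unfolded_binary = [
    ("movsd (mem), %xmm1", {"mem"}, "xmm1"),
    ("addsd %xmm1, %xmm0", {"xmm1", "xmm0"}, "xmm0"),
]

for label, seq in [
    ("folded unary", folded_unary),
    ("unfolded unary", unfolded_unary),
    ("folded binary", folded_binary),
    ("unfolded binary", unfolded_binary),
]:
    print(label, depends_on_old(seq))
```

In this model only the unfolded unary sequence is free of the old xmm0
value. Unfolding a binary operator changes nothing, because the
destination is a real input there.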
>>> For example, you didn't add this flag to any of the AMD chips.
>> That's intentional. I don't have an AMD machine to try it on and I
>> did not want to introduce a regression.
> I'm almost certain they have the same issue.
It would be very surprising, actually. This should wait for someone
with an AMD processor who can actually test this.
>> I don't really care that much whether it's a subtarget feature. If
>> no one pipes up soon, I'll remove it.
Please leave the flag in place.