[LLVMdev] X86 FMA4
Demikhovsky, Elena
elena.demikhovsky at intel.com
Sat Jul 28 23:57:49 PDT 2012
Our specialists (Intel) say that “vmovaps” and “vmovsd” have the same throughput and latency, but “vmovsd” reduces the chance of 4K aliasing, so it is preferable.
- Elena
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Cameron McInally
Sent: Thursday, July 26, 2012 17:50
To: Jan Sjodin
Cc: dag at cray.com; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] X86 FMA4
Hey Jan and Dave,
It's not obvious, but following the GCC intrinsics introduces a significant scalar performance issue.
Let's look at the VFMADDSD pattern. We're operating on scalars, with undefined values in the remaining vector elements of the operands. This sounds okay, but when one looks closer...
vmovsd fp4_+1088(%rip), %xmm3 # fpppp.f:647
vmovaps %xmm3, 18560(%rsp) # fpppp.f:647 <= 16-byte spill
vfmaddsd %xmm5, fp4_+3288(%rip), %xmm3, %xmm3 # fpppp.f:647
The spill here is 16 bytes, but we're only using the low 8 bytes of xmm3. Changing the intrinsics and patterns to accept scalar operands, we end up with...
vmovsd fp4_+1056(%rip), %xmm0 # fpppp.f:666
vmovsd %xmm0, 10088(%rsp) # fpppp.f:666 <= 8-byte spill
vfmaddsd %xmm3, fp4_+3288(%rip), %xmm0, %xmm3 # fpppp.f:666
I do not know the actual cycle counts offhand, but I believe that on Interlagos and Sandy Bridge a vmovaps takes roughly 3x as many micro-ops as a vmovsd when it involves memory.
-Cameron
On Thu, Jul 26, 2012 at 9:41 AM, Jan Sjodin <jan_sjodin at yahoo.com> wrote:
Because the intrinsics use vector types (the same as GCC's).
- Jan
----- Original Message -----
> From: "dag at cray.com" <dag at cray.com>
> To: llvmdev at cs.uiuc.edu
> Cc:
> Sent: Wednesday, July 25, 2012 3:26 PM
> Subject: [LLVMdev] X86 FMA4
>
> We're migrating to LLVM 3.1 and trying to use the upstream FMA patterns.
>
> Why is VFMADDSD4 defined with vector types? Is this simply because the
> gcc intrinsic uses vector types? It's quite unnatural if you have a
> compiler that generates FMAs as opposed to requiring user intrinsics.