[PATCH] [X86] When pattern-matching scalar FMA3 intrinsics, don't re-arrange the first and second operands
Michael Kuperstein
michael.m.kuperstein at intel.com
Thu May 21 05:38:51 PDT 2015
Hi delena, lhames, craig.topper,
The semantics of the scalar FMA intrinsics are that the high vector elements are copied from the first source, e.g. (from the Intel manual):
__m128 _mm_fmadd_ss (__m128 a, __m128 b, __m128 c)
Operation:
dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0
The current pattern swaps src1 and src2 (I guess to match the "213" operand order), which ends up tying the original src2 to the dest register.
Since the actual scalar FMA3 instructions copy the high elements from the dest register, the high elements of the result are taken from src2 instead of src1.
This modifies the pattern to leave src1 and src2 in their original order.
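
To illustrate the difference, here is a minimal sketch in plain C that models the documented semantics and the effect of the operand swap. It does not use the real intrinsics or the actual TableGen pattern: vec4, model_fmadd_ss, model_fmadd_ss_swapped and same_upper_lanes are hypothetical names made up for this example, and the model ignores the single rounding of a real fused multiply-add.

```c
/* Hypothetical 4-lane stand-in for __m128 (assumption, not the real type). */
typedef struct { float lane[4]; } vec4;

/* Models the documented _mm_fmadd_ss semantics:
   lane 0 is a*b + c, lanes 1..3 are copied from the first source a. */
static vec4 model_fmadd_ss(vec4 a, vec4 b, vec4 c) {
    vec4 dst = a;                                     /* dst[127:32] := a[127:32] */
    dst.lane[0] = a.lane[0] * b.lane[0] + c.lane[0];  /* dst[31:0] */
    return dst;
}

/* Models the behaviour described for the old pattern: src1 and src2 are
   swapped, so the register tied to the dest holds b and the upper lanes
   come from b. Lane 0 is unchanged because multiplication commutes. */
static vec4 model_fmadd_ss_swapped(vec4 a, vec4 b, vec4 c) {
    return model_fmadd_ss(b, a, c);
}

/* Returns 1 if lanes 1..3 of x and y are equal, 0 otherwise. */
static int same_upper_lanes(vec4 x, vec4 y) {
    for (int i = 1; i < 4; ++i) {
        if (x.lane[i] != y.lane[i]) {
            return 0;
        }
    }
    return 1;
}
```

With distinct upper lanes in a and b, both models agree on lane 0, but only model_fmadd_ss returns the upper lanes of a. That is why the bug is invisible to code that reads only the low element.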
http://reviews.llvm.org/D9908
Files:
lib/Target/X86/X86InstrFMA.td
test/CodeGen/X86/fma3-intrinsics.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D9908.26222.patch
Type: text/x-patch
Size: 5219 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150521/68f1a039/attachment.bin>