[PATCH] [X86] When pattern-matching scalar FMA3 intrinsics, don't re-arrange the first and second operands

Michael Kuperstein michael.m.kuperstein at intel.com
Thu May 21 05:38:51 PDT 2015


Hi delena, lhames, craig.topper,

The semantics of the scalar FMA intrinsics are that the high vector elements are copied from the first source, e.g. (from the Intel manual):

__m128 _mm_fmadd_ss (__m128 a, __m128 b, __m128 c)
Operation:
dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0

The current pattern swaps src1 and src2 (I guess to match the "213" operand order), which ends up tying the original src2 to the dest.
Since the actual scalar FMA3 instructions copy the high elements from the dest register, the high elements of the result come from src2 instead of src1.

This modifies the pattern to leave src1 and src2 in their original order.
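
To make the difference concrete, here is a rough C model of the two lowerings. It is only a sketch: it does not use real intrinsics or the actual TableGen pattern, and the names vec4, model_vfmadd213ss, fmadd_ss_swapped and fmadd_ss_in_order are made up for illustration. The only thing it assumes about the hardware is that a 213-form scalar instruction computes src2 * dst + src3 in the low element and leaves the other elements of dst alone.

```c
#include <assert.h>

/* Hypothetical stand-in for a 128-bit register holding four floats. */
typedef struct { float lane[4]; } vec4;

/* Model of a scalar 213-form FMA3 instruction: the destination is also
 * the first source. Low lane becomes src2 * dst + src3; lanes 1..3 of
 * dst are left as they were. */
void model_vfmadd213ss(vec4 *dst, const vec4 *src2, const vec4 *src3) {
    dst->lane[0] = src2->lane[0] * dst->lane[0] + src3->lane[0];
}

/* Lowering with the first two intrinsic operands swapped: b is tied
 * to the destination, so the high lanes of the result come from b. */
vec4 fmadd_ss_swapped(vec4 a, vec4 b, vec4 c) {
    vec4 dst = b;
    model_vfmadd213ss(&dst, &a, &c);
    return dst;
}

/* Lowering that keeps the operands in order: a is tied to the
 * destination, so the high lanes come from a, as the intrinsic
 * semantics require. */
vec4 fmadd_ss_in_order(vec4 a, vec4 b, vec4 c) {
    vec4 dst = a;
    model_vfmadd213ss(&dst, &b, &c);
    return dst;
}

/* Small self-check: both lowerings agree on the low lane and differ
 * only in which operand supplies the high lanes. */
void self_check(void) {
    vec4 a = {{2.0f, 10.0f, 11.0f, 12.0f}};
    vec4 b = {{3.0f, 20.0f, 21.0f, 22.0f}};
    vec4 c = {{4.0f, 30.0f, 31.0f, 32.0f}};

    vec4 bad  = fmadd_ss_swapped(a, b, c);
    vec4 good = fmadd_ss_in_order(a, b, c);

    assert(bad.lane[0] == good.lane[0]);  /* multiplication commutes */
    assert(good.lane[1] == a.lane[1]);    /* high lanes from a */
    assert(bad.lane[1] == b.lane[1]);     /* high lanes from b: wrong */
}
```

Since the multiplication is commutative, the low element is the same either way, which is presumably why the swap looked harmless; only the upper elements expose the problem.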

http://reviews.llvm.org/D9908

Files:
  lib/Target/X86/X86InstrFMA.td
  test/CodeGen/X86/fma3-intrinsics.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D9908.26222.patch
Type: text/x-patch
Size: 5219 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150521/68f1a039/attachment.bin>
