[llvm-dev] [X86] FMA transformation restrictions

Mon Sep 12 14:41:19 PDT 2016

Hi David,

Commuting the 1st<->2nd and 1st<->3rd operands is _usually_ prohibited
for scalar FMA *_Int opcodes because it would change the values passed
through from the first operand of the intrinsic.

I would challenge your statement:
  "user cannot rely on knowing which operand is tied to the destination".
It is common practice for all intrinsics with *_ss() and *_sd()
suffixes that the first operand of the intrinsic is tied to the destination.
For example:
    // https://software.intel.com/sites/default/files/a6/22/18072-347603.pdf
    __m128 _mm_add_ss(__m128 a, __m128 b)
    Adds the lower single-precision, floating-point (SP FP) values of a
    and b; the upper 3 SP FP values are passed through from a.
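
To make the pass-through rule concrete, here is a small portable model of
that documented behavior. It is only a sketch: vec4f and model_add_ss are
hypothetical stand-ins for __m128 and _mm_add_ss, not the real intrinsics.

```c
/* Hypothetical stand-in for __m128: four float lanes, lane 0 is the low one. */
typedef struct { float lane[4]; } vec4f;

/* Models the documented behavior of _mm_add_ss(a, b). */
static vec4f model_add_ss(vec4f a, vec4f b) {
    vec4f r = a;                        /* lanes 1..3 pass through from a */
    r.lane[0] = a.lane[0] + b.lane[0];  /* only the low lane is added */
    return r;
}
```

Swapping a and b keeps the low lane but changes which input supplies the
upper three lanes, which is why the operand order is observable.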

This point was probably not stated explicitly for the FMA intrinsics here:
https://software.intel.com/en-us/node/582845
That is really a documentation problem (actually my fault, as I did not
add a special notice when I created/added the new _mm_fmadd_ss/sd()
intrinsics).

The intention was to keep the existing assumption about the 1st
intrinsic operand, as usual, and to give users (including some math
library guys) a tool with defined input/output behavior.

It is important to mention that the FMA form selection (132/213/231) by
the compiler does not change the precision of the result. It is always
correct for vector opcodes and conditionally correct for *_Int opcodes.

*_Int opcodes may need some additional correctness analysis.
Commuting the 2nd and 3rd operands is always correct, while commuting
the 1st and 2nd or the 1st and 3rd requires use-def analysis.
It is OK to commute the 1st operand if it is known that the upper bits
of the intrinsic result are not used.
For example:
  __m128 res = _mm_fmadd_ss(a, b, c);
  _mm_store_ss(ptr, res); // this is the ONLY user of 'res'.
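
The commuting rules above can be checked with a small portable model of the
three scalar forms. This is a sketch under assumptions: vec4f, the
fmaNNN_ss helpers and low_element are hypothetical names, the first
argument models the register tied to the destination, and plain
multiply-add stands in for the fused operation because only the lane
pass-through matters here.

```c
/* Hypothetical stand-in for an XMM register: lane 0 is the low element. */
typedef struct { float lane[4]; } vec4f;

/* Scalar FMA forms. dst is both a source and the destination, so lanes
   1..3 of dst are preserved. Single rounding is NOT modeled. */
static vec4f fma132_ss(vec4f dst, vec4f s2, vec4f s3) {
    dst.lane[0] = dst.lane[0] * s3.lane[0] + s2.lane[0];  /* dst*s3 + s2 */
    return dst;
}
static vec4f fma213_ss(vec4f dst, vec4f s2, vec4f s3) {
    dst.lane[0] = s2.lane[0] * dst.lane[0] + s3.lane[0];  /* s2*dst + s3 */
    return dst;
}
static vec4f fma231_ss(vec4f dst, vec4f s2, vec4f s3) {
    dst.lane[0] = s2.lane[0] * s3.lane[0] + dst.lane[0];  /* s2*s3 + dst */
    return dst;
}

/* Models what _mm_store_ss observes: the low element only. */
static float low_element(vec4f v) { return v.lane[0]; }
```

In this model, fma213_ss(a, b, c) and fma132_ss(a, c, b) agree in every
lane (the 2nd<->3rd commute). fma213_ss(b, a, c) and fma231_ss(c, b, a)
produce the same low element but take the upper lanes from b and c, so
they are safe only when low_element() is the sole consumer of the result.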

I did not see such use-def analysis in LLVM, but it surely exists in
some other compilers.
Perhaps such analysis will be implemented in LLVM eventually/soon.

Thank you,
Vyacheslav Klochkov

On Mon, Sep 12, 2016 at 10:24 AM, <dag at cray.com> wrote:

> I noticed that the operand commuting code in X86InstrInfo.cpp treats
> scalar FMA intrinsics specially.  It prevents operand commuting on these
> scalar instructions because the scalar FMA instructions preserve the
> upper bits of the vector.  Presumably, the restrictions are there
> because commuting operands potentially changes the result upper bits.
>
> However, AFAIK the Intel and GNU FMA intrinsics don't actually specify
> which FMA (213, 132, 231) is going to be used and so the user can't rely
> on knowing which operand is tied to the destination.  Thus the user
> can't rely on knowing what the upper bits will be.
>
> Is there some other reason these scalar FMA commuting restrictions are
> in place?
>
> Thanks!
>
>                             -David
>