[llvm-dev] [X86] FMA transformation restrictions
Vyacheslav Klochkov via llvm-dev
llvm-dev at lists.llvm.org
Mon Sep 12 14:41:19 PDT 2016
Commuting the 1st<->2nd and 1st<->3rd operands is _usually_ prohibited
for scalar FMA *_Int opcodes because it would change the values passed
through the first operand of the intrinsic.
I would challenge your statement:
"user cannot rely on knowing which operand is tied to the destination".
It is the common practice for all intrinsics with *_ss() and *_sd()
suffixes that the first operand of the intrinsic is tied to the destination.
__m128 _mm_add_ss(__m128 a, __m128 b)
Adds the lower single-precision, floating-point (SP FP) values of a and b;
the upper 3 SP FP values are passed through from a.
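
To make the pass-through rule concrete, here is a minimal sketch in plain C.
The vec4 type and model_add_ss() are hypothetical stand-ins for __m128 and
_mm_add_ss(); they only model the documented behavior and are not the real
header definitions.

```c
/* Hypothetical 4-lane model of an XMM register (assumption, not __m128). */
typedef struct { float f[4]; } vec4;

/* Model of the *_ss convention: only lane 0 is computed,
   lanes 1..3 are copied from the FIRST operand. */
static vec4 model_add_ss(vec4 a, vec4 b) {
    vec4 r = a;                 /* upper 3 lanes pass through from a */
    r.f[0] = a.f[0] + b.f[0];   /* lower lane is the actual sum */
    return r;
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, model_add_ss(a, b) gives
{11, 2, 3, 4} while model_add_ss(b, a) gives {11, 20, 30, 40}: the same
lower value, but different upper lanes.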
Probably, this point was not mentioned explicitly for FMA intrinsics here:
That is rather a documentation problem (actually, my fault,
as I did not add a special notice when I created/added those new
intrinsics).
The intention was to maintain the existing assumption regarding the 1st
intrinsic operand as usual and to give users (including some math library
guys) a tool with defined input/output behavior.
It is important to mention that the FMA form selection (132/213/231) by
the compiler
does not change the precision of the result. It is always correct for
vector opcodes and
conditionally correct for *_Int opcodes.
*_Int opcodes may need some additional correctness analysis.
Commuting 2nd and 3rd operands is always correct, while commuting 1st and
2nd or 1st and 3rd
requires use-def analysis.
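
The commuting rules above can be sketched with a small plain-C model of the
three scalar forms. Everything here (vec4, fma213_ss, fma132_ss, fma231_ss,
vec4_equal) is a hypothetical model written for illustration, not LLVM code
and not the real intrinsics. It assumes the usual instruction semantics, in
which operand 1 is the tied destination and supplies the upper lanes.

```c
#include <stdbool.h>

/* Hypothetical 4-lane model of an XMM register (assumption). */
typedef struct { float f[4]; } vec4;

/* In every form op1 is the tied destination, so lanes 1..3 come from op1.
   A plain multiply-add stands in for the fused operation; this is enough
   to show which lanes change, not to model rounding. */
static vec4 fma213_ss(vec4 op1, vec4 op2, vec4 op3) {
    vec4 r = op1;
    r.f[0] = op2.f[0] * op1.f[0] + op3.f[0];
    return r;
}

static vec4 fma132_ss(vec4 op1, vec4 op2, vec4 op3) {
    vec4 r = op1;
    r.f[0] = op1.f[0] * op3.f[0] + op2.f[0];
    return r;
}

static vec4 fma231_ss(vec4 op1, vec4 op2, vec4 op3) {
    vec4 r = op1;
    r.f[0] = op2.f[0] * op3.f[0] + op1.f[0];
    return r;
}

/* Lane-by-lane comparison of two modeled registers. */
static bool vec4_equal(vec4 x, vec4 y) {
    for (int i = 0; i < 4; ++i)
        if (x.f[i] != y.f[i])
            return false;
    return true;
}
```

With a = {2, 10, 11, 12}, b = {3, 20, 21, 22}, c = {4, 30, 31, 32}:
fma213_ss(a, b, c) and fma132_ss(a, c, b) both give {10, 10, 11, 12}, so
swapping operands 2 and 3 (with the matching form change) preserves every
lane. fma213_ss(b, a, c) gives {10, 20, 21, 22}: the lower lane is still
correct, but the upper lanes now come from b, which is why that commute
needs use-def analysis.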
It is OK to commute the 1st operand if it is known that the upper bits
of the intrinsic result are not used.
__m128 res = _mm_fmadd_ss(a, b, c);
_mm_store_ss(ptr, res); // this is the ONLY user of 'res'.
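
A minimal plain-C sketch of why this case is safe. The names vec4,
model_fmadd_ss, model_store_ss and observed_by_store are hypothetical and
only model the documented behavior of _mm_fmadd_ss() and _mm_store_ss();
a real use-def analysis would work on the compiler's IR, not on values.

```c
/* Hypothetical 4-lane model of an XMM register (assumption). */
typedef struct { float f[4]; } vec4;

/* Model of _mm_fmadd_ss: lane 0 is a*b+c, lanes 1..3 pass through from a.
   A plain multiply-add stands in for the fused operation. */
static vec4 model_fmadd_ss(vec4 a, vec4 b, vec4 c) {
    vec4 r = a;
    r.f[0] = a.f[0] * b.f[0] + c.f[0];
    return r;
}

/* Model of _mm_store_ss: only the lowest lane is written to memory. */
static void model_store_ss(float *ptr, vec4 v) {
    *ptr = v.f[0];
}

/* What the program can observe when the store is the ONLY user of 'res'. */
static float observed_by_store(vec4 a, vec4 b, vec4 c) {
    float out;
    vec4 res = model_fmadd_ss(a, b, c);
    model_store_ss(&out, res);
    return out;
}
```

In this model observed_by_store(a, b, c) and observed_by_store(b, a, c)
return the same value for inputs where a*b+c is computed exactly, even
though the two intermediate 'res' values differ in their upper lanes.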
I did not see such use-def analysis in LLVM, but surely such exist in some
Perhaps such analysis would be implemented in LLVM eventually/soon.
On Mon, Sep 12, 2016 at 10:24 AM, <dag at cray.com> wrote:
> I noticed that the operand commuting code in X86InstrInfo.cpp treats
> scalar FMA intrinsics specially. It prevents operand commuting on these
> scalar instructions because the scalar FMA instructions preserve the
> upper bits of the vector. Presumably, the restrictions are there
> because commuting operands potentially changes the result upper bits.
> However, AFAIK the Intel and GNU FMA intrinsics don't actually specify
> which FMA (213, 132, 231) is going to be used and so the user can't rely
> on knowing which operand is tied to the destination. Thus the user
> can't rely on knowing what the upper bits will be.
> Is there some other reason these scalar FMA commuting restrictions are
> in place?