[llvm-dev] [X86] FMA transformation restrictions

Mon Sep 12 18:24:54 PDT 2016

Vyacheslav Klochkov <vyacheslav.n.klochkov at gmail.com> writes:

> Probably, this point was not mentioned explicitly for FMA intrinsics
> here:
> https://software.intel.com/en-us/node/582845
> That is rather a documentation problem (actually, my fault, as I did
> not add a special notice when I created/added those new
> _mm_fmadd_ss/sd() intrinsics).

I didn't consult that document, but I think I just missed the
passthrough from operand a in the description in another document.

> The intention was to maintain the existing assumption regarding the
> 1st intrinsic operand, as usual, and to give users (including some
> math library guys) a tool with defined input/output behavior.

Makes sense.

> It is important to mention that the FMA form selection (132/213/231)
> by the compiler does not change the precision of the result. It is
> always correct for vector opcodes and conditionally correct for
> *_Int opcodes.
>
> *_Int opcodes may need some additional correctness analysis.
> Commuting the 2nd and 3rd operands is always correct, while commuting
> the 1st and 2nd or the 1st and 3rd requires use-def analysis.
> It is OK to commute the 1st operand if it is known that the upper
> bits of the intrinsic result are not used.
> For example:
> __m128 res = _mm_fmadd_ss(a, b, c);
> _mm_store_ss(ptr, res); // this is the ONLY user of 'res'.

Yes, of course.
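
To make the upper-bits condition concrete, here is a minimal sketch in
plain C. It is only a scalar model, not the real intrinsics: the vec4
type and the model_* and demo_* names are hypothetical, and the low lane
uses an ordinary multiply-add, so it illustrates the data flow rather
than the fused rounding. It shows that swapping the 1st and 2nd operands
keeps the low element but changes which upper elements pass through, so
the swap is only safe when a store_ss-style user reads the low element
alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 4-lane stand-in for __m128; not the real intrinsic type. */
typedef struct { float lane[4]; } vec4;

/* Scalar model of _mm_fmadd_ss(a, b, c): lane 0 holds a0*b0 + c0 and
 * lanes 1..3 are passed through from a. Assumption: a plain multiply-add
 * is used, so this models the data flow, not the fused rounding. */
static vec4 model_fmadd_ss(vec4 a, vec4 b, vec4 c) {
    vec4 r = a;
    r.lane[0] = a.lane[0] * b.lane[0] + c.lane[0];
    return r;
}

/* Scalar model of _mm_store_ss: only lane 0 reaches memory. */
static void model_store_ss(float *ptr, vec4 v) {
    *ptr = v.lane[0];
}

/* True when lanes 1..3 of x and y are identical. */
static bool same_upper(vec4 x, vec4 y) {
    for (int i = 1; i < 4; ++i)
        if (x.lane[i] != y.lane[i]) return false;
    return true;
}

/* Compares the original operand order with the 1st/2nd operands swapped. */
static void demo_commute_first_operand(void) {
    vec4 a = {{2.0f, 10.0f, 11.0f, 12.0f}};
    vec4 b = {{3.0f, 20.0f, 21.0f, 22.0f}};
    vec4 c = {{4.0f, 30.0f, 31.0f, 32.0f}};

    vec4 original = model_fmadd_ss(a, b, c);
    vec4 commuted = model_fmadd_ss(b, a, c);

    /* The low element is the same either way... */
    assert(original.lane[0] == commuted.lane[0]);
    /* ...but the upper elements now come from b instead of a. */
    assert(!same_upper(original, commuted));

    /* If the only user stores the low element, the swap is invisible. */
    float out_original, out_commuted;
    model_store_ss(&out_original, original);
    model_store_ss(&out_commuted, commuted);
    assert(out_original == out_commuted);
}
```

Calling demo_commute_first_operand() from main() exercises all three
checks. A real use-def analysis would have to prove that every user of
the result ignores the upper elements before allowing the swap.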

> I did not see such use-def analysis in LLVM, but surely it exists in
> some other compilers.
> Perhaps such an analysis will be implemented in LLVM eventually/soon.

It would be nice.  :)

Thanks for your help!

                   -David