[llvm-dev] [X86] FMA transformation restrictions
David A. Greene via llvm-dev
llvm-dev at lists.llvm.org
Mon Sep 12 18:24:54 PDT 2016
Vyacheslav Klochkov <vyacheslav.n.klochkov at gmail.com> writes:
> Probably, this point was not mentioned explicitly for the FMA intrinsics:
> https://software.intel.com/en-us/node/582845
> That is rather a documentation problem (actually, my fault,
> as I did not add a special notice when I created/added those new
> _mm_fmadd_ss/sd() intrinsics).
I didn't reference that document, but I think I just missed the
passthrough from 'a' in the description in another document.
> The intention was to maintain the existing assumption regarding the
> 1st intrinsic operand as usual and give users (including some math
> library guys) a tool with defined input/output behavior.
> It is important to mention that the FMA form selection (132/213/231)
> by the compiler does not change the precision of the result. It is
> always correct for vector opcodes and conditionally correct for
> *_Int opcodes.
> *_Int opcodes may need some additional correctness analysis.
> Commuting the 2nd and 3rd operands is always correct, while commuting
> the 1st and 2nd or the 1st and 3rd requires use-def analysis.
> It is OK to commute the 1st operand if it is known that the upper bits
> of the intrinsic result are not used.
> For example:
> __m128 res = _mm_fmadd_ss(a, b, c);
> _mm_store_ss(ptr, res); // this is the ONLY user of 'res'.
Yes, of course.
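For what it's worth, the operand rules quoted above can be modeled in
plain C. This is only a sketch under stated assumptions: vec4, the
fmadd*_ss helpers, store_ss and check_commutes are made-up names for
illustration, not LLVM or Intel APIs. The a*b+c in the helpers is an
ordinary multiply and add rather than a fused operation, so the model
shows only which operand supplies which lanes, not rounding. It also
assumes the intrinsic maps to the 213 form with 'a' as operand 1.

```c
#include <string.h>

/* Plain-C stand-in for a 128-bit register holding four floats. */
typedef struct { float lane[4]; } vec4;

/* Models of the three scalar FMA forms. op1 is the tied destination
   operand, so it supplies the upper three lanes of the result.
   NOTE: not fused; only the lane behavior is modeled. */
static vec4 fmadd132_ss(vec4 op1, vec4 op2, vec4 op3) {
    op1.lane[0] = op1.lane[0] * op3.lane[0] + op2.lane[0];
    return op1;
}
static vec4 fmadd213_ss(vec4 op1, vec4 op2, vec4 op3) {
    op1.lane[0] = op2.lane[0] * op1.lane[0] + op3.lane[0];
    return op1;
}
static vec4 fmadd231_ss(vec4 op1, vec4 op2, vec4 op3) {
    op1.lane[0] = op2.lane[0] * op3.lane[0] + op1.lane[0];
    return op1;
}

/* Model of a scalar store: reads lane 0 only. */
static void store_ss(float *p, vec4 v) { *p = v.lane[0]; }

static int same_vec(vec4 x, vec4 y) {
    return memcmp(x.lane, y.lane, sizeof x.lane) == 0;
}

/* Returns 1 when every claim checked below holds. */
static int check_commutes(void) {
    /* Small integers so every product and sum is exact. */
    vec4 a = {{2.0f, 10.0f, 11.0f, 12.0f}};
    vec4 b = {{3.0f, 20.0f, 21.0f, 22.0f}};
    vec4 c = {{4.0f, 30.0f, 31.0f, 32.0f}};

    /* Reference: a*b + c in lane 0, upper lanes from a. */
    vec4 ref = fmadd213_ss(a, b, c);

    /* Operands 2 and 3 swapped, form changed to 132: full result matches. */
    vec4 swap23 = fmadd132_ss(a, c, b);

    /* Operand 1 swapped with 2 (still 213) or with 3 (form 231):
       lane 0 matches, but the upper lanes now come from b or c. */
    vec4 swap12 = fmadd213_ss(b, a, c);
    vec4 swap13 = fmadd231_ss(c, b, a);

    float r_ref, r12, r13;
    store_ss(&r_ref, ref);
    store_ss(&r12, swap12);
    store_ss(&r13, swap13);

    return same_vec(ref, swap23)
        && !same_vec(ref, swap12)
        && !same_vec(ref, swap13)
        && r_ref == r12 && r_ref == r13;
}
```

If the only user of the result is the scalar store, the three variants
are indistinguishable, which is the case where commuting the 1st operand
is safe. Any user that reads the upper lanes would see the difference.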
> I did not see such use-def analysis in LLVM, but surely such analyses
> exist in some other compilers.
> Perhaps such analysis will be implemented in LLVM eventually/soon.
It would be nice. :)
Thanks for your help!