[PATCH] Implement aarch64 neon instruction class AdvSIMD (by element) - Clang
Tim Northover
t.p.northover at gmail.com
Mon Sep 30 04:37:55 PDT 2013
Hi Jiangning,
> I see your point, so do you mean we must generate fmla instruction for intrinsic function vfma_lane_f32(), no matter if it is in -ffast-math mode or not? Then I think we have to generate fmls for intrinsic function vfms_lane_f32() as well.
I believe so.
> I don't see LLVM IR has @llvm.fms.* defined, so we have to define an aarch64 specific LLVM intrinsic, or we can use an expression containing llvm.fma.* to represent it?
I think I worked out that it was equivalent to @llvm.fma(-x, y, z)
(and @llvm.fma(x, -y, z)). The negation is exact, and the fusing works
out to be the same for "z + (-x)*y" as for "z - x*y".
By the way, be wary of the operand order. @llvm.fma(x,y,z) calculates
"x*y+z", but "fmla x, y, z" calculates "x + y*z". I *think* both Ana
and I got that wrong at least once. I know I did.
Cheers.
Tim.