[llvm] [IR] Add intrinsics to represent complex multiply and divide operations (PR #68742)

Wed Nov 1 13:22:48 PDT 2023

jcranmer-intel wrote:

> Do we care whether the results of complex multiply/divide are correctly rounded? This isn't mentioned anywhere, but the naive expansion double-rounds the result. And if we use FMA, we still double-round, but produce a different result in some cases. (If we don't care, that's fine, I guess, but we should explicitly state the expected precision somewhere.)

All implementations of `__mulsc3` I can find double-round the results. The story is quite different for `__divsc3`, for which I've seen at least four different implementations (the naive version, the naive version with `scalbn` scaling, Smith's algorithm, and the naive version evaluated in the next-larger float type), all of which have different rounding implications. I view these much like the libm intrinsics in terms of rounding guarantees.
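To make the difference between the division strategies concrete, here is a minimal sketch of three of them for single precision. The `cfloat` struct and the `div_naive`, `div_smith`, and `div_wide` names are hypothetical, chosen for illustration; they are not the compiler-rt or libgcc sources, and they omit the NaN/infinity recovery that a real `__divsc3` performs.

```c
#include <math.h>

/* Hypothetical stand-in for a complex float; not a type from any runtime library. */
typedef struct { float re, im; } cfloat;

/* Naive formula: (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c*c + d*d).
 * Every operation rounds to float, and c*c + d*d can overflow or underflow
 * even when the true quotient is representable. */
cfloat div_naive(cfloat x, cfloat y) {
    float denom = y.re * y.re + y.im * y.im;
    cfloat r = { (x.re * y.re + x.im * y.im) / denom,
                 (x.im * y.re - x.re * y.im) / denom };
    return r;
}

/* Smith's algorithm: divide through by the larger component of the
 * denominator first, so the intermediate values stay in range. */
cfloat div_smith(cfloat x, cfloat y) {
    cfloat r;
    if (fabsf(y.re) >= fabsf(y.im)) {
        float ratio = y.im / y.re;
        float denom = y.re + y.im * ratio;
        r.re = (x.re + x.im * ratio) / denom;
        r.im = (x.im - x.re * ratio) / denom;
    } else {
        float ratio = y.re / y.im;
        float denom = y.re * ratio + y.im;
        r.re = (x.re * ratio + x.im) / denom;
        r.im = (x.im * ratio - x.re) / denom;
    }
    return r;
}

/* Naive formula evaluated in the next-larger type (double), with a
 * single narrowing conversion back to float at the end. */
cfloat div_wide(cfloat x, cfloat y) {
    double a = x.re, b = x.im, c = y.re, d = y.im;
    double denom = c * c + d * d;
    cfloat r = { (float)((a * c + b * d) / denom),
                 (float)((b * c - a * d) / denom) };
    return r;
}
```

For example, dividing `{1.0f, 1.0f}` by `{1e30f, 1e30f}` on a target that evaluates `float` expressions in `float`: the denominator in `div_naive` overflows to infinity and the real part comes out as 0, while `div_smith` and `div_wide` both return a real part of about 1e-30.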

For hardware implementations, x86-64's VFCMULCSH is explicitly implemented as fmul + fma, with intermediate rounding, whereas I can't quite tell from AArch64's manual whether FCMLA double-rounds.

> How does this interact with strictfp? Do we need separate strictfp intrinsics?

This does need some form of strictfp support, but I've started to agree with @arsenm that constrained intrinsics are not necessarily the best way to expose strictfp support.

https://github.com/llvm/llvm-project/pull/68742