[clang] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)

Wed Oct 23 11:45:05 PDT 2024

uweigand wrote:

> My understanding is that in GCC, `__gnu_h2f_ieee`/`__gnu_f2h_ieee` is always `i32`<->`i16` (integer ABI), then `__extendhfsf2`/`__truncsfhf2` uses either `int16_t` or `_Float16` on a per-target basis as controlled by `__LIBGCC_HAS_HF_MODE__` (I don't know where this gets set). In LLVM compiler-rt, `COMPILER_RT_HAS_FLOAT16` is the control to do the same thing but it affects `extend`/`trunc` as well as `h2f`/`f2h`. I think the discrepancy works out here because if a target has `_Float16`, it will never be calling `__gnu_h2f_ieee`/`__gnu_f2h_ieee`.

From what I can see in the libgcc sources, `__gnu_h2f_ieee`/`__gnu_f2h_ieee` is indeed always `i32`<->`i16`, but it is only present on 32-bit ARM and on no other platform.  On AArch64, GCC will always use inline instructions to perform the conversion.  On 32-bit and 64-bit Intel, the compiler will use inline instructions if AVX512-FP16 is available; if not, but SSE2 is available, the compiler will use `__extendhfsf2`/`__truncsfhf2` with a `HFmode` argument (this corresponds to `_Float16`, i.e. it is passed in SSE2 registers, not like an integer); if not even SSE2 is available, using the type will result in an error.

I never see `__extendhfsf2`/`__truncsfhf2` being used with `int16_t`, even in principle, on any platform in libgcc.  There is indeed a setting `__LIBGCC_HAS_HF_MODE__` (controlled indirectly by the GCC target back-end's `TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P` setting), but the only thing that appears to be controlled by this flag is whether routines for complex multiplication and division (`__mulhc3` / `__divhc3`) are being built.   Am I missing something here?
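To make the "integer ABI" flavour discussed above concrete, here is a minimal C sketch of a half-to-float widening routine that receives the `f16` value as raw bits in an integer, roughly the shape a `__gnu_h2f_ieee`-style helper has.  The name `h2f_bits_sketch` is hypothetical, and this is not the libgcc or compiler-rt source; a routine using the `_Float16` ABI would instead take the value in a floating-point register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only: widen an IEEE binary16 value,
   received as raw bits in an integer, to float. */
static float h2f_bits_sketch(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t frac = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, payload moved to the top bits. */
        bits = sign | 0x7F800000u | (frac << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (frac << 13);
    } else if (frac == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half: normalize until the implicit leading 1 appears. */
        uint32_t e = 113u; /* biased float exponent for 2^-14 */
        while ((frac & 0x400u) == 0) {
            frac <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((frac & 0x3FFu) << 13);
    }

    /* Reinterpret the assembled bit pattern as a float. */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Every binary16 value is exactly representable as a float, so this direction needs no rounding; the truncating direction is where rounding and NaN handling get delicate.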

> From your first two sentences it sounds like `f16` is getting passed in a FP register but going FP->GPR->__gnu_h2f_ieee->FP->some_math_op->FP->__gnu_f2h_ieee->GPR->FP? I think it makes sense to either always pass `f16` as `i16` and avoid the FP registers, or make `_Float16` available so `COMPILER_RT_HAS_FLOAT16` can be used.
> 
> @uweigand mentioned figuring out an ABI for `_Float16`, is this possible? That seems like the best option.

Yes, we're working on that.  What we're planning to do is to have `_Float16` be passed and returned in the same way as `float` and `double`, i.e. using (part of) certain floating-point registers.  These registers are available on every SystemZ architecture level, so we would not have to guard their use (like Intel does with the SSE2 registers).

> A quick check seems to show that GCC 13 does not support `_Float16` on s390x, nor does the crossbuild `libgcc.a` provide `__gnu_h2f_ieee`, `__gnu_f2h_ieee`, `__extendhfsf2`, or `__truncsfhf2`. So I think LLVM will be the one to set the precedent here.

Yes, we'd have to add those.  I don't think we want `__gnu_h2f_ieee` or `__gnu_f2h_ieee` as those are ARM-only.  We'd be defining and using `__extendhfsf2` and `__truncsfhf2`, which would be defined with `_Float16` arguments passed in floating-point registers.  Either way, we should define the same set of routines (with the same ABI) in libgcc and compiler-rt.

> Note that there are some common issues with these conversions, would probably be good to test against them if possible #97981 #97975.

Thanks for pointing this out!

https://github.com/llvm/llvm-project/pull/109164