[clang] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)
Ulrich Weigand via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 23 11:45:05 PDT 2024
uweigand wrote:
> My understanding is that in GCC's `__gnu_h2f_ieee`/`__gnu_f2h_ieee` is always `i32`<->`i16` (integer ABI), then `__extendhfsf2`/`__truncsfhf2` uses either `int16_t` or `_Float16` on a per-target basis as controlled by `__LIBGCC_HAS_HF_MODE__` (I don't know where this gets set). In LLVM compiler-rt, `COMPILER_RT_HAS_FLOAT16` is the control to do the same thing but it affects `extend`/`trunc` as well as `h2f`/`f2h`. I think the discrepancy works out here because if a target has `_Float16`, it will never be calling `__gnu_h2f_ieee` `__gnu_f2h_ieee`.
From what I can see in the libgcc sources, `__gnu_h2f_ieee`/`__gnu_f2h_ieee` is indeed always `i32`<->`i16`, but it is only present on 32-bit ARM, no other platforms. On AArch64, GCC will always use inline instructions to perform the conversion. On 32-bit and 64-bit Intel, the compiler will use inline instructions if AVX512-FP16 is available; if not, but SSE2 is available, the compiler will use `__extendhfsf2`/`__truncsfhf2` with a `HFmode` argument (this corresponds to `_Float16`, i.e. it is passed in SSE2 registers, not like an integer); if not even SSE2 is available, using the type will result in an error.
I never see `__extendhfsf2`/`__truncsfhf2` being used with `int16_t`, even in principle, on any platform in libgcc. There is indeed a setting `__LIBGCC_HAS_HF_MODE__` (controlled indirectly by the GCC target back-end's `TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P` setting), but the only thing that appears to be controlled by this flag is whether routines for complex multiplication and division (`__mulhc3` / `__divhc3`) are being built. Am I missing something here?
> From your first two sentences it sounds like `f16` is getting passed in a FP register but going FP->GPR->__gnu_h2f_ieee->FP->some_math_op->FP->__gnu_f2h_ieee->GPR->FP? I think it makes sense to either always pass `f16` as `i16` and avoid the FP registers, or make `_Float16` available so `COMPILER_RT_HAS_FLOAT16` can be used.
>
> @uweigand mentioned figuring out an ABI for `_Float16`, is this possible? That seems like the best option.
Yes, we're working on that. What we're planning to do is to have `_Float16` be passed and returned in the same way as `float` and `double`, i.e. using (part of) certain floating-point registers. These registers are available on every SystemZ architecture level, so we would not have to guard their use (like Intel does with the SSE2 registers).
> A quick check seems to show that GCC 13 does not support `_Float16` on s390x, nor does the crossbuild `libgcc.a` provide `__gnu_h2f_ieee`, `__gnu_f2h_ieee`, `__extendhfsf2`, or `__truncsfhf2`. So I think LLVM will be the one to set the precedent here.
Yes, we'd have to add those. I don't think we want `__gnu_h2f_ieee` or `__gnu_f2h_ieee` as those are ARM-only. We'd be defining and using `__extendhfsf2` and `__truncsfhf2`, which would be defined with `_Float16` arguments passed in floating-point registers. Either way, we should define the same set of routines (with the same ABI) in libgcc and compiler-rt.
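For illustration only, here is a rough sketch of the arithmetic a half-to-float extension routine such as `__extendhfsf2` has to perform. It is not the libgcc or compiler-rt implementation: the name `sketch_extend_h2f` is hypothetical, and it takes the raw IEEE binary16 bits as a `uint16_t` purely so that it compiles on hosts without `_Float16`. That choice says nothing about the calling convention discussed above. It also assumes `float` is IEEE binary32 and does not quiet signaling NaNs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen IEEE binary16 bits to a binary32 float.
 * Assumes float is IEEE binary32; signaling NaNs are not quieted. */
static float sketch_extend_h2f(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* move sign to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* 5-bit biased exponent */
    uint32_t man  = h & 0x3FFu;                    /* 10-bit mantissa */
    uint32_t bits;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, keep the payload bits. */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127 (+112). */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half: normalize, since every such value is a
         * normal number in binary32. 113 is the biased exponent of 2^-14. */
        uint32_t e = 113u;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            e--;
        }
        man &= 0x3FFu; /* drop the now-implicit leading 1 */
        bits = sign | (e << 23) | (man << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f); /* reinterpret bits without aliasing issues */
    return f;
}
```

The conversion is exact for every input, because binary32 has more exponent range and more mantissa bits than binary16. The truncating direction (`__truncsfhf2`) additionally has to round and handle overflow, which is not shown here.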
> Note that there are some common issues with these conversions, would probably be good to test against them if possible #97981 #97975.
Thanks for pointing this out!
https://github.com/llvm/llvm-project/pull/109164