[clang] [compiler-rt] [llvm] [SystemZ] Add support for half (fp16) (PR #109164)

Ulrich Weigand via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 4 12:22:54 PDT 2025


uweigand wrote:

> Moved the comment "Promoting the result to i64...so use the default expansion" that was present in SystemZISelLowering.cpp into this method, but don't quite understand it fully. Is this talking about promoting to signed i64?

This is about whether we can (and should) implement a fp->u32 conversion via fp->s64->u32.   If the input is in the valid range for an u32, this will always result in the correct output.  However, the question is what happens if the input is *outside* that range.  The z196 instruction follows the IEEE rules and will generate an invalid operation exception, and I think we intended to match that semantics with the expanded code for z10.  The fp->s64->u32 expansion would not generate an invalid operation exception if the input value is inside the s64 but outside the u32 range, that's why we chose not to use it.

However, now that I'm looking at it again, this decision seems questionable for at least two reasons:
- the LLVM IR explicitly marks that case as undefined: "The `fptoui` instruction converts its floating-point operand into the nearest (rounding towards zero) unsigned integer value. If the value cannot fit in ty2, the result is a poison value."
- even the more complex expansion still doesn't always signal invalid operation (it does for values > UINT_MAX, but not for (all) negative values)

Maybe we should re-think this, but then again z10 doesn't really matter anymore at this point, so it probably doesn't make much sense to change codegen now ...

> FP_TO_INT:
> 
> Z10 unsigned: Expanding fp-> i32/i64 seems to work best to do without first doing the custom extension to f32 in the f16 input case. TargetLowering::expandFP_TO_UINT() can then look at the types and convert to fp_to_sint and avoid the longer sequence used for float.

Right, that makes sense (there is again the question of out-of-range inputs, but given the above discussion, I think this fine).

> For i128 which is not a legal type so this needs to be handled similarly in LowerOperationWrapper() instead.

I see that the LowerOperationWrapper still emits i128 operations.  (E.g. you expand a f16->i128 into a f16->f32 and f32->i128)   But that routine is called because of the illegal input type i128, so I understand it must not leave any operations with the illegal type; rather, it should complete the expansion fully.   This is annoying since for legal i128 we do the expansion in LowerOperation while for illegal i128 we do it in LowerOperationWrapper.   This was the same problem for the atomics, where I avoided code duplication by implementing the expansion in LowerOperationWrapper and then simply calling that routine from LowerOperation.

> Unfortunately it is not possible to return and select "Expand or LibCall", so for the i128 case it seems to be necessary to emit the libcall using the makeLibCall() method.

I see, this is unfortunate, but seems the only option.

> INT_TO_FP:
> 
> Z10 unsigned: Inlined the Promotion of i32 via f32. i64 first gets done on f32, and then expanded.

Why can't you still do the promotion via the Promote action for i32?   Then you wouldn't have to duplicate the common-code case ...
 
>     * FCOPYSIGN: test for non-existing library function removed, but handling remains for
>       the intrinsic that is used.

Ah, this implementation is quite inefficient.  There should be no extend/trunc libcalls needed at all: the sign bit in f16 is in the same place as f32 or f64, so the actual CPSDR instruction should simply work on f16 too.



https://github.com/llvm/llvm-project/pull/109164


More information about the llvm-commits mailing list