[cfe-dev] [llvm-dev] _Float16 support

Wed Jan 23 13:42:47 PST 2019

The issue isn't limited to calls, either. If a half value is live-out of one
basic block and used in another basic block, we'll emit an fp_round with 1
as its second operand in the receiving basic block, but the producing basic
block won't have done anything to make that true.

~Craig

On Wed, Jan 23, 2019 at 11:53 AM Richard Smith <richard at metafoo.co.uk> wrote:

> On Wed, 23 Jan 2019, 11:27 Craig Topper via llvm-dev, <llvm-dev at lists.llvm.org> wrote:
>
>> While looking at the codegen Andy showed, I noticed that the initial
>> SelectionDAG looks like this for x86-64:
>>
>>   t0: ch = EntryToken
>>       t2: f32,ch = CopyFromReg t0, Register:f32 %0
>>     t6: f16 = fp_round t2, TargetConstant:i64<1>
>>       t4: f32,ch = CopyFromReg t0, Register:f32 %1
>>     t7: f16 = fp_round t4, TargetConstant:i64<1>
>>   t8: f16 = fmul t6, t7
>>   t10: i64 = Constant<0>
>>     t12: ch = store<(store 2 into @x)> t0, t8, GlobalAddress:i64<half* @x> 0, undef:i64
>>     t13: f32 = fp_extend t8
>>   t16: ch,glue = CopyToReg t12, Register:f32 $xmm0, t13
>>   t17: ch = X86ISD::RET_FLAG t16, TargetConstant:i32<0>, Register:f32 $xmm0, t16:1
>>
>> The FP_ROUNDs for the arguments each have the flag set that indicates
>> the rounding doesn't lose any information; that flag is the
>> TargetConstant:i64<1> second operand.
>>
>> As far as I can tell, any caller of this function would have an FP_EXTEND
>> from f16 to f32 in its initial SelectionDAG for the call. When the
>> FP_EXTENDs are type legalized by DAGTypeLegalizer::PromoteFloatOp_FP_EXTEND,
>> the FP_EXTEND is removed completely, with no replacement operations. I
>> believe this means there is no guarantee that the f32 value passed in is
>> exactly representable in f16, i.e. that it carries no precision beyond
>> what f16 can hold. So the fp_round nodes in the callee claiming that no
>> information is lost are not accurate.
>>
>
> That seems wrong to me from an ABI perspective; I would expect the burden
> to be on the caller to only pass a valid "half" value to a "half"
> parameter. But this leads back to Andy's point: we're inventing an ABI rule
> here.
>
>> ~Craig
>>
>>
>> On Tue, Jan 22, 2019 at 10:38 AM Kaylor, Andrew via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>
>>> I'd like to start a discussion about how clang supports _Float16 for
>>> target architectures that don't have direct support for 16-bit floating
>>> point arithmetic.
>>>
>>> The current clang language extensions documentation says, "If
>>> half-precision instructions are unavailable, values will be promoted to
>>> single-precision, similar to the semantics of __fp16 except that the
>>> results will be stored in single-precision." This is somewhat vague (to me)
>>> as to what is meant by promotion of values, and the part about results
>>> being stored in single-precision isn't what actually happens.
>>>
>>> Consider this example:
>>>
>>> _Float16 x;
>>>
>>> _Float16 f(_Float16 y, _Float16 z) {
>>>   x = y * z;
>>>   return x;
>>> }
>>>
>>> Compiling with “-march=core-avx2” produces this IR (after some trivial
>>> cleanup):
>>>
>>> @x = global half 0xH0000, align 2
>>>
>>> define half @f(half, half) {
>>>   %3 = fmul half %0, %1
>>>   store half %3, half* @x
>>>   ret half %3
>>> }
>>>
>>> That’s not too unreasonable, I suppose, except that it hasn’t yet taken
>>> the lack of target support for half-precision arithmetic into account.
>>> That will happen in the selection DAG. The generated assembly code looks
>>> like this (with my annotations):
>>>
>>> f:                                      # @f
>>> # %bb.0:
>>>         vcvtps2ph   xmm1, xmm1, 4       # Convert argument 1 from single to half
>>>         vcvtph2ps   xmm1, xmm1          # Convert argument 1 back to single
>>>         vcvtps2ph   xmm0, xmm0, 4       # Convert argument 0 from single to half
>>>         vcvtph2ps   xmm0, xmm0          # Convert argument 0 back to single
>>>         vmulss      xmm0, xmm0, xmm1    # xmm0 = xmm0*xmm1 (single precision)
>>>         vcvtps2ph   xmm1, xmm0, 4       # Convert the single precision result to half
>>>         vmovd       eax, xmm1           # Move the half precision result to eax
>>>         mov         word ptr [rip + x], ax  # Store the half precision result in the global, x
>>>         ret                             # Return the single precision result still in xmm0
>>> .Lfunc_end0:
>>>                                         # -- End function
>>>
>>> Something odd has happened here, and it may not be obvious what it is.
>>> This code begins by converting xmm0 and xmm1 from single to half and then
>>> back to single. The first conversion is happening because the back end
>>> decided that it needed to change the types of the parameters to single
>>> precision but the function body is expecting half precision values.
>>> However, since the target can’t perform the required computation with half
>>> precision values they must be converted back to single for the
>>> multiplication. The single precision result of the multiplication is
>>> converted to half precision to be stored in the global value, x, but the
>>> result is returned as single precision (via xmm0).
>>>
>>> I’m not primarily worried about the extra conversions here. We can’t get
>>> rid of them because we can’t prove they aren’t rounding, but that’s a
>>> secondary issue. What I’m worried about is that we allowed/required the
>>> back end to improvise an ABI to satisfy the incoming IR, and the choice it
>>> made is questionable.
>>>
>>> For a point of comparison, I looked at what gcc does. Currently, gcc
>>> only allows _Float16 in C, not C++, and if you try to use it with a target
>>> that doesn’t have native support for half-precision arithmetic, it tells
>>> you “’_Float16’ is not supported on this target.” That seems preferable to
>>> making up an ABI on the fly.
>>>
>>> I haven’t looked at what happens with clang when compiling for other
>>> targets that don’t have native support for half-precision arithmetic, but I
>>> would imagine that similar problems exist.
>>>
>>> Thoughts?
>>>
>>> Thanks,
>>>
>>> Andy
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>