[cfe-dev] [llvm-dev] _Float16 support
Richard Smith via cfe-dev
cfe-dev at lists.llvm.org
Wed Jan 23 11:52:48 PST 2019
On Wed, 23 Jan 2019, 11:27 Craig Topper via llvm-dev, <
llvm-dev at lists.llvm.org> wrote:
> While looking at the codegen Andy showed, I notice that the initial
> SelectionDAG looks like this for x86-64.
>
> t0: ch = EntryToken
> t2: f32,ch = CopyFromReg t0, Register:f32 %0
> t6: f16 = fp_round t2, TargetConstant:i64<1>
> t4: f32,ch = CopyFromReg t0, Register:f32 %1
> t7: f16 = fp_round t4, TargetConstant:i64<1>
> t8: f16 = fmul t6, t7
> t10: i64 = Constant<0>
> t12: ch = store<(store 2 into @x)> t0, t8, GlobalAddress:i64<half* @x> 0, undef:i64
> t13: f32 = fp_extend t8
> t16: ch,glue = CopyToReg t12, Register:f32 $xmm0, t13
> t17: ch = X86ISD::RET_FLAG t16, TargetConstant:i32<0>, Register:f32 $xmm0, t16:1
>
> Each FP_ROUND for the arguments has the flag set that indicates that the
> fp_round doesn't lose any information. This flag is the
> TargetConstant:i64<1> passed as the second operand.
>
> As far as I can tell, any caller of this would have an FP_EXTEND from f16
> to f32 in its initial selection DAG for calling this function. When the
> FP_EXTENDs are type legalized
> by DAGTypeLegalizer::PromoteFloatOp_FP_EXTEND, the FP_EXTEND will be
> removed completely with no replacement operations. I believe this means
> there is no guarantee that the f32 value passed in doesn't contain
> precision beyond what f16 can represent. So the fp_round nodes saying no
> information is lost in the callee are not accurate.
>
That seems wrong to me from an ABI perspective; I would expect the burden
to be on the caller to only pass a valid "half" value to a "half"
parameter. But this leads back to Andy's point: we're inventing an ABI rule
here.
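
To make the "valid half value" condition concrete, here is a minimal C sketch. It assumes IEEE binary32 floats and round-to-nearest-even, and it only handles values in the normal range of half (no overflow, subnormal, or NaN handling). The helper names round_to_half_precision and is_exact_half are hypothetical, not anything in LLVM. A float is a valid "half" exactly when rounding it to a 10-bit mantissa leaves it unchanged, which is the property the fp_round flag asserts without the caller guaranteeing it.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a float to the nearest value that fits a
 * 10-bit stored mantissa (round-to-nearest, ties to even).
 * Assumption: v is finite and within the normal range of half. */
static float round_to_half_precision(float v) {
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);
    uint32_t lsb = (bits >> 13) & 1u; /* lowest mantissa bit that survives */
    bits += 0x0FFFu + lsb;            /* round to nearest, ties to even */
    bits &= ~(uint32_t)0x1FFFu;       /* clear the 13 dropped mantissa bits */
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Hypothetical helper: nonzero if v is already exactly representable with
 * half's mantissa width, i.e. an fp_round of v would lose nothing. */
static int is_exact_half(float v) {
    return round_to_half_precision(v) == v;
}
```

For example, 1.5f passes is_exact_half, while 1.0f + 0x1p-12f does not: it rounds to 1.0f. An f32 argument like the latter is what a callee could receive if nothing on the caller side enforces the rounding.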
> ~Craig
>
>
> On Tue, Jan 22, 2019 at 10:38 AM Kaylor, Andrew via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> I'd like to start a discussion about how clang supports _Float16 for
>> target architectures that don't have direct support for 16-bit floating
>> point arithmetic.
>>
>>
>>
>> The current clang language extensions documentation says, "If
>> half-precision instructions are unavailable, values will be promoted to
>> single-precision, similar to the semantics of __fp16 except that the
>> results will be stored in single-precision." This is somewhat vague (to me)
>> as to what is meant by promotion of values, and the part about results
>> being stored in single-precision isn't what actually happens.
>>
>>
>>
>> Consider this example:
>>
>>
>>
>> _Float16 x;
>>
>> _Float16 f(_Float16 y, _Float16 z) {
>>   x = y * z;
>>   return x;
>> }
>>
>>
>>
>> When compiling with “-march=core-avx2” that results (after some trivial
>> cleanup) in this IR:
>>
>>
>>
>> @x = global half 0xH0000, align 2
>>
>> define half @f(half, half) {
>>   %3 = fmul half %0, %1
>>   store half %3, half* @x
>>   ret half %3
>> }
>>
>>
>>
>> That’s not too unreasonable I suppose, except for the fact that it hasn’t
>> taken the lack of target support for half-precision arithmetic into account
>> yet. That will happen in the selection DAG. The assembly code generated
>> looks like this (with my annotations):
>>
>>
>>
>> f:                                # @f
>> # %bb.0:
>>   vcvtps2ph xmm1, xmm1, 4         # Convert argument 1 from single to half
>>   vcvtph2ps xmm1, xmm1            # Convert argument 1 back to single
>>   vcvtps2ph xmm0, xmm0, 4         # Convert argument 0 from single to half
>>   vcvtph2ps xmm0, xmm0            # Convert argument 0 back to single
>>   vmulss xmm0, xmm0, xmm1         # xmm0 = xmm0*xmm1 (single precision)
>>   vcvtps2ph xmm1, xmm0, 4         # Convert the single precision result to half
>>   vmovd eax, xmm1                 # Move the half precision result to eax
>>   mov word ptr [rip + x], ax      # Store the half precision result in the global, x
>>   ret                             # Return the single precision result still in xmm0
>> .Lfunc_end0:
>>                                   # -- End function
>>
>>
>>
>> Something odd has happened here, and it may not be obvious what it is.
>> This code begins by converting xmm0 and xmm1 from single to half and then
>> back to single. The first conversion is happening because the back end
>> decided that it needed to change the types of the parameters to single
>> precision but the function body is expecting half precision values.
>> However, since the target can’t perform the required computation with half
>> precision values they must be converted back to single for the
>> multiplication. The single precision result of the multiplication is
>> converted to half precision to be stored in the global value, x, but the
>> result is returned as single precision (via xmm0).
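
The sequence described above can be modeled in plain C to show why it matters that the return value is never rounded. This is a sketch under stated assumptions, not the compiler's actual lowering: it assumes IEEE binary32 floats, round-to-nearest-even, and inputs in the normal range of half. The names round_to_half_precision, x_stored, and f_emulated are hypothetical stand-ins for the conversion instructions, the global x, and the generated body of f.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a vcvtps2ph/vcvtph2ps round trip: round a float
 * to a 10-bit stored mantissa (round-to-nearest, ties to even).
 * Assumption: v is finite and within the normal range of half. */
static float round_to_half_precision(float v) {
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);
    uint32_t lsb = (bits >> 13) & 1u; /* lowest mantissa bit that survives */
    bits += 0x0FFFu + lsb;            /* round to nearest, ties to even */
    bits &= ~(uint32_t)0x1FFFu;       /* clear the 13 dropped mantissa bits */
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Models the global x; it always holds a half-representable value. */
static float x_stored;

/* Models the generated code for f: arguments arrive and the result leaves
 * as single precision. */
static float f_emulated(float y, float z) {
    y = round_to_half_precision(y);              /* argument round trip */
    z = round_to_half_precision(z);              /* argument round trip */
    float product = y * z;                       /* single-precision multiply */
    x_stored = round_to_half_precision(product); /* value stored to x */
    return product;                              /* unrounded single returned */
}
```

With y = z = 1.0f + 0x1p-10f (both exactly representable in half), the product is 1 + 2^-9 + 2^-20. The model stores 1 + 2^-9 in x_stored but returns the full single-precision product, so the returned value differs from the value just stored in x, even though the source says "return x".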
>>
>>
>>
>> I’m not primarily worried about the extra conversions here. We can’t get
>> rid of them because we can’t prove they aren’t rounding, but that’s a
>> secondary issue. What I’m worried about is that we allowed/required the
>> back end to improvise an ABI to satisfy the incoming IR, and the choice it
>> made is questionable.
>>
>>
>>
>> For a point of comparison, I looked at what gcc does. Currently, gcc only
>> allows _Float16 in C, not C++, and if you try to use it with a target that
>> doesn’t have native support for half-precision arithmetic, it tells you
>> “’_Float16’ is not supported on this target.” That seems preferable to
>> making up an ABI on the fly.
>>
>>
>>
>> I haven’t looked at what happens with clang when compiling for other
>> targets that don’t have native support for half-precision arithmetic, but I
>> would imagine that similar problems exist.
>>
>>
>>
>> Thoughts?
>>
>>
>>
>> Thanks,
>>
>> Andy