[cfe-dev] _Float16 support
Craig Topper via cfe-dev
cfe-dev at lists.llvm.org
Wed Jan 23 11:27:30 PST 2019
While looking at the codegen Andy showed, I noticed that the initial
SelectionDAG for x86-64 looks like this:
t0: ch = EntryToken
t2: f32,ch = CopyFromReg t0, Register:f32 %0
t6: f16 = fp_round t2, TargetConstant:i64<1>
t4: f32,ch = CopyFromReg t0, Register:f32 %1
t7: f16 = fp_round t4, TargetConstant:i64<1>
t8: f16 = fmul t6, t7
t10: i64 = Constant<0>
t12: ch = store<(store 2 into @x)> t0, t8, GlobalAddress:i64<half* @x> 0, undef:i64
t13: f32 = fp_extend t8
t16: ch,glue = CopyToReg t12, Register:f32 $xmm0, t13
t17: ch = X86ISD::RET_FLAG t16, TargetConstant:i32<0>, Register:f32 $xmm0, t16:1
The FP_ROUNDs for the arguments each have the flag set indicating that the
fp_round doesn't lose any information; this is the TargetConstant:i64<1>
second operand.

As far as I can tell, any caller of this function would have an FP_EXTEND
from f16 to f32 in its initial selection DAG. When those FP_EXTENDs are type
legalized by DAGTypeLegalizer::PromoteFloatOp_FP_EXTEND, the FP_EXTEND is
removed completely with no replacement operations. I believe this means
there is no guarantee that the incoming f32 value doesn't carry precision
beyond what f16 can represent, so the fp_round nodes in the callee that
claim no information is lost are not accurate.
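
To make the concern concrete, here's a rough caller-side sketch (my own
illustration; how the casts below actually get lowered is exactly the open
question):

extern _Float16 f(_Float16 y, _Float16 z);

_Float16 call_f(float a, float b) {
  /* Semantically these casts round a and b to half precision before the
     call. But if that rounding is folded away together with the FP_EXTEND
     during float promotion, the full-precision f32 values may be what
     actually reach f's arguments, contradicting the "no information lost"
     flag on the callee's fp_round nodes. */
  return f((_Float16)a, (_Float16)b);
}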
~Craig
On Tue, Jan 22, 2019 at 10:38 AM Kaylor, Andrew via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> I'd like to start a discussion about how clang supports _Float16 for
> target architectures that don't have direct support for 16-bit floating
> point arithmetic.
>
>
>
> The current clang language extensions documentation says, "If
> half-precision instructions are unavailable, values will be promoted to
> single-precision, similar to the semantics of __fp16 except that the
> results will be stored in single-precision." This is somewhat vague (to me)
> as to what is meant by promotion of values, and the part about results
> being stored in single-precision isn't what actually happens.
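>
> As a point of reference, my rough understanding of the intended difference
> between the two types is something like this:
>
> /* __fp16 is storage-only; arithmetic happens in float: */
> float mul_fp16(__fp16 a, __fp16 b) { return a * b; }
>
> /* _Float16 is an arithmetic type; the multiply must behave as if it were
>    performed (or at least rounded) in half precision: */
> _Float16 mul_f16(_Float16 a, _Float16 b) { return a * b; }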
>
>
>
> Consider this example:
>
>
>
> _Float16 x;
>
> _Float16 f(_Float16 y, _Float16 z) {
>   x = y * z;
>   return x;
> }
>
>
>
> When compiling with “-march=core-avx2”, that results (after some trivial
> cleanup) in this IR:
>
>
>
> @x = global half 0xH0000, align 2
>
> define half @f(half, half) {
>   %3 = fmul half %0, %1
>   store half %3, half* @x
>   ret half %3
> }
>
>
>
> That’s not too unreasonable I suppose, except for the fact that it hasn’t
> taken the lack of target support for half-precision arithmetic into account
> yet. That will happen in the selection DAG. The assembly code generated
> looks like this (with my annotations):
>
>
>
> f:                                        # @f
> # %bb.0:
>         vcvtps2ph  xmm1, xmm1, 4          # Convert argument 1 from single to half
>         vcvtph2ps  xmm1, xmm1             # Convert argument 1 back to single
>         vcvtps2ph  xmm0, xmm0, 4          # Convert argument 0 from single to half
>         vcvtph2ps  xmm0, xmm0             # Convert argument 0 back to single
>         vmulss     xmm0, xmm0, xmm1       # xmm0 = xmm0*xmm1 (single precision)
>         vcvtps2ph  xmm1, xmm0, 4          # Convert the single precision result to half
>         vmovd      eax, xmm1              # Move the half precision result to eax
>         mov        word ptr [rip + x], ax # Store the half precision result in the global, x
>         ret                               # Return the single precision result still in xmm0
> .Lfunc_end0:
>                                           # -- End function
>
>
>
> Something odd has happened here, and it may not be obvious what it is. This
> code begins by converting xmm0 and xmm1 from single to half and then back
> to single. The first conversion happens because the back end decided to
> change the parameter types to single precision, while the function body
> expects half-precision values. However, since the target can't perform the
> required computation on half-precision values, they must be converted back
> to single precision for the multiplication. The single-precision result of
> the multiplication is converted to half precision to be stored in the
> global, x, but the result is returned as single precision (via xmm0).
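>
> One consequence worth spelling out: the value stored to x has been rounded
> to half precision, while the value handed back in xmm0 has not, so a caller
> could conceivably observe the two disagreeing. A rough sketch (the caller
> below is hypothetical, and whether the comparison can actually fail depends
> on how the caller's own conversions are lowered):
>
> extern _Float16 x;
> extern _Float16 f(_Float16 y, _Float16 z);
>
> int same_as_global(_Float16 y, _Float16 z) {
>   _Float16 r = f(y, z);         /* returned in xmm0 as the unrounded single */
>   return (float)r == (float)x;  /* x holds the product rounded to half      */
> }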
>
>
>
> I’m not primarily worried about the extra conversions here. We can’t get
> rid of them because we can’t prove they aren’t rounding, but that’s a
> secondary issue. What I’m worried about is that we allowed/required the
> back end to improvise an ABI to satisfy the incoming IR, and the choice it
> made is questionable.
>
>
>
> For a point of comparison, I looked at what gcc does. Currently, gcc only
> allows _Float16 in C, not C++, and if you try to use it with a target that
> doesn’t have native support for half-precision arithmetic, it tells you
> “’_Float16’ is not supported on this target.” That seems preferable to
> making up an ABI on the fly.
>
>
>
> I haven’t looked at what happens with clang when compiling for other
> targets that don’t have native support for half-precision arithmetic, but I
> would imagine that similar problems exist.
>
>
>
> Thoughts?
>
>
>
> Thanks,
>
> Andy
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>