[cfe-dev] [RFC] implementation of _Float16

Sjoerd Meijer via cfe-dev cfe-dev at lists.llvm.org
Thu May 11 04:22:06 PDT 2017


Hi Hal,

You mentioned "I'd be in favor of changing the current semantics". Just checking: do you mean the semantics of __fp16?
Because that is exactly what we are trying to avoid by introducing a new, true half type; changing the semantics of __fp16 would break backward compatibility.

> By "when required", do you mean when the result would
> be the same as if the operation had been performed in single
> precision? If so, then no, we need different semantics

I think that is indeed the case, but I am double checking that.

Cheers,
Sjoerd.

From: Hal Finkel [mailto:hfinkel at anl.gov]
Sent: 10 May 2017 17:40
To: Sjoerd Meijer; Martin J. O'Riordan
Cc: 'clang developer list'; nd
Subject: Re: [cfe-dev] [RFC] implementation of _Float16



On 05/10/2017 11:15 AM, Sjoerd Meijer wrote:
The thing that confused me again is that for simple expressions/examples like this:

__fp16 MyAdd(__fp16 a, __fp16 b) {
  return a + b;
}

The IR does not include the promotions/truncations you would expect (given that the operations are performed in single precision):

define half @MyAdd(half %a, half %b) local_unnamed_addr #0 {
entry:
  %0 = fadd half %a, %b
  ret half %0
}

But that is only because of an optimisation that omits the conversions when it can prove the result is the same with or without them; in other cases the promotes/truncates are there as expected.
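
For illustration, a hypothetical example of such a case (the function name is made up, this is not taken from any patch in this thread): with more than one operation, rounding the intermediate result to half would change the answer, so the conversions cannot be folded away:

__fp16 MyMulAdd(__fp16 a, __fp16 b, __fp16 c) {
  /* Under the current __fp16 (ACLE) semantics this behaves as if written
     (__fp16)((float)a * (float)b + (float)c). The intermediate product is
     not rounded to half, so the generated IR keeps the fpext/fmul/fadd/
     fptrunc chain instead of being folded into half operations. */
  return a * b + c;
}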

This means that Clang produces the necessary promotions when needed, and I think a new _Float16 type can also be mapped onto the LLVM IR half type (no changes needed). Yes, then the approach could indeed be to treat it as a native type, and only promote operands to float when required.

By "when required", do you mean when the result would be the same as if the operation had been performed in single precision? If so, then no, we need different semantics. That having been said, I'd be in favor of changing the current semantics to require explicit promotions/truncations, change the existing optimization to elide them when they're provably redundant (as we do with other such things), and then only have a single, true, half-precision type. I suspect that we'd need to figure out how to auto-upgrade, but that seems doable.

 -Hal



Cheers,
Sjoerd.

From: Hal Finkel [mailto:hfinkel at anl.gov]
Sent: 10 May 2017 16:00
To: Martin J. O'Riordan; Sjoerd Meijer
Cc: 'clang developer list'
Subject: Re: [cfe-dev] [RFC] implementation of _Float16



On 05/10/2017 09:01 AM, Martin J. O'Riordan wrote:
Yes, I see how this would be an issue if it is necessary to keep the storage-only versus native types separate.

At the moment I have 'short float' internally associated with OpenCL's 'half', but I do not enable 'half' as a keyword.  Independently, I have also made '__fp16', when used with our target, a synonym for 'short float'/'half' (simply to avoid adding a new keyword).  This in turn is bound to IEEE FP16 using 'HalfFormat = &llvm::APFloat::IEEEhalf();'.

In our case it is always a native type and never a storage only type, so coupling '__fp16' to 'half' made sense.  Certainly if the native versus storage-only variants were distinct, then this association I have made would have to be decoupled (not a big-deal).

Another approach might be to always work with FP16 as-if native, but to provide only Load/Store instructions in the TableGen descriptions for FP16, and to adapt lowering to always perform the arithmetic using FP32 if the selected target does not support native FP16 - would that be feasible in your case?  In this way it is not really any different to how targets that have no FPU can use an alternative integer based implementation (with the help of 'compiler-rt').

I can certainly see how something like 'ADD' of 'f16' could be changed to use 'Expand' in lowering rather than 'Legal' as a function of the selected target (or some other target specific option) - we just marked it 'Legal' and provided the corresponding instructions in TableGen with very little custom lowering necessary.  I have a mild concern that LLVM would have to have an 'f16' which is native and another kind-of 'f16' restricted to being only storage.

Why? That should only be true if they have different semantics.

 -Hal




Thanks,

            MartinO

From: Sjoerd Meijer [mailto:Sjoerd.Meijer at arm.com]
Sent: 10 May 2017 14:19
To: Martin J. O'Riordan <martin.oriordan at movidius.com>; 'Hal Finkel' <hfinkel at anl.gov>
Cc: 'clang developer list' <cfe-dev at lists.llvm.org>
Subject: RE: [cfe-dev] [RFC] implementation of _Float16

Hi Hal, Martin,

Thanks for the feedback.
Yes, the issue indeed is that '__fp16' is already used to implement a storage-only type.  Earlier I wrote that I don't expect LLVM IR changes, but now I am not so sure any more whether both types can map onto the same half LLVM IR type. With two half-precision types, __fp16 and _Float16, where one is a storage-only type and the other a native type, the distinction between the two must somehow be made, I think.

Cheers,
Sjoerd.

From: Martin J. O'Riordan [mailto:martin.oriordan at movidius.com]
Sent: 10 May 2017 14:13
To: 'Hal Finkel'; Sjoerd Meijer
Cc: 'clang developer list'
Subject: RE: [cfe-dev] [RFC] implementation of _Float16

Our out-of-tree target implements fully native FP16 operations based on '__fp16' (scalar and SIMD vector); so is the issue for ARM that '__fp16' is already used to implement a storage-only type, and that another type is needed to differentiate between a native and a storage-only type?  Once the 'f16' type (and its vector variants) appears in the IR, the code generation is straightforward enough.

Certainly we have had to make many changes to Clang and to LLVM to fully implement this, including suppressing the implicit conversion to 'double', but nothing scary or obscure.  Many of these changes simply enable for C and C++ something that is already normal for OpenCL.

More controversially, we also added a "synonym" for this using 'short float' rather than '_Float16' (or OpenCL's 'half'), and created a parallel set of the ISO C library functions using an 's' suffix on the usual names (e.g. 'tan', 'tanf', 'tanl' plus 'tans').  The 's' suffix was unambiguous (though we actually use a double-underscore prefix, e.g. '__tans', to avoid conflicts with the user's names), and the type 'short float' was available without breaking anything.  Enabling the 'h' suffix for FP constants (again from OpenCL) makes the whole thing fit smoothly with the normal FP types.
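
For illustration, a hypothetical usage sketch of the out-of-tree extensions described above ('short float' as the half type, '__tans' as the half-precision counterpart of tan/tanf/tanl, and the OpenCL-style 'h' constant suffix); none of these exist in upstream Clang:

extern short float __tans(short float);   /* out-of-tree, '__'-prefixed half-precision tan */

short float quarter_turn_tan(void) {
  return __tans(0.785h);   /* the 'h' suffix gives a half-precision constant */
}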

However, for variadic functions (such as 'printf') we do promote to 'double', because there are no formatting specifiers for 'half', any more than there are for 'float'; it is also consistent with 'va_arg' treating 'char' and 'short' as 'int'.  My feeling is that the implementation-defined treatment of the types 'float', 'double' and 'long double' can be extended to include 'short float' without dictating that any of them have a particular bit-size (e.g. FP16 for 'half').
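
A small hypothetical sketch of that variadic behaviour (again assuming the out-of-tree 'short float' type):

#include <stdio.h>

void print_half(short float h) {
  /* Default argument promotion widens 'h' to 'double' at the variadic call,
     just as a plain 'float' argument would be, so the ordinary "%f"
     specifier works and no new specifier is needed. */
  printf("h = %f\n", h);
}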

This solution has worked very well over the past few years and is symmetric with the other floating-point data types.

There are some issues with C++ and overloading, because conversions from '__fp16' to the other FP types (and integer types) are not ranked in exactly the same way as, for example, conversions from 'float' are; but this is really only because it is not a first-class citizen of the type system, and the rules would need to be specified to make this valid.  I have not tried to fix this as it works reasonably well as it is, and it would really be an issue for the C++ committee to decide if they ever choose to adopt another FP data type.  I did add it to the type traits in the C++ library, though, so that it is considered a legal floating-point type.

I'd love to see this adopted as a formal type in a future version of ISO C and ISO C++.

            MartinO

From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of Hal Finkel via cfe-dev
Sent: 10 May 2017 11:39
To: Sjoerd Meijer <Sjoerd.Meijer at arm.com>; cfe-dev at lists.llvm.org
Subject: Re: [cfe-dev] [RFC] implementation of _Float16

On 05/10/2017 05:18 AM, Sjoerd Meijer via cfe-dev wrote:
Hi,

ARMv8.2-A introduces, as an optional extension, half-precision data-processing instructions for Advanced SIMD and floating-point in both the AArch64 and AArch32 states [1], and we are looking into implementing C/C++ language support for these new ARMv8.2-A half-precision instructions.

We would like to introduce a new Clang type. The reason is that we cannot, for example, use the type __fp16 (defined in the ARM C Language Extensions [2]), because it is a storage-only type. This means that, when standard C operators are used, values of __fp16 type promote to float in arithmetic operations, which we would like to avoid for the ARMv8.2-A half-precision instructions. Please note that LLVM IR already has a half-precision type, onto which for example __fp16 is mapped, so no changes or additions to the LLVM IR are required.

As the new Clang type we would like to propose _Float16, as defined in a C11 extension, see [3]. Its arithmetic is well defined; it is not only a storage type like __fp16. Our question is whether a partial implementation, implementing just this type and not claiming (full) C11 conformance, is acceptable?
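
To make the intended difference concrete, a small hypothetical sketch (assuming _Float16 is accepted as proposed; the function names are made up):

__fp16 f(__fp16 a, __fp16 b) {
  /* ACLE __fp16: the operands promote to float, the addition is performed
     in single precision, and the result is truncated back to half. */
  return a + b;
}

_Float16 g(_Float16 a, _Float16 b) {
  /* Proposed _Float16: a true arithmetic type, so the addition can be
     performed in half precision and map directly onto the new ARMv8.2-A
     half-precision instructions, with no promotion to float. */
  return a + b;
}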

I would very much like to see fp16 as a first-class floating-point type in Clang and LLVM (i.e. handled as more than just a storage type). Doing this in Clang in a way that is specified by C11 seems like the right approach. I don't see why implementing this would be predicated on implementing other parts of C11.

 -Hal




--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory