[cfe-dev] [RFC] implementation of _Float16

Stephen Canon via cfe-dev cfe-dev at lists.llvm.org
Thu May 11 16:57:01 PDT 2017


> On May 11, 2017, at 7:11 PM, Hal Finkel via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> That's what's been asserted here as well. The question is: If we're going to want a type that represents half precision without the implied extend/truncate operations, do we a) Introduce a new type that is "really" a half or b) change half not to imply the extend/truncate and then autoupgrade?

Just to try to be precise, I want to broaden this slightly and sketch out all the questions around this. Apologies if the answers to these are obvious or you feel like they’re already settled. I’d like to make sure we define the scope of the decisions pretty clearly before bikeshedding it to death =)

(a) For targets that do not have fp16 hardware support, what is FLT_EVAL_METHOD? (I’m using the C-language bindings here so that there are semi-formal definitions people can look up, but this is at least partially a non-language-specific policy decision.)

	- We could choose FLT_EVAL_METHOD = 0, which requires us to “simulate” _Float16 operations by upconverting to a legal type (float), doing the operation in float, and converting back to _Float16 after every operation. This works for all the arithmetic instructions except fma, which would require a libcall or other special handling; we would still want fma formation from mul + add to be licensed when allowed by program semantics.

	- We could choose FLT_EVAL_METHOD = 32, which allows us to maintain extra precision by eliding the conversions to/from _Float16 around each operation (leaving intermediate results in `float`).

The second option obviously yields better performance on many targets, but slightly reduces portability: targets without fp16 hardware now get different answers for basic arithmetic than targets that have it (the sketch below makes the difference concrete). The second option also matches (I think?) the intended behavior of the ARM __fp16 extension.
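
To make that portability difference concrete, here is a minimal sketch of mine (not from any existing toolchain; it assumes a compiler with TS 18661-3 _Float16 support and f16 literal suffixes):

    #include <stdio.h>

    int main(void) {
        _Float16 a = 2048.0f16;  /* exactly representable in binary16 */
        _Float16 b = 1.0f16;
        _Float16 r = a + b + b;

        /* FLT_EVAL_METHOD == 0:  a + b = 2049 rounds (ties-to-even) to
           2048 in binary16, and adding b again rounds back to 2048, so
           r == 2048.
           FLT_EVAL_METHOD == 32: the whole sum is evaluated in float as
           exactly 2050.0f and rounded to binary16 once on assignment,
           so r == 2050. */
        printf("%g\n", (double)r);
        return 0;
    }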

(b) For targets that have fp16 hardware support, we still get to choose FLT_EVAL_METHOD.

	- Use the fp16 hardware. FLT_EVAL_METHOD = 0.

	- The other choice is FLT_EVAL_METHOD = 32, matching the existing behavior of __fp16, but making it much harder for people to take advantage of the shiny new instructions (they would have to use intrinsics, as sketched below) and severely hampering the autovectorizer’s options.

It sounds like everyone is settled on the first choice (and I agree with that), but let’s be clear that this *is* a decision that we’re making.
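
For a sense of what the intrinsics route would look like under the second choice, here’s a hedged sketch assuming an ARMv8.2-A target with the FP16 extension (vaddq_f16 is the ACLE intrinsic for an eight-lane half-precision add):

    #include <arm_neon.h>

    /* With FLT_EVAL_METHOD = 32, scalar _Float16 arithmetic would be
       performed in float, so reaching the native fp16 instructions
       means writing vector intrinsics by hand. */
    float16x8_t add8_f16(float16x8_t x, float16x8_t y) {
        return vaddq_f16(x, y);  /* one FADD over eight half lanes */
    }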

(c) Assuming FLT_EVAL_METHOD = 0 for targets with fp16 hardware, do we need to support a type with the __fp16 extension semantics of “implicitly promote everything to float” for the purposes of source compatibility?

Sounds like “yes”, at least for some toolchains.
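
To spell out what that source compatibility means, here is a small illustration of mine, using the ACLE definition of __fp16 as a storage-only format:

    __fp16   a, b;  /* storage-only: arithmetic promotes to float */
    _Float16 c, d;  /* real arithmetic type */

    float    s1 = a + b;  /* evaluated as (float)a + (float)b; the sum
                             is never rounded back to half */
    _Float16 s2 = c + d;  /* result rounded to binary16 (assuming
                             FLT_EVAL_METHOD = 0) */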

(d) If yes, does that actually require a separate type at the LLVM IR layer?

I don’t immediately see that it would, but I am not an expert.
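
Here’s a rough sketch of why it might not (the IR in the comments is illustrative, not necessarily what Clang emits): the frontend could express both source-level semantics with the existing half type, just placing the conversions differently.

    float add_fp16(__fp16 a, __fp16 b) {
        /* %1 = fpext half %a to float
           %2 = fpext half %b to float
           %3 = fadd float %1, %2     ; promote-then-operate semantics */
        return a + b;
    }

    _Float16 add_f16(_Float16 a, _Float16 b) {
        /* %1 = fadd half %a, %b      ; direct half arithmetic; targets
              without fp16 hardware legalize this to fpext/fadd/fptrunc */
        return a + b;
    }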

Anything I missed?
– Steve