[cfe-dev] [RFC] implementation of _Float16

Stephen Canon via cfe-dev cfe-dev at lists.llvm.org
Mon May 15 07:08:40 PDT 2017


Yes, you’re right. That’s what I get for writing it up from memory instead of just re-reading the spec.

The choice is really between FLT_EVAL_METHOD=16 (evaluate _Float16 in _Float16) and FLT_EVAL_METHOD=0 (evaluate _Float16 in float).
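These two choices really can produce different answers. Here is a minimal sketch in Python, using the struct module's IEEE binary16 format to model _Float16 rounding (the `to_f16` helper is my own name, not part of any proposal):

```python
import struct

def to_f16(x: float) -> float:
    """Round x to the nearest IEEE binary16 value (round-to-nearest-even),
    then widen it back to a Python float."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

a, b, c = 1024.0, 1025.0, 1.0   # all exactly representable in binary16

# FLT_EVAL_METHOD = 16: every _Float16 operation rounds to binary16.
# 1024 + 1025 = 2049 rounds (ties-to-even) to 2048; 2048 + 1 rounds back to 2048.
per_op = to_f16(to_f16(a + b) + c)

# FLT_EVAL_METHOD = 0: the expression is evaluated in float and narrowed
# to binary16 only once, when the result is converted back.
in_float = to_f16(a + b + c)

print(per_op)    # 2048.0
print(in_float)  # 2050.0
```

The exact sum is 2050, so the single late narrowing also happens to be the more accurate answer here.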

– Steve

> On May 15, 2017, at 9:50 AM, Sjoerd Meijer <Sjoerd.Meijer at arm.com> wrote:
> 
> Hi Steve,
>  
> Thanks for the explanations and pointers!
>  
> > Anything I missed?
>  
> Maybe. Reading your reply and also N1945 again, we got confused (my colleague Simon Tatham helped me here). You seem to be interpreting FLT_EVAL_METHOD=0 as saying that fp16 operations are done in fp16 precision, and FLT_EVAL_METHOD=32 as saying that fp16 operations are done in fp32 precision.
>  
> But by our reading of N1945, FLT_EVAL_METHOD=0 and FLT_EVAL_METHOD=32 mean the same thing, at least on a platform where float is 32 bits wide. Each one says that for some type T, expressions whose semantic type is T or smaller are evaluated in type T; and for FLT_EVAL_METHOD=0, T = float, whereas for FLT_EVAL_METHOD=32, T = _Float32, i.e. the same physical type in both cases.
>  
> So surely, to indicate that fp16 operations are done in fp16 precision, we would have to set FLT_EVAL_METHOD to 16, not to 0?
>  
> Cheers,
> Sjoerd.
>  
>  
> From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of Stephen Canon via cfe-dev
> Sent: 12 May 2017 00:57
> To: Hal Finkel
> Cc: nd; clang developer list
> Subject: Re: [cfe-dev] [RFC] implementation of _Float16
>  
>  
> On May 11, 2017, at 7:11 PM, Hal Finkel via cfe-dev <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>  
> That's what's been asserted here as well. The question is: If we're going to want a type that represents half precision without the implied extend/truncate operations, do we a) Introduce a new type that is "really" a half or b) change half not to imply the extend/truncate and then autoupgrade?
>  
> Just to try to be precise, I want to broaden this slightly and try to sketch out all the questions around this. Apologies if the answers to these are obvious or you feel like they’re already settled. I’d like to make sure we define the scope of the decisions pretty clearly before bikeshedding it to death =)
>  
> (a) For targets that do not have fp16 hardware support, what is FLT_EVAL_METHOD (I’m using the C-language bindings here so that there are semi-formal definitions that people can look up, but this is at least partially a non-language specific policy decision)?
>  
>             - We could choose FLT_EVAL_METHOD = 0, which requires us to “simulate” _Float16 operations by upconverting to a legal type (float), doing the operation in float, and converting back to _Float16 for every operation (this works for all the arithmetic instructions, except fma, which would require a libcall or other special handling, but we would want fma formation from mul + add to still be licensed when allowed by program semantics).
>  
>             - We could choose FLT_EVAL_METHOD = 32, which allows us to maintain extra precision by eliding the conversions to/from _Float16 around each operation (leaving intermediate results in `float`).
>  
> The second option obviously yields better performance on many targets, but slightly reduces portability: targets without _Float16 hardware support now get different answers for basic arithmetic than targets that have it. The second option matches (I think?) the intended behavior of the arm __fp16 extension.
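To make the numeric difference concrete, a sketch in Python of the two lowerings for a chain of _Float16 additions, using the struct module's binary16 format to model the narrowing (the helper names are mine; note that under the corrected numbering at the top of this thread, the per-operation variant corresponds to FLT_EVAL_METHOD = 16):

```python
import struct
from functools import reduce

def to_f16(x: float) -> float:
    """Round to the nearest IEEE binary16 value, then widen back to float."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def sum_per_op(xs):
    # First bullet: widen to float, add, and narrow back to binary16
    # around *every* operation (one narrowing conversion per add).
    return reduce(lambda acc, x: to_f16(acc + x), xs)

def sum_deferred(xs):
    # Second bullet: keep intermediates in float, eliding the
    # per-operation conversions; narrow only once at the end.
    return to_f16(sum(xs))

vals = [4096.0, 4100.0, 2.0]   # all exactly representable in binary16
print(sum_per_op(vals))        # 8192.0 (4096+4100 = 8196 rounds to 8192)
print(sum_deferred(vals))      # 8200.0 (exact sum 8198 rounds to 8200)
```

The deferred narrowing is both cheaper (one conversion instead of one per operation) and, in this case, closer to the exact sum of 8198.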
>  
> (b) For targets that have fp16 hardware support, we still get to choose FLT_EVAL_METHOD.
>  
>             - Use the fp16 hardware. FLT_EVAL_METHOD = 0.
>  
>             - The other choice is FLT_EVAL_METHOD = 32 (matching the existing behavior of __fp16, but making it much harder for people to take advantage of the shiny new instructions—they would have to use intrinsics—and severely hampering the autovectorizer’s options).
>  
> It sounds like everyone is settled on the first choice (and I agree with that), but let’s be clear that this *is* a decision that we’re making.
>  
> (c) Assuming FLT_EVAL_METHOD = 0 for targets with fp16 hardware, do we need to support a type with the __fp16 extension semantics of “implicitly promote everything to float” for the purposes of source compatibility?
>  
> Sounds like “yes”, at least for some toolchains.
>  
> (d) If yes, does that actually require a separate type at LLVM IR layer?
>  
> I don’t immediately see that it would, but I am not an expert.
>  
> Anything I missed?
> – Steve
