[cfe-dev] _Float16 support
John McCall via cfe-dev
cfe-dev at lists.llvm.org
Thu Jan 24 10:57:54 PST 2019
On 24 Jan 2019, at 4:46, Sjoerd Meijer wrote:
> Hello,
>
> I added _Float16 support to Clang and codegen support in the AArch64
> and ARM backends, but have not looked into x86. Ahmed is right:
> AArch64 is fine, only a few ACLE intrinsics are missing. ARM has rough
> edges: scalar codegen should be mostly fine, vector codegen needs some
> more work.
>
> Implementation for AArch64 was mostly straightforward (it only has
> hard float ABI, and has half register/type support), but for ARM it
> was a huge pain to plumb f16 support because of different ABIs
> (hard/soft), different architecture extensions of FP and FP16 support,
> and the existence of another half-precision type with different
> semantics. Sounds like you're doing a similar exercise, and yes,
> argument passing was one of the trickiest parts.
>
>
>> IR and SelectionDAG representational choices aside, it seems to me
>> that, like GCC, Clang should not be permitting _Float16 on any target
>> that doesn't specify an ABI for it, because otherwise we're just
>> creating future compatibility problems for that target. I'm surprised
>> and disappointed that it wasn't implemented this way.
>
> Apologies, I missed that.
It's alright, oversights happen (in both patch-writing and review). Can
we get a volunteer to do the work to restrict this now? I'm a little
crushed.
John.
>
> Sjoerd.
>
> ________________________________
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Kaylor,
> Andrew via llvm-dev <llvm-dev at lists.llvm.org>
> Sent: 24 January 2019 00:23
> To: Ahmed Bougacha; Lu, Hongjiu
> Cc: llvm-dev; cfe-dev at lists.llvm.org
> Subject: Re: [llvm-dev] [cfe-dev] _Float16 support
>
> It seems that there are several issues here:
>
> 1. Should the front end be concerned with whether or not the IR that
> it is emitting can be translated into a well-defined IR?
> 2. How should the selection DAG handle data types whose representation
> isn't defined by the ABI we're targeting?
> 3. What should the ABI do with half-precision floats?
>
> Working backward...
>
> The third question here is obviously target specific. I've talked to
> HJ Lu about this, and he's working on an update to the x86 psABI. I
> believe that his eventual proposal will follow the lines of what you
> (Ahmed) suggested below, but I'm not completely proficient at
> comprehending ABI definitions so there may be some subtlety that I am
> misunderstanding in what he told me. I also talked to Craig about what
> would be involved in making the LLVM x86 backend handle 'half' values
> this way. That involves a good bit of work, but it can be done.
>
> The second question above probably involves a mix of
> target-independent and target-specific code. Right now the selection
> DAG code is operating on the assumption that it needs to do
> *something* with any IR it is given. It tries to make a reasonable
> choice, and the choice is consistent and predictable but not
> necessarily what the user expects. It seems like we should at the very
> least be producing a diagnostic so the user knows what we did (or even
> just that we did something). Then there are the specific problems
> Craig has brought up with the way we're currently handling 'half'
> values. Would defining a legal f16 type take care of those problems?
>
> The first question exposes my lack of understanding of the proper role
> of the front end. It isn't clear to me what responsibility the front
> end has for enforcing conformance to the ABI. As a user of the
> compiler, I would like the compiler to tell me when code I've written
> can't be represented using the ABI I am targeting. Whether the front
> end should detect this or the backend, I don't know. I suppose it's
> also an open question how strictly this should be enforced. Is it a
> warning that can be elevated to an error at the user's discretion? Is
> it something that should be blocked by default but enabled by a
> user-specified option? Should it always be rejected?
>
> -Andy
>
> -----Original Message-----
> From: Ahmed Bougacha <ahmed.bougacha at gmail.com>
> Sent: Wednesday, January 23, 2019 3:30 PM
> To: Kaylor, Andrew <andrew.kaylor at intel.com>
> Cc: cfe-dev at lists.llvm.org; llvm-dev <llvm-dev at lists.llvm.org>; Craig
> Topper <craig.topper at gmail.com>; Richard Smith <richard at metafoo.co.uk>
> Subject: Re: [cfe-dev] _Float16 support
>
> Hey Andy,
>
> On Tue, Jan 22, 2019 at 10:38 AM Kaylor, Andrew via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>> I'd like to start a discussion about how clang supports _Float16 for
>> target architectures that don't have direct support for 16-bit
>> floating point arithmetic.
>
> Thanks for bringing this up; we'd also like to get better support,
> for sysv x86-64 specifically - AArch64 is mostly fine, and ARM is
> usable with +fp16.
>
> I'm not sure much of this discussion generalizes across platforms
> though (beyond Craig's potential bug fix?). I guess the
> "target-independent" question is: should we allow this kind of
> "legalization" in the vreg assignment code at all? (I think that's
> where it all comes from: RegsForValue, TLI::get*Register*) It's
> convenient for experimental frontends: you can use weird types (half,
> i3, ...) without worrying too much about it, and you usually get
> something self-consistent out of the backend. But you eventually need
> to worry about it and need to make the calling convention explicit.
> But I guess that's a discussion for the other thread ;)
>
>> The current clang language extensions documentation says, "If
>> half-precision instructions are unavailable, values will be promoted
>> to single-precision, similar to the semantics of __fp16 except that
>> the results will be stored in single-precision." This is somewhat
>> vague (to me) as to what is meant by promotion of values, and the
>> part about results being stored in single-precision isn't what
>> actually happens.
>>
>> Consider this example:
>>
>> _Float16 x;
>> _Float16 f(_Float16 y, _Float16 z) {
>>   x = y * z;
>>   return x;
>> }
>>
>> When compiling with “-march=core-avx2” that results (after some
>> trivial cleanup) in this IR:
>>
>> @x = global half 0xH0000, align 2
>> define half @f(half, half) {
>>   %3 = fmul half %0, %1
>>   store half %3, half* @x
>>   ret half %3
>> }
>>
>> That’s not too unreasonable I suppose, except for the fact that it
>> hasn’t taken the lack of target support for half-precision
>> arithmetic into account yet. That will happen in the selection DAG.
>> The assembly code generated looks like this (with my annotations):
>>
>> f:                                   # @f
>> # %bb.0:
>>   vcvtps2ph xmm1, xmm1, 4            # Convert argument 1 from single to half
>>   vcvtph2ps xmm1, xmm1               # Convert argument 1 back to single
>>   vcvtps2ph xmm0, xmm0, 4            # Convert argument 0 from single to half
>>   vcvtph2ps xmm0, xmm0               # Convert argument 0 back to single
>>   vmulss    xmm0, xmm0, xmm1         # xmm0 = xmm0*xmm1 (single precision)
>>   vcvtps2ph xmm1, xmm0, 4            # Convert the single precision result to half
>>   vmovd     eax, xmm1                # Move the half precision result to eax
>>   mov       word ptr [rip + x], ax   # Store the half precision result in the global, x
>>   ret                                # Return the single precision result still in xmm0
>> .Lfunc_end0:
>> # -- End function
>>
>> Something odd has happened here, and it may not be obvious what it
>> is. This code begins by converting xmm0 and xmm1 from single to half
>> and then back to single. The first conversion is happening because
>> the back end decided that it needed to change the types of the
>> parameters to single precision but the function body is expecting
>> half precision values. However, since the target can’t perform the
>> required computation with half precision values they must be
>> converted back to single for the multiplication. The single precision
>> result of the multiplication is converted to half precision to be
>> stored in the global value, x, but the result is returned as single
>> precision (via xmm0).
>>
>> I’m not primarily worried about the extra conversions here. We
>> can’t get rid of them because we can’t prove they aren’t
>> rounding, but that’s a secondary issue. What I’m worried about is
>> that we allowed/required the back end to improvise an ABI to satisfy
>> the incoming IR, and the choice it made is questionable.
>
> As Richard said, an ABI rule emerged from the implementation, and I
> believe we should solidify it, so here's a simple strawman proposal:
> pass scalars in the low 16 bits of SSE registers, don't change the
> memory layout, and pack them in vectors of 16-bit elements. That
> matches the only ISA extension so far (ph<>ps conversions), and fits
> well with that (as opposed to i16 coercion) as well as vectors (as
> opposed to f32 promotion). To my knowledge, there hasn't been any
> alternative ABI proposal (but I haven't looked in 1 or 2 years). It's
> interesting because we technically have no way of accessing scalars
> (so we have the same problems as i8/i16 vector elements, but without
> the saving grace of having matching GPRs - x86, or direct copies -
> aarch64), and there are not even any scalar operations.
>
> Any thoughts? We can suggest this to x86-psABI if folks think this is
> a good idea. (I don't know about other ABIs or other architectures
> though).
>
> Concretely, this means no/little change in IRGen. As for the SDAG
> implementation, this is an unusual situation. I've done some
> experimentation a long time ago. We can make the types legal, even
> though no operations are. It's relatively straightforward to promote
> all operations (and we made sure that worked years ago for AArch64,
> for the pre-v8.2 mode), but vectors are fun, because of build_vector
> (where it helps to have the truncating behavior we have for integers,
> but for fp), extract_vector_elt (where you need the matching extend),
> and insert_vector_elt (which you have to lower using some movd and/or
> pinsrw trickery, if you want to avoid the generic slow via-memory
> fallback).
> Alternatively, we can immediately, in call lowering/register
> assignment logic (this covers the SDAG cross-BB vreg assignments Craig
> mentions) promote to f32 "via" i16. I'm afraid I don't remember the
> arguments one way or the other, but I can dust off my old patches and
> put them up on Phabricator.
>
>
> -Ahmed
>
>>
>> For a point of comparison, I looked at what gcc does. Currently, gcc
>> only allows _Float16 in C, not C++, and if you try to use it with a
>> target that doesn’t have native support for half-precision
>> arithmetic, it tells you “’_Float16’ is not supported on this
>> target.” That seems preferable to making up an ABI on the fly.
>>
>> I haven’t looked at what happens with clang when compiling for
>> other targets that don’t have native support for half-precision
>> arithmetic, but I would imagine that similar problems exist.
>>
>> Thoughts?
>>
>> Thanks,
>> Andy