[PATCH] D150913: [Clang][Bfloat16] Upgrade __bf16 to arithmetic type, change mangling, and extend excess precision support.

Wed May 24 05:25:10 PDT 2023

pengfei added inline comments.

================
Comment at: clang/docs/LanguageExtensions.rst:852
 ``double`` when passed to ``printf``, so the programmer must explicitly cast it to
 ``double`` before using it with an ``%f`` or similar specifier.

----------------
rjmccall wrote:
> Suggested rework:
> 
> ```
> Clang supports three half-precision (16-bit) floating point types: ``__fp16``,
> ``_Float16`` and ``__bf16``.  These types are supported in all language
> modes, but not on all targets:
> 
> - ``__fp16`` is supported on every target.
> 
> - ``_Float16`` is currently supported on the following targets:
>   * 32-bit ARM (natively on some architecture versions)
>   * 64-bit ARM (AArch64) (natively on ARMv8.2a and above)
>   * AMDGPU (natively)
>   * SPIR (natively)
>   * X86 (if SSE2 is available; natively if AVX512-FP16 is also available)
> 
> - ``__bf16`` is currently supported on the following targets:
>   * 32-bit ARM
>   * 64-bit ARM (AArch64)
>   * X86 (when SSE2 is available)
> 
> (For X86, SSE2 is available on 64-bit and all recent 32-bit processors.)
> 
> ``__fp16`` and ``_Float16`` both use the binary16 format from IEEE
> 754-2008, which provides a 5-bit exponent and an 11-bit significand
> (counting the implicit leading 1).  ``__bf16`` uses the `bfloat16
> <https://en.wikipedia.org/wiki/Bfloat16_floating-point_format>`_ format,
> which provides an 8-bit exponent and an 8-bit significand; this is the same
> exponent range as ``float``, just with greatly reduced precision.
> 
> ``_Float16`` and ``__bf16`` follow the usual rules for arithmetic
> floating-point types.  Most importantly, this means that arithmetic operations
> on operands of these types are formally performed in the type and produce
> values of the type.  ``__fp16`` does not follow those rules: most operations
> immediately promote operands of type ``__fp16`` to ``float``, and so
> arithmetic operations are defined to be performed in ``float`` and so result in
> a value of type ``float`` (unless further promoted because of other operands).
> See below for more information on the exact specifications of these types.
> 
> Only some of the supported processors for ``__fp16`` and ``__bf16`` offer
> native hardware support for arithmetic in their corresponding formats.
> The exact conditions are described in the lists above.  When compiling for a
> processor without native support, Clang will perform the arithmetic in
> ``float``, inserting extensions and truncations as necessary.  This can be
> done in a way that exactly emulates the behavior of hardware support for
> arithmetic, but it can require many extra operations.  By default, Clang takes
> advantage of the C standard's allowances for excess precision in intermediate
> operands in order to eliminate intermediate truncations within statements.
> This is generally much faster but can generate different results from strict
> operation-by-operation emulation.
> 
> The use of excess precision can be independently controlled for these two
> types with the ``-ffloat16-excess-precision=`` and
> ``-fbfloat16-excess-precision=`` options.  Valid values include:
> 
> - ``none`` (meaning to perform strict operation-by-operation emulation)
> - ``standard`` (meaning that excess precision is permitted under the rules
>   described in the standard, i.e. never across explicit casts or statements)
> - ``fast`` (meaning that excess precision is permitted whenever the
>   optimizer sees an opportunity to avoid truncations; currently this has no
>   effect beyond ``standard``)
> 
> The ``_Float16`` type is an interchange floating type specified in
> ISO/IEC TS 18661-3:2015 ("Floating-point extensions for C").  It will
> be supported on more targets as they define ABIs for it.
> 
> The ``__bf16`` type is a non-standard extension, but it generally follows
> the rules for arithmetic interchange floating types from ISO/IEC TS
> 18661-3:2015.  In previous versions of Clang, it was a storage-only type
> that forbade arithmetic operations.  It will be supported on more targets
> as they define ABIs for it.
> 
> The ``__fp16`` type was originally an ARM extension and is specified
> by the `ARM C Language Extensions <https://github.com/ARM-software/acle/releases>`_.
> Clang uses the ``binary16`` format from IEEE 754-2008 for ``__fp16``,
> not the ARM alternative format.  Operators that expect arithmetic operands
> immediately promote ``__fp16`` operands to ``float``.
> 
> It is recommended that portable code use ``_Float16`` instead of ``__fp16``,
> as it has been defined by the C standards committee and has behavior that is
> more familiar to most programmers.
> 
> Because ``__fp16`` operands are always immediately promoted to ``float``, the
> common real type of ``__fp16`` and ``_Float16`` for the purposes of the usual
> arithmetic conversions is ``float``.
> 
> A literal can be given ``_Float16`` type using the suffix ``f16``. For example,
> ``3.14f16``.
> 
> Because default argument promotion only applies to the standard floating-point
> types, ``_Float16`` values are not promoted to ``double`` when passed as variadic
> or untyped arguments.  As a consequence, some caution must be taken when using
> certain library facilities with ``_Float16``; for example, there is no ``printf`` format
> specifier for ``_Float16``, and (unlike ``float``) it will not be implicitly promoted to
> ``double`` when passed to ``printf``, so the programmer must explicitly cast it to
> ``double`` before using it with an ``%f`` or similar specifier.
> ```
```
Only some of the supported processors for ``__fp16`` and ``__bf16`` offer
native hardware support for arithmetic in their corresponding formats.
```

Do you mean ``_Float16`` rather than ``__fp16`` here?

```
The exact conditions are described in the lists above.  When compiling for a
processor without native support, Clang will perform the arithmetic in
``float``, inserting extensions and truncations as necessary.
```

This conflicts somewhat with `These types are supported in all language modes, but not on all targets`.
Why do we need emulation for a type that is not necessarily supported on all targets?

My understanding is that inserting extensions and truncations serves two purposes:
1. Supporting a type that is designed to be available on all targets. For now, that is only `__fp16`.
2. Supporting excess-precision=`standard`. This applies to both `_Float16` and `__bf16`.
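
To make the second point concrete, here is a rough software model of the two strategies. This is only a sketch, not what Clang actually emits: `round_to_bf16`, `mul_add_strict` and `mul_add_excess` are hypothetical helpers that I made up for illustration, and the model uses plain `float` plus bit manipulation so that it does not depend on target support for `__bf16`. NaN and infinity handling is left out.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a float to bfloat16 precision
   (round-to-nearest-even on the upper 16 bits) and widen it back to float.
   NaN/Inf inputs are not handled in this sketch. */
static float round_to_bf16(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    /* Add 0x7FFF, plus 1 if the lowest kept bit is odd, so ties go to even. */
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    bits &= 0xFFFF0000u; /* drop the 16 low mantissa bits */
    memcpy(&x, &bits, sizeof bits);
    return x;
}

/* Models excess-precision=none: truncate after every operation. */
static float mul_add_strict(float a, float b, float c) {
    float prod = round_to_bf16(a * b);
    return round_to_bf16(prod + c);
}

/* Models excess-precision=standard within one expression:
   the intermediate product stays in float, with one truncation at the end. */
static float mul_add_excess(float a, float b, float c) {
    return round_to_bf16(a * b + c);
}
```

With inputs that are all exactly representable in bfloat16, e.g. `a = 1.0078125f`, `b = 1.015625f`, `c = -1.0234375f`, the strict version returns `0.0f` because the product is rounded to `1.0234375f` before the add, while the excess-precision version keeps the low bit of the product and returns `0.0001220703125f` (2^-13). That is the kind of difference the suggested text means by "can generate different results from strict operation-by-operation emulation".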

https://reviews.llvm.org/D150913