[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

Thu Mar 3 16:23:16 PST 2022

craig.topper added a comment.

In D120395#3358453 <https://reviews.llvm.org/D120395#3358453>, @andrew.w.kaylor wrote:

> In D120395#3356355 <https://reviews.llvm.org/D120395#3356355>, @pengfei wrote:
>
>> Good question! This is actually the scope of ABI. Unfortunately, we don't have the BF16 ABI at the present. We can't assume what are the physical registers the arguments been passed and returned before we have such a hardware. For example, ARM has soft FP ABI that supports FP arithmetic operations and passes and returns arguments by integer registers. When we enabling some ISA set whose type doesn't have ABI representation, e.g., F16C, we borrowed such conception. And as a trade off, we used integer rather than introducing a new IR type, since we don't need to support the arithmetic operations.
>
> I don't see the point of the ARM soft-float comparison, given that X86 doesn't have the strict distinction between integer and floating point registers that ARM has, at least not for the XMM/YMM/ZMM registers. Consider the following code:
>
>   __m128bh foo(__m128 x) {
>     return _mm_cvtneps_pbh(x);
>   }
>   __m128 bar(__m128bh x) {
>     return _mm_cvtpbh_ps(x);
>   }
>
> Currently, both clang and gcc will use XMM0 for the argument and return value in both functions. Is XMM0 an integer register or a floating point register? There is no such distinction. It's true that the x86_64 psABI does talk about the general purpose registers as integer registers, and both clang and gcc will use one of these registers for `__bfloat16` values, but that's an implementation detail (and a dubious one, considering that nearly anything useful that you can do with a `__bfloat16` will require moving it into an SSE register).
>
> Also, you say we can't assume what registers will be used (in the eventual ABI?) but we are assuming exactly that. If the ABI is ever defined differently than what clang and gcc are currently doing, they will both be wrong.
>
> But all of this only applies to the backend code generation. It has very little to do with the intrinsic definition in the header file or the IR generated by the front end. If we continue to define `__bfloat16` as an `unsigned short` in the header file, the front end will treat it as an `unsigned short` and it will use its rules for `unsigned short` to generate IR. If the ABI is ever defined to treat BF16 differently than `unsigned short`, the front end won't be able to do anything about that because we've told the front end that the value is an unsigned short.
>
> On the other hand, if we define the `__bfloat16` type as the built-in `__bf16` type, then the front end can apply whatever rules it has for that type, including adding whatever ABI handling is needed for BF16 values. If that ends up being the same as the rules for `unsigned short`, that's no problem. The front end can implement it that way. If it ends up being something different, the front end can apply rules for whatever the alternative is. The point is, by telling the front end that this is a BF16 value, we allow the front end to control the semantics for it. This would then result in the front end generating IR using `bfloat` as the type for BF16 values. Again, this is a correct and accurate description of the value. It allows the optimizer to reason about it correctly in any way it needs to.
>
> I don't see why we would treat BF16 values as `unsigned short` and `i16` throughout the compiler just to make the backend implementation easier when we already have types available for BF16.

__m256bh should not have been a new type. It should have been an alias of __m256i. We don't have load/store intrinsics for __m256bh so if you can even get the __m256bh type in and out of memory using load/store intrinsics, it is only because we allow lax vector conversion by default. -fno-lax-vector-conversions will probably break any code trying to load/store it using a load/store intrinsic. If __m256bh was made a struct as at one point proposed, this would have been broken.

If we want __m256bh to be a unique type using __bf16, we must define load, store, and cast intrinsics for it. We would probably want insert/extract element intrinsics as well.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395