[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

Wed Mar 2 18:21:39 PST 2022

andrew.w.kaylor added a comment.
Herald added a project: All.

In D120395#3346591 <https://reviews.llvm.org/D120395#3346591>, @scanon wrote:

> There's a lot of churn around proposed "solutions" on this and related PR, but not a very clear analysis of what the problem we're trying to solve is.

I thought the problem that the patch was originally trying to solve is that `__bfloat16` variables could be used in arithmetic operations (using the `__bfloat16` type defined in the avx512bf16intrin.h header). For example, clang currently compiles this code without any diagnostics if the target processor has the required features (avx512bf16 and avx512vl).

  #include <immintrin.h>
  float f(float x, float y) {
    __bfloat16 x16 = _mm_cvtness_sbh(x);
    __bfloat16 y16 = _mm_cvtness_sbh(y);
    __bfloat16 z16 = x16 + y16;
    return _mm_cvtsbh_ss(z16);
  }

https://godbolt.org/z/vcbcGsPPx

The problem is that the instructions generated for that code are completely wrong because `__bfloat16` is defined as `unsigned short`, so the `+` is compiled as an integer addition of the two bit patterns. The header relies on the user knowing that they shouldn't use this type in arithmetic operations.
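To make the failure concrete without requiring AVX512-BF16 hardware, here is a minimal portable sketch. The helper names (`float_to_bf16_bits`, `bf16_bits_to_float`, `add_as_integers`, `add_as_bf16`) are hypothetical and are not part of any header, and the float-to-BF16 conversion simply truncates to the upper 16 bits, which is an assumption made for brevity rather than the rounding the real conversion intrinsics perform.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: float -> BF16 bit pattern.
   Assumption: plain truncation (keep the upper 16 bits of the float);
   this is a simplification, not the hardware rounding behavior. */
static uint16_t float_to_bf16_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (uint16_t)(u >> 16);
}

/* Hypothetical helper: BF16 bit pattern -> float (exact widening). */
static float bf16_bits_to_float(uint16_t h) {
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* What `x16 + y16` does when the type is an unsigned short:
   an integer addition of the two bit patterns. */
static uint16_t add_as_integers(uint16_t a, uint16_t b) {
    return (uint16_t)(a + b);
}

/* What the user presumably wanted: widen to float, add, narrow again. */
static uint16_t add_as_bf16(uint16_t a, uint16_t b) {
    return float_to_bf16_bits(bf16_bits_to_float(a) + bf16_bits_to_float(b));
}
```

With these helpers, 1.0 encodes as 0x3F80 and 2.0 as 0x4000. `add_as_integers` returns 0x7F80, which is the BF16 bit pattern for +infinity, while `add_as_bf16` returns 0x4040, the encoding of 3.0.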

Like I said, I //thought// that was the original intention of this patch. However, the latest version of the patch doesn't prevent this at all. In fact, it makes the problem worse by asking the user to define the BF16 variables as unsigned short in their code. Getting correct behavior from this point

@pengfei Please correct me if I misunderstood the purpose of this patch.

In D120395#3346591 <https://reviews.llvm.org/D120395#3346591>, @scanon wrote:

> Concretely, what are the semantics that we want for the BF16 types and intrinsics? Unlike the other floating-point types, there's no standard to guide this, so it's even more important to clearly specify how these types are to be used, instead of having an ad-hoc semantics of whatever someone happens to implement.

The binary representation of a BF16 value (such as the value returned by `_mm_cvtness_sbh`) is, as Phoebe mentioned, the "brain floating point type" as described here: https://en.wikichip.org/wiki/brain_floating-point_format

Unfortunately, what you can do with it seems to depend on the target architecture. For very recent x86 processors, you can convert vectors of this type to and from single precision floating point and you can do a SIMD dot product and accumulate operation (VDPBF16PS), but the only way to do this is with intrinsics. Some ARM processors support other operations, but I think with similar restrictions (i.e. only accessible through intrinsics). Apart from intrinsics, it is treated as a storage-only type.
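As an illustration of what "storage-only" means in practice, the following scalar sketch keeps BF16 values as raw 16-bit patterns and widens them to float before doing any arithmetic. The function names are hypothetical, and the loop only mirrors the general shape of a dot product and accumulate operation; it does not attempt to reproduce the exact rounding or denormal handling of VDPBF16PS.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: BF16 bit pattern -> float (exact widening). */
static float bf16_bits_to_float(uint16_t h) {
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Scalar sketch of a dot product with accumulation: the BF16 inputs are
   only loaded and widened; all multiplication and addition happens in
   single precision. Not bit-exact with any hardware instruction. */
static float dot_accumulate_bf16(float acc, const uint16_t *a,
                                 const uint16_t *b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        acc += bf16_bits_to_float(a[i]) * bf16_bits_to_float(b[i]);
    return acc;
}
```

For example, with a = {1.0, 2.0} stored as {0x3F80, 0x4000}, b = {3.0, 4.0} stored as {0x4040, 0x4080}, and an initial accumulator of 0.5, the result is 0.5 + 3.0 + 8.0 = 11.5.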

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395