[PATCH] D120395: [X86] Prohibit arithmetic operations on type `__bfloat16`

Andy Kaylor via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 7 14:30:58 PST 2022


andrew.w.kaylor added a comment.

In D120395#3358533 <https://reviews.llvm.org/D120395#3358533>, @craig.topper wrote:

> __m256bh should not have been a new type. It should have been an alias of __m256i. We don't have load/store intrinsics for __m256bh so if you can even get the __m256bh type in and out of memory using load/store intrinsics, it is only because we allow lax vector conversion by default. -fno-lax-vector-conversions will probably break any code trying to load/store it using a load/store intrinsic. If __m256bh was made a struct as at one point proposed, this would have been broken.
>
> If we want __m256bh to be a unique type using __bf16, we must define load, store, and cast intrinsics for it. We would probably want insert/extract element intrinsics as well.

From this, it sounds like our intrinsics support is incomplete for this type. Even if it is defined as an alias of some existing type (such as __m256i), I would have to do the mental gymnastics of mixing and matching intrinsics to get the behavior I want. I think this gets back to the question @scanon asked about the semantics of this type. What should I be able to do with it? Can I load a vector of these values from memory? Can I store them to memory? Can I assign one vector to another? Can I pass it as an argument? Can I use it as a return type? It looks like the only things we have intrinsics for (for the vector types) are converting to and from single precision vectors and performing the dot product accumulate operation.

I was wondering about similar issues when I was looking at what we had for the __bfloat16 type. We have intrinsics to convert this type to and from single precision floating point values, but I can't do anything else with it. Nothing else at all, including inserting it into a vector of bf16 values.
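To make that concrete, here is about the extent of what I can do with a scalar __bfloat16 today (a minimal sketch; it assumes the AVX512-BF16 scalar conversion intrinsics are available and that __bfloat16 is still just a typedef for unsigned short):

  #include <immintrin.h>
  
  // Round-trip through float: essentially the only real operations available.
  __bfloat16 scale(__bfloat16 x) {
    float f = _mm_cvtsbh_ss(x);        // bf16 -> float
    return _mm_cvtness_sbh(f * 2.0f);  // float -> bf16
  }
  
  __bfloat16 bad_add(__bfloat16 a, __bfloat16 b) {
    // With the current typedef this compiles, but it adds the raw bit
    // patterns as integers -- exactly the kind of accident this patch
    // is trying to prohibit.
    return a + b;
  }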

So @pengfei is trying to press ahead with the backend implementation, but our front end support is incomplete. That might explain why Phoebe and I haven't been able to agree on what should be done here.

This patch is strictly a front end patch, but it's trying to just wedge definitions into header files to get the desired outcome in the code generation. From the user's perspective, it feels totally broken.

Consider this function.

  #include <immintrin.h>
  
  __m128 f(__bfloat16 *p1, __bfloat16 *p2) {
    // Load the vectors using the integer load intrinsics??
    __m128i temp1 = _mm_loadu_epi32(p1);
    __m128i temp2 = _mm_loadu_epi32(p2);
  
    // Zero-initialize the base value vector
    __m128 base = _mm_set_ps1(0.0f);
  
    // Perform the dot product
    return _mm_dpbf16_ps(base, temp1, temp2);
  }

Is this what you'd expect with the current definitions? It looks like it produces the instructions I expected, but with -fno-lax-vector-conversions I get an error unless I explicitly bitcast the arguments from `__m128i` to `__m128bh`.
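With the casts spelled out, the last line ends up looking something like this (a sketch; the C-style cast just reinterprets the 128 bits as a BF16 vector):

  // Explicit bitcasts satisfy -fno-lax-vector-conversions, but the user
  // is still juggling integer types to express a BF16 operation.
  return _mm_dpbf16_ps(base, (__m128bh)temp1, (__m128bh)temp2);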

I think that just brings me up to speed with what Craig was saying, right?

So at this point we have these options:

1. Make the `__m[128|256|512]bh` types aliases of `__m[128|256|512]i`
2. Deprecate the `__m[128|256|512]bh` types and replace them with `__m[128|256|512]i`
3. Add load/store/insert/extract intrinsics for the `__bfloat16` type

Of these, I'd prefer the third option because both of the first two require an overloaded use of the integer vector type. I already don't like that we use the same type for integer vectors of every element width. Using it for BF16 vectors as well just seems wrong.
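To make the first option concrete, it would amount to something like this (illustrative only):

  // Option 1 (sketch): __m128bh stops being a distinct vector type and
  // becomes just another name for the generic integer vector, so any
  // __m128i silently becomes acceptable wherever a BF16 vector is expected.
  typedef __m128i __m128bh;

That would make the example above compile as-is, but only by giving up the type distinction entirely.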

For the example above, I'd like to write code like this:

  __m128 f(__bfloat16 *p1, __bfloat16 *p2) {
    // Load the BF16 vectors
    __m128bh v1 = _mm_load_pbh(p1);
    __m128bh v2 = _mm_load_pbh(p2);
  
    // Zero-initialize the base value vector
    __m128 base = _mm_set_ps1(0.0f);
  
    // Perform the dot product
    return _mm_dpbf16_ps(base, v1, v2);
  }

That's more work, but it has the merit of allowing me to use types that match what the program is doing.
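For the third option, the new intrinsics might look roughly like this (names and signatures are purely illustrative, following the _mm_load_pbh spelling I used above; none of these exist today):

  // Hypothetical BF16 vector intrinsics (sketch only, not a real API):
  __m128bh _mm_load_pbh(void const *p);                         // aligned load
  __m128bh _mm_loadu_pbh(void const *p);                        // unaligned load
  void _mm_store_pbh(void *p, __m128bh a);                      // aligned store
  void _mm_storeu_pbh(void *p, __m128bh a);                     // unaligned store
  __m128bh _mm_insert_pbh(__m128bh a, __bfloat16 b, int idx);   // insert element
  __bfloat16 _mm_extract_pbh(__m128bh a, int idx);              // extract element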

The fact that you can't pass a __m128i value to a function that is expecting __m128bh is a good thing. We shouldn't be making changes that prevent this diagnostic.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120395/new/

https://reviews.llvm.org/D120395


