[PATCH] D99152: [AMX] Prototype for vector and amx bitcast.

Wed Mar 24 07:44:24 PDT 2021

fhahn added a comment.

> The front-end emits an alloca of <256 x i32> for the local variable `tile`. When the return value of `__builtin_ia32_tileloadd64_internal` is assigned to `tile`, the front-end bitcasts the x86_amx value to <256 x i32>. x86_amx is the type returned by `__builtin_ia32_tileloadd64_internal`.

Can you share a more interesting example, where the result of the load is actually used by a different AMX builtin? For the store example, it seems like conversion intrinsic + regular IR store should work.

>> With respect to the `load` issue, it is not clear to me at the moment under which circumstances regular `load` instructions are generated and how they interact with AMX. If `load` is used to load `x` consecutive elements, then that's fine. But if the actual intended operation is a strided load, then `load` should not be used (this has also been discussed on llvm-dev).
>
> The `load` instructions are generated because the tile is a vector type in C. See https://gcc.godbolt.org/z/qv5jnjK48. With -O0, a load instruction is generated; with -O2, it is eliminated. The -O2 version is what we want: there is no <256 x i32> in the generated code.

I can't see any `load <256 x i32>` in the linked example, just a store. Could you check the example?
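To make the contiguous-vs-strided point concrete, here is a small plain-C model of the two access patterns. It is only a sketch and does not use AMX: `model_tile_load`, `model_flat_load` and `loads_agree` are hypothetical helpers, and the 16-row x 64-byte tile shape is assumed so that it flattens to 256 x i32. The "tile load" reads each row from `base + r * stride`, while the "flat load" reads 1024 consecutive bytes the way a plain `load <256 x i32>` would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed tile shape: 16 rows x 64 bytes, i.e. 256 x i32 when flattened. */
enum { TILE_ROWS = 16, TILE_COLSB = 64, TILE_ELEMS = TILE_ROWS * TILE_COLSB / 4 };

/* Hypothetical model of a strided tile load: row r is read from
   base + r * stride (stride in bytes) and stored densely in dst. */
static void model_tile_load(int32_t dst[TILE_ELEMS], const void *base,
                            size_t stride) {
  const unsigned char *src = (const unsigned char *)base;
  unsigned char *out = (unsigned char *)dst;
  for (size_t r = 0; r < TILE_ROWS; ++r)
    memcpy(out + r * TILE_COLSB, src + r * stride, TILE_COLSB);
}

/* Hypothetical model of a plain `load <256 x i32>`: 1024 consecutive bytes. */
static void model_flat_load(int32_t dst[TILE_ELEMS], const void *base) {
  memcpy(dst, base, TILE_ELEMS * sizeof(int32_t));
}

/* Fills a 16 x 32 int32 matrix (128-byte rows) and reports whether the
   strided load with the given stride matches the flat contiguous load. */
static bool loads_agree(size_t stride) {
  static int32_t matrix[TILE_ROWS][32];
  for (int r = 0; r < TILE_ROWS; ++r)
    for (int c = 0; c < 32; ++c)
      matrix[r][c] = r * 32 + c;

  /* The last row read must stay inside the matrix. */
  assert(stride * (TILE_ROWS - 1) + TILE_COLSB <= sizeof matrix);

  int32_t strided[TILE_ELEMS], flat[TILE_ELEMS];
  model_tile_load(strided, matrix, stride);
  model_flat_load(flat, matrix);
  return memcmp(strided, flat, sizeof strided) == 0;
}
```

With a stride equal to the 64-byte row width the rows are back to back, so `loads_agree(64)` is true. With the matrix's real 128-byte row stride, `loads_agree(128)` is false: the tile picks up the left half of each row, while the flat load reads the first 8 full rows. That second case is the one where a plain `load` would not be a valid substitute for the strided load.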

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D99152