[PATCH] D158059: [AMDGPU/wmma] - Disable 3-address syntax for f16

Mon Sep 18 09:17:07 PDT 2023

OutOfCache added a comment.

In D158059#4647410 <https://reviews.llvm.org/D158059#4647410>, @piotr wrote:

>> Then, as you say, our register allocation needs to be intelligent enough to keep the matrices packed.
>> How would you define the instructions for this to work?
>
> Unfortunately, looking at that a bit more I don't think the scheme I proposed is feasible. Even if we add some extra copies to preserve the other half, the twoaddressinstruction pass will not be able to understand that.
>
> The only alternative I could suggest instead of adding new intrinsics seems to be to implement the packing entirely in the codegen (e.g. after twoaddressinstruction pass).

Thank you for looking into it! I am not sure the packing is feasible at that stage, though. Packing also affects the users of the matrices, and our solution relies heavily on finding the users of the lgc intrinsics. In addition, the changes can remove some intrinsic calls entirely, which reduces the amount of code every later pass has to process. Delaying the packing to such a late stage would make it more complicated, and each pass before it would carry code that might be removed later on.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D158059