[PATCH] D158059: [AMDGPU/wmma] - Disable 3-address syntax for f16

Mon Sep 18 07:53:59 PDT 2023

piotr added a comment.

> Then, as you say, our register allocation needs to be intelligent enough to keep the matrices packed.
> How would you define the instructions for this to work?

Unfortunately, having looked at that a bit more, I don't think the scheme I proposed is feasible. Even if we add some extra copies to preserve the other half, the twoaddressinstruction pass will not be able to understand them.

The only alternative to adding new intrinsics that I can suggest is to implement the packing entirely in codegen (e.g. after the twoaddressinstruction pass).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158059/new/

https://reviews.llvm.org/D158059