[PATCH] D158059: [AMDGPU/wmma] - Disable 3-address syntax for f16

Tue Sep 12 00:24:28 PDT 2023

piotr added a comment.

Thanks for the extra info; I understand the problem now: currently there seems to be no way to take advantage of the opsel bit to reuse the same destination matrix registers for two wmma instructions.

One way to fix that would be as proposed here, at the intrinsic level. In this approach the intrinsic would always take the full register (C, D matrices), operate on the specified half, and preserve the other half.

But maybe this can be fixed in the codegen without adding the new intrinsic? I think the underlying problem is that we are not modelling correctly the fact that these opsel instructions do not touch the other half. If we could do that I would expect the two address pass to re-use the dest regs of the first instruction, because the low halves would still be live at the end of the second instruction.

%combined = call <8 x float> @combine_halfs(<8 x float> %lo_16_bit_matrix, <8 x float> %hi_16_bit_matrix)
%low_half_multiplied = call <8 x float> @wmma(%a, %b, <8 x float> %combined, i1 false /* low half */)
%high_half_multiplied = call <8 x float> @wmma(%a, %b, <8 x float> %combined, i1 true /* high half */)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158059/new/

https://reviews.llvm.org/D158059