[PATCH] D125602: [X86][AMX][fastalloc] Allocate tile register separately.

Wed Jun 1 04:01:48 PDT 2022

LuoYuanke added a comment.

In D125602#3547595 <https://reviews.llvm.org/D125602#3547595>, @MatzeB wrote:

> There's a lot going on here.
>
> - Could you extract the `ShouldAllocClass` fixes for fastregalloc into a separate diff so we can discuss them separately and get feedback from AMDGPU folks who introduced this and are the major other user of this AFAIK.

Thank you for the review. Sure, I'll extract the `ShouldAllocClass` fixes for fastregalloc into a separate patch.

> - I am still wrapping my head around tile registers and tile register configs; specifically I wonder if the support for that really needs to be integrated into the generic register allocation code or whether there is a way to materialize the register configurations in a post-pass. For example do you know how the legacy x86 x87-FPU support works, where we use the register allocator to allocate pseudo FP register fp0-fp7 and then use a post-pass in `X86FloatingPoint` to insert the necessary stack management operations after the fact. I am not saying it's the same problem, but it is an example of an instance where we managed to have the regalloc allocate to some intermediate pseudo registers and adapt to the complications (in that case register stacks) in a target-specific pass so we wouldn't need to introduce the concept of register stacks to the generic code.

We have a pre-pass that inserts the config instructions and a post-pass that fills in the shape information of each physical tile register and feeds it to the config instruction.
I think tile register allocation and x87-FPU register allocation are different problems. The problem with x87-FPU registers is that they are arranged as a stack in HW, and I'm not sure the stackify pass can convert pseudo instructions to x87 instructions efficiently without inserting extra instructions to adjust the stack order. The problem with tile registers comes from configuration. Before any tile register is accessed, it must be configured with its shape (row, column) information, and the HW uses that shape information to execute AMX instructions. The configure instruction also clears the data in all tile registers. If we added a post-fixup pass like the stackify pass, we would need to reconfigure whenever virtual registers with different shapes are allocated to the same physical tile register, and each reconfiguration would clobber all tile registers. That may generate too many config instructions and spills/reloads in the post-fixup pass. Nevertheless, I am happy to work with the community on any suggestions to improve the solution for AMX register allocation.
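
To make the reconfiguration concern concrete, here is a rough toy model. It is not code from this patch or from LLVM: the names `Shape`, `TileUse` and `countExtraReconfigs` are hypothetical, and the model only assumes what is described above (each physical tile has one configured shape at a time, and changing it requires a new config instruction). It counts how many extra config instructions a naive post-fixup pass would need for a given sequence of tile uses.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model type: the shape a tile must be configured with.
struct Shape {
  std::uint16_t rows;   // number of rows
  std::uint16_t colsb;  // bytes per row
  bool operator==(const Shape &o) const {
    return rows == o.rows && colsb == o.colsb;
  }
};

// Hypothetical model type: one use of a tile after allocation, i.e. the
// physical tile it was assigned to and the shape that use requires.
struct TileUse {
  int physTile;  // 0..7, AMX has eight tile registers (tmm0..tmm7)
  Shape shape;
};

// Count the config instructions needed beyond the initial one. The first
// shape seen for a physical tile is assumed to go into the initial config.
// Any later use of that tile with a different shape forces a reconfig. On
// real hardware that reconfig would also clear the data of every tile, so
// each one additionally implies spills/reloads of live tiles (not modeled).
int countExtraReconfigs(const std::vector<TileUse> &uses) {
  std::array<bool, 8> hasShape{};
  std::array<Shape, 8> configured{};
  int extra = 0;
  for (const TileUse &u : uses) {
    assert(u.physTile >= 0 && u.physTile < 8);
    if (!hasShape[u.physTile]) {
      hasShape[u.physTile] = true;
      configured[u.physTile] = u.shape;
    } else if (!(configured[u.physTile] == u.shape)) {
      configured[u.physTile] = u.shape;
      ++extra;
    }
  }
  return extra;
}
```

In this model, three uses that alternate between a 16x64 and an 8x32 shape on the same physical tile need two extra reconfigs, while the same three uses with the two shapes placed on different physical tiles need none. That difference is what a shape-aware allocation can exploit and what a post-fixup pass would have to pay for after the fact.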

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125602/new/

https://reviews.llvm.org/D125602