[PATCH] D125060: [amdgpu][wip] Implement lds kernel id intrinsic

Thu May 26 09:43:11 PDT 2022

JonChesterfield added a comment.

The revision just uploaded annotates every kernel with an integer ID. That doesn't need to land with the intrinsic, but it makes it much easier to drive runtime tests through this code, so it's there for now.

In D125060#3496748 <https://reviews.llvm.org/D125060#3496748>, @arsenm wrote:

> I have to look closer but my main concern is about adding an SGPR argument for this. It doesn’t correspond to a real kernel input, and we didn’t add this to the new ABI register layout proposal. What happened to using some kind of relocation for the kernel ID?

I need to amend the new ABI layout page internally, and possibly pick a different number to burn. A relocation or loader patch doesn't work for this case, because a given function may be called from multiple different kernels. In the happy case where a function is only callable from one kernel, the intrinsic should be constant folded (todo for myself: test/implement that), and even then it doesn't want a relocation.
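A minimal C++ sketch of the idea, assuming a heavily simplified model rather than the actual AMDGPU lowering: each kernel has an integer ID, and a non-kernel function finds an LDS variable by indexing a constant table with that ID. The table contents, the names LdsTable, ldsAddress and foldLdsAddress, and the "set of kernels that can reach this function" input are all hypothetical. The second function illustrates why the single-caller case needs no relocation: the ID is already a compile-time constant.

```cpp
// Hypothetical model of a kernel-id-indexed LDS lookup (not the real pass).
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>

constexpr int kNumKernels = 2;
constexpr int kNumVars = 2;

// LdsTable[kernel_id][var_index] = byte offset of that variable in the LDS
// frame allocated by that kernel. The offsets are made-up example values.
constexpr std::array<std::array<std::uint32_t, kNumVars>, kNumKernels> LdsTable = {{
    {{0, 64}},   // kernel 0 places var0 at 0 and var1 at 64
    {{128, 0}},  // kernel 1 uses a different layout
}};

// What a non-kernel function does to access LDS variable `var`: it cannot
// know its caller statically, so it indexes the table with the kernel ID
// supplied at run time (the role played by the intrinsic).
std::uint32_t ldsAddress(int kernelId, int var) {
  assert(kernelId >= 0 && kernelId < kNumKernels);
  assert(var >= 0 && var < kNumVars);
  return LdsTable[kernelId][var];
}

// If analysis proves exactly one kernel can reach the function, the ID is a
// constant and the lookup folds to a fixed offset; otherwise it stays dynamic.
std::optional<std::uint32_t> foldLdsAddress(const std::set<int>& reachingKernels, int var) {
  if (reachingKernels.size() != 1) return std::nullopt;
  return ldsAddress(*reachingKernels.begin(), var);
}
```

With these example values, ldsAddress(1, 0) yields 128, while foldLdsAddress({0, 1}, 0) returns no value because the caller is not statically known.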

In D125060#3497044 <https://reviews.llvm.org/D125060#3497044>, @rampitec wrote:

> As far as I understand this requires a whole program compilation and will not work with late linking?

That depends on what you mean by linking, really. Because this wires the variable accesses to a lookup into an internal-linkage table in IR, renaming that table when IR is combined later on works fine: each kernel has an assigned index that makes sense in the context of the only table it can see.
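To illustrate why renaming is harmless, here is a hypothetical, heavily simplified model of combining two modules using plain C++ containers (it is not LLVM's Linker or IRMover API). Each module owns an internal-linkage table; when names collide, the incoming table is renamed and the incoming module's references are rewritten along with it, so each kernel keeps indexing the only table it can see. The table name lds.table and the ".N" suffix scheme are assumptions made for the example.

```cpp
// Hypothetical model of linking modules that each own an internal table.
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Module {
  // Internal-linkage tables by name; each maps kernel id -> LDS offset.
  std::map<std::string, std::vector<unsigned>> tables;
  // For each kernel: its id and the name of the table its code indexes.
  struct KernelRef {
    unsigned id;
    std::string table;
  };
  std::map<std::string, KernelRef> kernels;
};

// What a function reached from `kernel` would load for its LDS variable.
unsigned lookup(const Module& m, const std::string& kernel) {
  const Module::KernelRef& k = m.kernels.at(kernel);
  return m.tables.at(k.table).at(k.id);
}

// Merge `src` into `dst`. Colliding internal names get a numeric suffix, and
// every reference inside `src` is rewritten to the new name.
void linkInto(Module& dst, const Module& src) {
  std::map<std::string, std::string> renamed;
  for (const auto& [name, table] : src.tables) {
    std::string newName = name;
    for (int n = 1; dst.tables.count(newName); ++n)
      newName = name + "." + std::to_string(n);
    renamed[name] = newName;
    dst.tables[newName] = table;
  }
  for (const auto& [kernel, ref] : src.kernels)
    dst.kernels[kernel] = {ref.id, renamed.at(ref.table)};
}
```

Linking a second module whose table is also called lds.table produces lds.table.1 in the combined module, and lookup for that module's kernel returns the same offset before and after the merge.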

The trick of placing module.lds at address zero will break if that variable is renamed during linking, so the check will need to become something more sophisticated than a string comparison. Similarly, the per-kernel LDS struct named after the current function will break if the kernel is renamed and the struct is not. So there are a few rough edges around lowering LDS incrementally (e.g. allowing the lowering IR pass to be run repeatedly) that should be patched up.

If you mean a kernel compiled to ISA calling a function that uses LDS in some other module, also compiled to ISA, that doesn't work as written. The lookup and enumeration extend relatively easily (tag the table with appending linkage and provide a linker symbol for its start position within the current ELF), but the allocation in the kernel doesn't. I'm not yet sure whether there's a link-time optimisation available, or whether we should treat calls between code objects and calls between ISA in ELF modules equivalently (via load-time relocation).
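The lookup-and-enumeration half of the cross-object case could look roughly like the sketch below, which models appending linkage as plain concatenation of per-object rows and the per-ELF linker symbol as a recorded start index, so a kernel's global row is its object's start plus its local ID. The types and functions are hypothetical, and, as noted above, this does nothing for the harder problem of the kernel allocating LDS for a callee in another object.

```cpp
// Hypothetical model of an appending-linkage table with per-object starts.
#include <cassert>
#include <cstddef>
#include <vector>

struct ObjectRows {
  std::vector<unsigned> rows;  // local kernel id -> LDS offset, in one object
};

struct LinkedTable {
  std::vector<unsigned> rows;       // concatenation of every object's rows
  std::vector<std::size_t> starts;  // per-object start index ("linker symbol")
};

// Concatenate the tables in link order, recording where each object begins.
LinkedTable appendTables(const std::vector<ObjectRows>& objects) {
  LinkedTable out;
  for (const ObjectRows& obj : objects) {
    out.starts.push_back(out.rows.size());
    out.rows.insert(out.rows.end(), obj.rows.begin(), obj.rows.end());
  }
  return out;
}

// A kernel only knows its id local to its own object; adding the object's
// start symbol turns it into an index into the combined table.
unsigned lookup(const LinkedTable& t, std::size_t object, std::size_t localId) {
  return t.rows.at(t.starts.at(object) + localId);
}
```

Because each object's code only ever adds its own start symbol to IDs it assigned itself, the objects can be enumerated independently and still agree after linking.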

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D125060