[llvm] [AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (PR #75333)

Mon Feb 12 06:30:27 PST 2024

JonChesterfield wrote:

> BTW what happened to the patches to allow module LDS to report to promote-alloca the available LDS budget?

I think the machinery landed and the promote alloca patch got lost somewhere around the phab -> github transition. There are some limitations around there that could be stamped out, e.g. there's no particular reason the alloca->lds translation has to be restricted to kernels. I'll push it back onto the work list.

> How difficult would it be to support absolute addresses in the pass?

The LDS lowering pass (currently) crawls the IR module, writes "allocates N bytes" on kernels and absolute address metadata on variables. It's partly set up like that in order to support application developers writing absolute address constraints on globals as that seems generally useful and someone thought it was a plausible feature request in one of the meetings.

As far as allocating goes, making it composable means treating the kernel as a bump allocator. That's definitely fine.
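To illustrate what "treating the kernel as a bump allocator" could mean, here is a minimal sketch in Python. It is a toy model, not LLVM code: `KernelLDS`, `allocate` and `align_up` are hypothetical names, and the per-kernel `size` field stands in for the "allocates N bytes" annotation mentioned above.

```python
from dataclasses import dataclass, field


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


@dataclass
class KernelLDS:
    """Hypothetical model of one kernel's LDS frame; not an LLVM type."""
    size: int = 0  # bytes allocated so far, i.e. the "allocates N bytes" number
    placed: dict = field(default_factory=dict)  # variable name -> address

    def allocate(self, name: str, nbytes: int, alignment: int = 4) -> int:
        """Place a variable at the next suitably aligned address and bump size."""
        addr = align_up(self.size, alignment)
        self.placed[name] = addr
        self.size = addr + nbytes
        return addr
```

Under this model a later run only ever appends past the recorded `size`, so earlier placements are never disturbed. That is the sense in which the allocation side composes.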

There are some sketchy semantics and codegen implications that fall out.

I think the conclusion of the design choices will be that a variable with an absolute address cannot be moved, kernels will not try to allocate extra memory to encompass it, and data layout will be generally compromised relative to the single application case.

Given two modules which have independently had LDS lowering applied, llvm-link of the two is generally going to be unsound, for the same reasons that machine code linking of independently lowered LDS is not viable.
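One way to see the problem, as a toy Python sketch: if each module is lowered on its own, both plausibly start assigning addresses from offset 0, so after linking their variables can claim the same bytes. The tuple layout, the variable names and `find_overlaps` are assumptions made up for this illustration; the real pass records addresses as metadata on globals.

```python
def find_overlaps(variables):
    """Return pairs of variable names whose byte ranges intersect.

    variables: list of (name, address, size) tuples; a hypothetical stand-in
    for LDS globals that already carry absolute addresses.
    """
    ordered = sorted(variables, key=lambda v: v[1])
    overlaps = []
    for i, (name1, addr1, size1) in enumerate(ordered):
        for name2, addr2, _size2 in ordered[i + 1:]:
            # ordered by address, so addr2 >= addr1; overlap iff it starts
            # before the first variable ends
            if addr2 < addr1 + size1:
                overlaps.append((name1, name2))
    return overlaps


# Each module was lowered independently, so each starts at address 0.
module_a = [("a.x", 0, 16), ("a.y", 16, 4)]
module_b = [("b.z", 0, 8)]
linked = module_a + module_b
```

Here `find_overlaps(module_a)` is empty, while `find_overlaps(linked)` reports that `a.x` and `b.z` share bytes. Nothing in either module alone looks wrong.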

So while "handle absolute addresses" is relatively straightforward from the perspective of the LDS lowering pass, it's going to be interpreted as meaning LDS lowering can be arbitrarily composed as opposed to the very limited situations in which it's actually valid. In particular, a cleverer LDS pass will make the current error message go away, and do something entirely locally correct, and the emergent behaviour will be broken nonsense as soon as someone steps slightly away from the thinlto workflow, probably without any diagnostics.

I would propose the following:

1. Check for any LDS variables with absolute addresses.
2. If all LDS variables have absolute addresses, no work to do, return. Success.
3. If none have absolute addresses, run the pass. Add a postcondition that every LDS variable has an absolute address if it's not already there.
4. If some have absolute addresses, and some do not, fatal_error with some message along the lines of "no deal, this stuff doesn't compose properly".
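
The four steps above can be sketched as a small decision procedure. This is a Python model under stated assumptions, not the pass itself: `LDSVariable` and `lower_lds` are hypothetical names, the address assignment is a trivial stand-in that ignores alignment and per-kernel layout, and `RuntimeError` stands in for a fatal error.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LDSVariable:
    """Hypothetical stand-in for an LDS global; not an LLVM type."""
    name: str
    size: int
    absolute_address: Optional[int] = None


def lower_lds(variables: List[LDSVariable]) -> str:
    # Step 1: find the variables that already carry an absolute address.
    with_addr = [v for v in variables if v.absolute_address is not None]

    # Step 2: everything is already placed (or there is nothing to place).
    if len(with_addr) == len(variables):
        return "already-lowered"

    # Step 4: a mix of placed and unplaced variables does not compose.
    if with_addr:
        raise RuntimeError("mixed absolute and non-absolute LDS variables")

    # Step 3: none are placed, so run the lowering (toy packing here).
    offset = 0
    for v in variables:
        v.absolute_address = offset
        offset += v.size

    # Postcondition from step 3: every variable now has an address.
    assert all(v.absolute_address is not None for v in variables)
    return "lowered"
```

Calling `lower_lds` twice on the same list returns `"lowered"` and then `"already-lowered"`, which matches the case 3 then case 2 sequence described for the thinlto workflow.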

Another interesting idea is to refuse to compile code containing external LDS variables, which is currently left to the backend to report iirc.

The right thing is to change the LDS lowering scheme such that it does work in the presence of machine code linking. That is difficult to do and probably involves patching the linker and/or the loaders. It would then work regardless of how code is spliced together, modulo kernels refusing to launch because there's only 64k or so of LDS available and machine code linking is going to compromise data layout efficiency badly.

The above suggestion is better than trying to pass state around to disable subsequent calls to the pass. If you want to do the magic implicit disabling instead, do it in the pass pipeline control, not in the pass itself.

The thinlto workflow will hit case 3 on each piece, then case 2, if I'm understanding correctly. Hopefully it doesn't duplicate global variables while splitting the module, as that definitely won't work.

https://github.com/llvm/llvm-project/pull/75333