[llvm] [AMDGPU] Lower `llvm.amdgcn.queue.ptr` instrinsic to using implicit kernel argument if feasible (PR #103490)

Shilei Tian via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 16 14:20:51 PDT 2024


shiltian wrote:

Let me try to make things clear here. The current situation (without this patch) is:

- The intrinsic is lowered to read `s[6:7]`, no matter what GFX and COV. This assumes `s[6:7]` has the value, which means the kernel descriptor needs to be set, no matter what GFX and COV.
- @changpeng's previous patches removed the setup of the kernel descriptor for COV5+. This means if the builtin/intrinsic is used, it is completely broken right now.

Given this, there are two approaches to fix this.

1. Change the lowering of the intrinsic to make it align with @changpeng's patches, that lower the intrinsic to reading from implicit kernel argument. This is what this patch is trying to do.
2. No matter what COV is, `s[6:7]` needs to be set, as long as the builtin/intrinsic is used (explicitly or implicitly). This is what @arsenm is proposing.

Let's compare the two approaches.

For 1, we can safely assume `s[6:7]` will never need to be set up for queue pointer for COV5+, no matter of the existence of the function attribute `amdgpu-no-queue-ptr`, which means no matter whether the optimization is on or off, and no matter whether the optimization can add the function attribute (such as an indirect function call that can't be resolved at compile time), we always have the two SGPRs available for other use.

For 2, we need to revert some changes by @changpeng that set the kernel descriptor if the intrinsic is used explicitly or implicitly. By "implicitly" I mean when at compile time we can't resolve an indirect function call such that we need to assume the intrinsic might be used. This means, even for COV5+, if the optimization is off, or the optimization can't work as expected, we lose the two SGPRs, no matter whether they are really used or not.

IMHO, I don't see the win of 2. It might follow the "convention" that an intrinsic needs to read from registers, but we have so many intrinsics that don't read from register. The reason it read from register in the past is there is no other way.

Did I misunderstand anything here? @arsenm @kerbowa @changpeng 

https://github.com/llvm/llvm-project/pull/103490


More information about the llvm-commits mailing list