[PATCH] D41651: AMDGPU: Add 32-bit constant address space
Marek Olšák via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 25 11:08:12 PST 2018
mareko added a comment.
In https://reviews.llvm.org/D41651#986619, @nhaehnle wrote:
> This needs documentation in AMDGPUUsage.rst.
>
> Relying on metadata for correctness is indeed not okay. We should either say that CONSTANT_ADDRESS_32BIT just assumes uniformness, and move the address to an SGPR (via v_readfirstlane) if required, *or* support this also with VMEM instructions.
Here is why relying on metadata is OK.
The behavior of 64-bit pointers:
- If the address is in VGPRs and amdgpu.uniform is not dropped, you'll get readfirstlane and correct behavior.
- If the address is in VGPRs and amdgpu.uniform is dropped by a random pass, you'll get SMEM opcodes reading descriptors from VGPRs, so you'll silently get an invalid binary (no compile error) and a GPU hang.
The behavior for 32-bit pointers:
- If the address is in VGPRs and amdgpu.uniform is not dropped, you'll get readfirstlane and correct behavior.
- If the address is in VGPRs and amdgpu.uniform is dropped by a random pass, you'll get a compile error.
Therefore, 32-bit pointers are a significant improvement in compiler behavior over 64-bit pointers. The current implementation covers everything Mesa will ever need. 32-bit pointers in VMEM opcodes would be a bonus, but it would also be useless for Mesa.
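To make the four cases above easier to compare, here is a toy model in Python. It is only an illustration of the outcomes listed in this comment, not LLVM code; the names Outcome and vgpr_address_outcome are made up for this sketch, and it only covers the case where the address is already in VGPRs.

```python
from enum import Enum


class Outcome(Enum):
    """Possible results when the load address lives in VGPRs (hypothetical names)."""
    READFIRSTLANE = "readfirstlane inserted, correct behavior"
    INVALID_BINARY = "SMEM opcode reads from VGPRs: invalid binary, GPU hang"
    COMPILE_ERROR = "compile error"


def vgpr_address_outcome(pointer_bits, uniform_kept):
    """Return the outcome described in this comment for an address in VGPRs.

    pointer_bits: 32 or 64 (width of the constant address space pointer).
    uniform_kept: True if the amdgpu.uniform metadata survived all passes.
    """
    if pointer_bits not in (32, 64):
        raise ValueError("pointer_bits must be 32 or 64")
    if uniform_kept:
        # Same for both widths: the address is moved to SGPRs via readfirstlane.
        return Outcome.READFIRSTLANE
    # Metadata was dropped: 64-bit fails silently, 32-bit fails loudly.
    if pointer_bits == 64:
        return Outcome.INVALID_BINARY
    return Outcome.COMPILE_ERROR


if __name__ == "__main__":
    for bits in (64, 32):
        for kept in (True, False):
            print(bits, kept, vgpr_address_outcome(bits, kept).value)
```

The only difference between the two widths is the dropped-metadata case, which is the point of the argument: the 32-bit path turns a silent miscompile into a visible error.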
> As far as I understand, the point of this change is to use 32-bit pointers for descriptor tables. It doesn't seem too far-fetched that we'll eventually have to support extensions with divergent resource descriptors, so I vaguely prefer the second solution.
Game developers will be advised to use the readfirstlane intrinsic in a loop, as has happened in the past. As long as AMD doesn't support divergent resource descriptors in other drivers, we are fine.
> The other question is, why do we need a new address space at all? Can't we synthesize an appropriate pointer via inttoptr casts? I believe this is what SCPC is doing.
The short story is: We should never use inttoptr if InstCombine can't remove it. inttoptr is unoptimizable by LLVM.
https://reviews.llvm.org/D41651