[PATCH] D41651: AMDGPU: Add 32-bit constant address space

Thu Jan 25 11:08:12 PST 2018

mareko added a comment.

In https://reviews.llvm.org/D41651#986619, @nhaehnle wrote:

> This needs documentation in AMDGPUUsage.rst.
>
> Relying on metadata for correctness is indeed not okay. We should either say that CONSTANT_ADDRESS_32BIT just assumes uniformness, and move the address to an SGPR (via v_readfirstlane) if required, *or* support this also with VMEM instructions.

Here is why relying on metadata is OK.

The behavior of 64-bit pointers:

- If the address is in VGPRs and amdgpu.uniform is not dropped, you'll get readfirstlane and correct behavior.
- If the address is in VGPRs and amdgpu.uniform is dropped by a random pass, you'll get SMEM opcodes reading descriptors from VGPRs. That is an invalid binary, it is produced without any error, and it hangs the GPU.

The behavior for 32-bit pointers:

- If the address is in VGPRs and amdgpu.uniform is not dropped, you'll get readfirstlane and correct behavior.
- If the address is in VGPRs and amdgpu.uniform is dropped by a random pass, you'll get a compile error.

Therefore, 32-bit pointers are a significant improvement in compiler behavior over 64-bit pointers: when the metadata is lost, the failure is a compile error instead of a silent GPU hang. The current implementation covers everything Mesa will ever need. 32-bit pointers in VMEM opcodes would be a bonus, but Mesa would have no use for them.
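
For reference, the four cases above can be written down as a tiny Python model. This is only a restatement of the lists in this comment, not compiler code; the function name `outcome` and its return strings are made up for illustration.

```python
def outcome(pointer_bits, uniform_kept):
    """Illustrative model of the cases listed above.

    Assumes the address lives in VGPRs. `uniform_kept` says whether the
    amdgpu.uniform metadata survived until instruction selection.
    """
    if pointer_bits not in (32, 64):
        raise ValueError("only 32-bit and 64-bit pointers are modeled")
    if uniform_kept:
        # Both pointer widths: the address is moved to SGPRs.
        return "readfirstlane, correct behavior"
    if pointer_bits == 64:
        # SMEM opcode reading from VGPRs, no diagnostic.
        return "invalid binary, GPU hang"
    return "compile error"


for bits in (64, 32):
    for kept in (True, False):
        print(bits, kept, "->", outcome(bits, kept))
```

The only row that differs between the two widths is the one where the metadata was dropped.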

> As far as I understand, the point of this change is to use 32-bit pointers for descriptor tables. It doesn't seem too far-fetched that we'll eventually have to support extensions with divergent resource descriptors, so I vaguely prefer the second solution.

Game developers will be advised to use the readfirstlane intrinsic in a loop, as has happened in the past. As long as AMD doesn't support divergent resource descriptors in other drivers, we are fine.
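
To make "readfirstlane in a loop" concrete, here is a rough Python model of that pattern. It simulates a wave as a list of per-lane values; it is not shader or compiler code, and the names `readfirstlane`, `waterfall`, and `load` are placeholders chosen for this sketch.

```python
def readfirstlane(values, active):
    """Return the value held by the lowest-numbered active lane."""
    for lane, is_active in enumerate(active):
        if is_active:
            return values[lane]
    raise ValueError("no active lanes")


def waterfall(descriptor_index, load):
    """Handle a divergent per-lane index using only uniform loads.

    `descriptor_index` holds one index per lane; `load` stands in for a
    scalar (uniform) descriptor load. Returns the per-lane results and
    the number of loop iterations that were needed.
    """
    lanes = len(descriptor_index)
    result = [None] * lanes
    remaining = [True] * lanes  # lanes that have not been handled yet
    iterations = 0
    while any(remaining):
        # Pick the index of the first unhandled lane; it is uniform now.
        uniform = readfirstlane(descriptor_index, remaining)
        value = load(uniform)  # one uniform load per iteration
        # Retire every lane that wanted this same index.
        for lane in range(lanes):
            if remaining[lane] and descriptor_index[lane] == uniform:
                result[lane] = value
                remaining[lane] = False
        iterations += 1
    return result, iterations


table = {0: "A", 1: "B", 2: "C"}
per_lane, count = waterfall([2, 0, 2, 1, 0, 0], table.__getitem__)
print(per_lane, count)
```

The loop runs once per distinct index in the wave, so a fully uniform index costs a single iteration.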

> The other question is, why do we need a new address space at all? Can't we synthesize an appropriate pointer via inttoptr casts? I believe this is what SCPC is doing.

The short story is: we should never use inttoptr unless InstCombine can remove it. inttoptr is unoptimizable by LLVM.

https://reviews.llvm.org/D41651