[PATCH] D143945: [AMDGPU] Add legalization case for PTR_ADD on buffer pointers

Thu Feb 16 00:19:07 PST 2023

nhaehnle added a comment.

In D143945#4126397 <https://reviews.llvm.org/D143945#4126397>, @krzysz00 wrote:

> @piotr @nhaehnle @sheredom My proposal here - and in the patch stack in general, is that we do revisit that 160-bit idea.
>
> How I imagine this working is that, if you GEP a buffer descriptor, the operation that that lowers to - here `G_PTR_ADD` - does not modify the input buffer descriptor value at all. Instead, all these offsets we add are accumulated into the `voffset` and `imm` fields of the relevant `MUBUF` instruction.

Yes. In other words, the GEP doesn't modify the 128 bits of buffer descriptor that are in the pointer, but it does modify the 32 bits of offset that are in the pointer.

> That means that
>
>   %y = gep i32, ptr addrspace(7) %x, i32 1
>   %z = load i32, ptr addrspace(7) %y
>
> lowers to
>
>   [%z] = BUFFER_LOAD_DWORD [%x], offset:4

That doesn't quite work, though, because `%x` itself might already have an offset. It really needs to translate to something like:

  %y_offset = G_ADD %x_offset, 4
  %z = G_AMDGPU_BUFFER_LOAD_DWORD %y_offset, %x_buffer

which can then later split `%y_offset` into a voffset and an inst_offset part during instruction selection.

> Now, this does mean that, for example
>
>   %y = gep i32 ptr addrspace(7) %x, i32 1
>   %z = ptrtoint i128 %y
>
> produces ... who the heck knows what ... in `%z`, but buffer descriptors are a very weird kind of pointer and anyone emitting such code either knows exactly what they're doing or is Wrong. (My gut tells me that, if I _had_ to pick a behavior, `ptrtoint %y == ptrtoint %x`)

This problem is very easily solved by making the pointers into 160-bit quantities as initially intended.

Note that `@lgc.late.launder.fat.pointer` is not a full `inttoptr` for fat pointers. It so happens that in LLPC we never need an initial offset different from 0, and so that's why this lgc "intrinsic" doesn't have the offset part as input even though it sort of plays the role of `inttoptr`. But if you look at how the operation is lowered in our PatchBufferOp pass, you'll see clearly that it produces a 160-bit quantity: 128 bits of buffer descriptor + 32 bits of offset (which just happen to always be 0).

> Now, in the case that the buffer descriptor itself in non-uniform, I'm willing to initially declare that case a "instruction selection failed" case and perhaps add emulation in the future.

That's not acceptable for the graphics use cases.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143945/new/

https://reviews.llvm.org/D143945