[PATCH] D70118: [AMDGPU] Lower llvm.amdgcn.s.buffer.load.v3[i|f]32

Wed Nov 13 03:29:12 PST 2019

piotr marked 2 inline comments as done.
piotr added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll:27-31
+;GCN-LABEL: {{^}}s_buffer_load_index_divergent:
+;GCN-NOT: s_waitcnt;
+;GCN: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0 offen
+define amdgpu_ps void @s_buffer_load_index_divergent(<4 x i32> inreg %desc, i32 %index) {
+main_body:
----------------
arsenm wrote:
> Most of these test changes look unrelated?
I added the v3 test that exercises the code I am modifying (divergent index): s_buffer_loadx3_index_divergent. I also added the analogous s_buffer_load_index_divergent and s_buffer_loadx2_index_divergent tests for consistency.

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll:97-98
+;GCN-NOT: s_waitcnt;
+;GCN: s_buffer_load_dword s{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}}
+;GCN: s_buffer_load_dwordx2 s[{{[0-9]+:[0-9]+}}], s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}}
+define amdgpu_ps void @s_buffer_loadx3_index(<4 x i32> inreg %desc, i32 inreg %index) {
----------------
arsenm wrote:
> There is no load dwordx3, so I'm slightly confused about why you need this, but I would expect this to widen to 4x loads?
The big picture is that I am working on reducing the number of components loaded by the various buffer loads. I have another change in instcombine (soon to be uploaded for review) that trims loads based on the components actually used. With that patch, a vec3 s_buffer_load crashes in the lowering, so I am adding support for it here.

It is useful to have s_buffer_load.v3 for the divergent-index case, where s_buffer_load cannot be used and buffer_load_dword is generated instead. On newer GPUs (VI and later) buffer_load_dwordx3 is available; only on SI do we generate buffer_load_dwordx4 for that (see the s_buffer_loadx3_index_divergent test).
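
As a rough illustration of the divergent-index case (a standalone sketch, not the code in this patch), the width choice can be modeled as below. The function name and the `hasDwordx3` flag are hypothetical; the flag stands for "the target has buffer_load_dwordx3".

```cpp
// Hypothetical model, not LLVM code: pick the buffer_load width (in dwords)
// for a divergent-index load that needs `numDwords` components.
// `hasDwordx3` is an assumed target flag: false on SI, true on targets
// where buffer_load_dwordx3 exists.
constexpr unsigned divergentLoadWidth(unsigned numDwords, bool hasDwordx3) {
  // A 3-dword request is widened to 4 dwords only when dwordx3 is missing;
  // every other width is used as requested.
  return (numDwords == 3 && !hasDwordx3) ? 4u : numDwords;
}
```

With this model a v3 load maps to a 3-dword load when the flag is set and to a 4-dword load (one unused component) otherwise.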

As for whether it is better to split or widen the s_buffer_load (non-divergent index), the advantage of splitting is that the split loads can be merged with an adjacent load more easily. But I do not have a strong opinion on that.
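
To make the trade-off concrete, here is a small standalone sketch (again hypothetical, not taken from the patch) of the two options for a uniform-index v3 load. The struct, the function name, and the particular split order (dwordx2 at byte offset 0, dword at byte offset 8) are assumptions chosen for illustration only.

```cpp
// Hypothetical description of how a 3-dword scalar buffer load could be issued.
struct ScalarLoadPlan {
  unsigned firstDwords;      // width of the first load, in dwords
  unsigned secondDwords;     // width of the second load, 0 if there is none
  unsigned secondByteOffset; // byte offset of the second load from the first
  unsigned unusedDwords;     // loaded components that nobody reads
};

// split == true : dwordx2 + dword, nothing unused, two instructions.
// split == false: one dwordx4, one unused component.
constexpr ScalarLoadPlan planScalarV3Load(bool split) {
  return split ? ScalarLoadPlan{2, 1, 8, 0} : ScalarLoadPlan{4, 0, 0, 1};
}
```

In the split plan the trailing single-dword load is the piece that could later be merged with an adjacent load; the widened plan trades that for a single instruction with one unused component.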

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70118/new/

https://reviews.llvm.org/D70118