[PATCH] D101591: [AMDGPU] Improve global SADDR selection

Thu Apr 29 18:11:41 PDT 2021

rampitec added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/global-saddr-load.ll:88-91
+; GFX10-NEXT:    v_mov_b32_e32 v0, 0
+; GFX10-NEXT:    s_add_u32 s0, s2, 0xfffff000
+; GFX10-NEXT:    s_addc_u32 s1, s3, -1
+; GFX10-NEXT:    global_load_ubyte v0, v0, s[0:1]
----------------
arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > This is more instructions
> > It is one materialized zero; I think this is more or less a degenerate case. In the real world there is usually a VGPR holding zero. On the other hand, this uses one VGPR less.
> This is the same pattern in all of the cases where a large immediate offset is split. I think this is reasonably common, and I would lean towards fewer instructions here.
We still need to materialize a zero, but I get the idea; I will check what can be done here.

One caveat with a nonzero vaddr: if the address calculation ends up in SGPRs (as it likely will in these cases) but the instruction later has to be moved to the VALU for some reason, it will be impossible to switch to the vaddr form. That in turn results in readfirstlane instructions for the address, as happens now.
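For readers following the arithmetic in the GFX10 check lines: the `s_add_u32 s0, s2, 0xfffff000` / `s_addc_u32 s1, s3, -1` pair folds the large negative offset into the 64-bit SGPR base, so `s[0:1] = s[2:3] - 4096`, and the load then uses the zero VGPR as its vaddr. The sketch below is a minimal Python model of that carry chain. It assumes 32-bit wraparound for each half, the carry passed through SCC, and `-1` encoding `0xffffffff`; the helper names are hypothetical and only mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def s_add_u32(a: int, b: int) -> tuple[int, int]:
    """Hypothetical model: 32-bit add returning (result, carry-out to SCC)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def s_addc_u32(a: int, b: int, scc: int) -> int:
    """Hypothetical model: 32-bit add with carry-in taken from SCC."""
    return ((a & MASK32) + (b & MASK32) + scc) & MASK32


def split_offset_add(base: int) -> int:
    """Model of the two scalar instructions:
         s_add_u32  s0, s2, 0xfffff000
         s_addc_u32 s1, s3, -1
    where s[2:3] holds the 64-bit base and s[0:1] receives the result."""
    s2, s3 = base & MASK32, (base >> 32) & MASK32
    s0, scc = s_add_u32(s2, 0xFFFFF000)
    s1 = s_addc_u32(s3, -1, scc)  # -1 & MASK32 == 0xffffffff
    return (s1 << 32) | s0


# The pair is equivalent to a single 64-bit add of -4096 (0xfffffffffffff000).
for base in (0x2000, 0x1_0000_0010, 0x7FFF_0000_1234):
    assert split_offset_add(base) == (base - 4096) & MASK64
    print(hex(base), "->", hex(split_offset_add(base)))
```

The second and third bases exercise both cases of the carry: no carry out of the low half (the high half then wraps down by one) and a carry out of the low half (the high half is unchanged).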


https://reviews.llvm.org/D101591