[PATCH] D101405: [AMDGPU] Change FLAT SADDR to VADDR form in moveToVALU

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 30 16:22:33 PDT 2021


arsenm accepted this revision.
arsenm added a comment.
This revision is now accepted and ready to land.

LGTM with test comment



================
Comment at: llvm/test/CodeGen/AMDGPU/global-load-saddr-to-vaddr.ll:2
+; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN %s
+
+; GCN-LABEL: {{^}}test_move_load_address_to_vgpr:
----------------
Add a comment explaining what this is testing


================
Comment at: llvm/test/CodeGen/AMDGPU/global-load-saddr-to-vaddr.ll:21-23
+  %i8 = add nuw nsw i32 %i, 1
+  %i9 = icmp eq i32 %i8, 256
+  br i1 %i9, label %bb2, label %bb3
----------------
Oh, this is the case where the loop index/pointer is really uniform, but forced to be a vector branch. It's just better to do the VALU address computation than do it scalar and copy with readfirstlane


================
Comment at: llvm/test/CodeGen/AMDGPU/move-load-addr-to-valu.mir:35
+      %3:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+      %4:vgpr_32 = GLOBAL_LOAD_DWORD_SADDR %1, %3, 0, 0, implicit $exec
+      %2:sreg_64 = S_AND_B64 %1, 1, implicit-def $scc
----------------
rampitec wrote:
> foad wrote:
> > rampitec wrote:
> > > arsenm wrote:
> > > > Can you also add an IR test where this matters? I'm not understanding why the DAG divergence analysis would let this happen
> > > The test case shall be more than 100 instructions long, this is the threshold for memory dependency analysis to give up on the "noclobber" check. The divergence analysis tells it is ok, it is uniform. But noclobber is not set, so the address registers are not known not to be clobberd and we are here.
> > > 
> > > It really happens in large basic blocks. Probably it can also happen if we just had no SALU instructions to do some computations, so ended up with VALU which was pripagated, but that is not the case I have started with. I.e. it is really uniform, and readfirstlane is valid, but we don't want to issue it.
> > > 
> > > I am not really sure we want a test of that size. It will be more than obscure in the first place.
> > You could test with -memdep-block-scan-limit=1 ?
> I will check if I can produce a reasonable IR test.
Whether or not this can be converted to SMRD is a different problem


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101405/new/

https://reviews.llvm.org/D101405



More information about the llvm-commits mailing list