[PATCH] D17042: AMDGPU/SI: move SMRD to flat if UseFlatForGlobal is true
Tom Stellard via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 10 11:04:12 PST 2016
tstellarAMD added inline comments.
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2327-2341
@@ +2326,17 @@
+
+ case AMDGPU::BUFFER_LOAD_DWORDX2_OFFSET: return AMDGPU::FLAT_LOAD_DWORDX2;
+ case AMDGPU::BUFFER_LOAD_DWORDX4_OFFSET: return AMDGPU::FLAT_LOAD_DWORDX4;
+ case AMDGPU::BUFFER_LOAD_DWORD_OFFSET: return AMDGPU::FLAT_LOAD_DWORD;
+ case AMDGPU::BUFFER_LOAD_SBYTE_OFFSET: return AMDGPU::FLAT_LOAD_SBYTE;
+ case AMDGPU::BUFFER_LOAD_SSHORT_OFFSET: return AMDGPU::FLAT_LOAD_SSHORT;
+ case AMDGPU::BUFFER_LOAD_UBYTE_OFFSET: return AMDGPU::FLAT_LOAD_UBYTE;
+ case AMDGPU::BUFFER_LOAD_USHORT_OFFSET: return AMDGPU::FLAT_LOAD_USHORT;
+
+ case AMDGPU::BUFFER_LOAD_DWORDX2_ADDR64: return AMDGPU::FLAT_LOAD_DWORDX2;
+ case AMDGPU::BUFFER_LOAD_DWORDX4_ADDR64: return AMDGPU::FLAT_LOAD_DWORDX4;
+ case AMDGPU::BUFFER_LOAD_DWORD_ADDR64: return AMDGPU::FLAT_LOAD_DWORD;
+ case AMDGPU::BUFFER_LOAD_SBYTE_ADDR64: return AMDGPU::FLAT_LOAD_SBYTE;
+ case AMDGPU::BUFFER_LOAD_SSHORT_ADDR64: return AMDGPU::FLAT_LOAD_SSHORT;
+ case AMDGPU::BUFFER_LOAD_UBYTE_ADDR64: return AMDGPU::FLAT_LOAD_UBYTE;
+ case AMDGPU::BUFFER_LOAD_USHORT_ADDR64: return AMDGPU::FLAT_LOAD_USHORT;
+
----------------
It would be really great if we could use the InstrMappings in TableGen to generate this, but I guess it is a pretty small table.
You should simplify it though, since multiple case statements return the same value; the _OFFSET and _ADDR64 cases for a given width can share a single return (see the sketch below).
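A minimal sketch of the grouped form (the function signature is a guess; the opcode names and the mapping itself are from the patch):

  unsigned getFlatInst(unsigned Opcode) {
    switch (Opcode) {
    // Both MUBUF addressing variants map to the same FLAT opcode,
    // so adjacent cases can share one return.
    case AMDGPU::BUFFER_LOAD_DWORD_OFFSET:
    case AMDGPU::BUFFER_LOAD_DWORD_ADDR64:
      return AMDGPU::FLAT_LOAD_DWORD;
    case AMDGPU::BUFFER_LOAD_DWORDX2_OFFSET:
    case AMDGPU::BUFFER_LOAD_DWORDX2_ADDR64:
      return AMDGPU::FLAT_LOAD_DWORDX2;
    // ... and likewise for DWORDX4, SBYTE, SSHORT, UBYTE and USHORT.
    default:
      llvm_unreachable("no FLAT equivalent for opcode");
    }
  }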
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2366
@@ +2365,3 @@
+
+ if (ST.useFlatForGlobal()) {
+ unsigned FlatOpcode = getFlatInst(NewOpcode);
----------------
We also want to do this on VI when the original opcode is an ADDR64 variant; see the sketch below.
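Roughly like this, where isAddr64Variant is a hypothetical predicate for "NewOpcode is one of the BUFFER_*_ADDR64 opcodes", not an existing helper:

  bool IsVI = ST.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS;
  if (ST.useFlatForGlobal() || (IsVI && isAddr64Variant(NewOpcode))) {
    unsigned FlatOpcode = getFlatInst(NewOpcode);
    ...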
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2381
@@ +2380,3 @@
+
+ if (MI->getOperand(2).isReg()) {
+ RegOffset = MI->getOperand(2).getReg();
----------------
You should use getNamedOperand here instead of indexing by number.
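For example, assuming the SMRD offset operand is named AMDGPU::OpName::offset here:

  const MachineOperand *Off = getNamedOperand(*MI, AMDGPU::OpName::offset);
  if (Off->isReg())
    RegOffset = Off->getReg();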
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2386
@@ +2385,3 @@
+ assert(MI->getOperand(2).isImm());
+ ImmOffset = MI->getOperand(2).getImm();
+ HasOffset = (ImmOffset != 0);
----------------
Same here and a few other places.
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2410-2412
@@ +2409,5 @@
+ BuildMI(*MBB, MI, DL, get(AMDGPU::V_ADD_I32_e64), VAddrLo)
+ .addReg(AMDGPU::VCC, RegState::Define)
+ .addReg(SBase.getReg(), 0, AMDGPU::sub0)
+ .addImm(ImmOffset);
+ }
----------------
On SI/CI the immediate offset is in dwords, but on VI it is in bytes, so I think this calculation will be wrong on SI/CI.
I recommend creating a helper function to extract the base pointer, register offset, and/or immediate offset from the SMRD instruction. That way you could share code with the non-FlatForGlobal path, which already handles the immediate conversion correctly; see the sketch below.
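A rough sketch of such a helper; the name and signature are illustrative, and the real version should share the conversion logic with the existing ADDR64 lowering rather than duplicate it:

  // Extract the SMRD address components, normalizing the immediate
  // offset to bytes.
  void SIInstrInfo::extractSMRDAddress(MachineInstr *MI, unsigned &SBase,
                                       unsigned &RegOffset,
                                       uint32_t &ImmOffset) const {
    SBase = getNamedOperand(*MI, AMDGPU::OpName::sbase)->getReg();
    const MachineOperand *Off = getNamedOperand(*MI, AMDGPU::OpName::offset);
    RegOffset = AMDGPU::NoRegister;
    ImmOffset = 0;
    if (Off->isReg()) {
      RegOffset = Off->getReg();
    } else {
      // SI/CI encode the SMRD immediate offset in dwords; VI uses bytes.
      ImmOffset = Off->getImm();
      if (ST.getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS)
        ImmOffset <<= 2;
    }
  }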
================
Comment at: test/CodeGen/AMDGPU/salu-to-valu.ll:55-60
@@ -54,7 +54,7 @@
; GCN-LABEL: {{^}}smrd_valu:
-; FIXME: We should be using flat load for HSA.
-; GCN: buffer_load_dword [[OUT:v[0-9]+]]
+; GCN-NOHSA: buffer_load_dword [[OUT:v[0-9]+]]
+; GCN-HSA: flat_load_dword [[OUT:v[0-9]+]]
; GCN-NOHSA: buffer_store_dword [[OUT]]
; GCN-HSA: flat_store_dword [[OUT]]
define void @smrd_valu(i32 addrspace(2)* addrspace(1)* %in, i32 %a, i32 %b, i32 addrspace(1)* %out) #1 {
----------------
I think you should add more test cases covering the edge cases for the immediate offsets, e.g. the largest immediate offset each target can encode, one past it, and a nonzero offset on VI where the encoding is in bytes rather than dwords.
http://reviews.llvm.org/D17042