[PATCH] D17042: AMDGPU/SI: move SMRD to flat if UseFlatForGlobal is true
Tom Stellard via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 10 11:04:12 PST 2016
tstellarAMD added inline comments.
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2327-2341
@@ +2326,17 @@
+
+ case AMDGPU::BUFFER_LOAD_DWORDX2_OFFSET: return AMDGPU::FLAT_LOAD_DWORDX2;
+ case AMDGPU::BUFFER_LOAD_DWORDX4_OFFSET: return AMDGPU::FLAT_LOAD_DWORDX4;
+ case AMDGPU::BUFFER_LOAD_DWORD_OFFSET: return AMDGPU::FLAT_LOAD_DWORD;
+ case AMDGPU::BUFFER_LOAD_SBYTE_OFFSET: return AMDGPU::FLAT_LOAD_SBYTE;
+ case AMDGPU::BUFFER_LOAD_SSHORT_OFFSET: return AMDGPU::FLAT_LOAD_SSHORT;
+ case AMDGPU::BUFFER_LOAD_UBYTE_OFFSET: return AMDGPU::FLAT_LOAD_UBYTE;
+ case AMDGPU::BUFFER_LOAD_USHORT_OFFSET: return AMDGPU::FLAT_LOAD_USHORT;
+
+ case AMDGPU::BUFFER_LOAD_DWORDX2_ADDR64: return AMDGPU::FLAT_LOAD_DWORDX2;
+ case AMDGPU::BUFFER_LOAD_DWORDX4_ADDR64: return AMDGPU::FLAT_LOAD_DWORDX4;
+ case AMDGPU::BUFFER_LOAD_DWORD_ADDR64: return AMDGPU::FLAT_LOAD_DWORD;
+ case AMDGPU::BUFFER_LOAD_SBYTE_ADDR64: return AMDGPU::FLAT_LOAD_SBYTE;
+ case AMDGPU::BUFFER_LOAD_SSHORT_ADDR64: return AMDGPU::FLAT_LOAD_SSHORT;
+ case AMDGPU::BUFFER_LOAD_UBYTE_ADDR64: return AMDGPU::FLAT_LOAD_UBYTE;
+ case AMDGPU::BUFFER_LOAD_USHORT_ADDR64: return AMDGPU::FLAT_LOAD_USHORT;
+
----------------
It would be really great if we could use the InstrMappings in TableGen to generate this, but I guess it is a pretty small table.
You should simplify it though, since multiple case statements return the same value; the _OFFSET and _ADDR64 cases for a given width can share a single return (see the sketch below).
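A minimal sketch of the grouped form (the function signature is a guess; the opcode names and the mapping itself are from the patch):

  unsigned getFlatInst(unsigned Opcode) {
    switch (Opcode) {
    // Both MUBUF addressing variants map to the same FLAT opcode,
    // so adjacent cases can share one return.
    case AMDGPU::BUFFER_LOAD_DWORD_OFFSET:
    case AMDGPU::BUFFER_LOAD_DWORD_ADDR64:
      return AMDGPU::FLAT_LOAD_DWORD;
    case AMDGPU::BUFFER_LOAD_DWORDX2_OFFSET:
    case AMDGPU::BUFFER_LOAD_DWORDX2_ADDR64:
      return AMDGPU::FLAT_LOAD_DWORDX2;
    // ... and likewise for DWORDX4, SBYTE, SSHORT, UBYTE and USHORT.
    default:
      llvm_unreachable("no FLAT equivalent for opcode");
    }
  }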
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2366
@@ +2365,3 @@
+
+ if (ST.useFlatForGlobal()) {
+ unsigned FlatOpcode = getFlatInst(NewOpcode);
----------------
We also want to do this on VI when the original opcode is an ADDR64 variant; see the sketch below.
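Roughly like this, where isAddr64Variant is a hypothetical predicate for "NewOpcode is one of the BUFFER_*_ADDR64 opcodes", not an existing helper:

  bool IsVI = ST.getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS;
  if (ST.useFlatForGlobal() || (IsVI && isAddr64Variant(NewOpcode))) {
    unsigned FlatOpcode = getFlatInst(NewOpcode);
    ...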
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2381
@@ +2380,3 @@
+
+ if (MI->getOperand(2).isReg()) {
+ RegOffset = MI->getOperand(2).getReg();
----------------
You should use getNamedOperand here instead of indexing by number.
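For example, assuming the SMRD offset operand is named AMDGPU::OpName::offset here:

  const MachineOperand *Off = getNamedOperand(*MI, AMDGPU::OpName::offset);
  if (Off->isReg())
    RegOffset = Off->getReg();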
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2386
@@ +2385,3 @@
+ assert(MI->getOperand(2).isImm());
+ ImmOffset = MI->getOperand(2).getImm();
+ HasOffset = (ImmOffset != 0);
----------------
Same here and a few other places.
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2410-2412
@@ +2409,5 @@
+ BuildMI(*MBB, MI, DL, get(AMDGPU::V_ADD_I32_e64), VAddrLo)
+ .addReg(AMDGPU::VCC, RegState::Define)
+ .addReg(SBase.getReg(), 0, AMDGPU::sub0)
+ .addImm(ImmOffset);
+ }
----------------
On SI/CI the immediate offset is in dwords, but on VI it is in bytes, so I think this calculation will be wrong on SI/CI.
I recommend creating a helper function to extract the base pointer, register offset, and/or immediate offset from the SMRD instruction. That way you could share code with the non-FlatForGlobal path, which already handles the immediate conversion correctly; see the sketch below.
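A rough sketch of such a helper; the name and signature are illustrative, and the real version should share the conversion logic with the existing ADDR64 lowering rather than duplicate it:

  // Extract the SMRD address components, normalizing the immediate
  // offset to bytes.
  void SIInstrInfo::extractSMRDAddress(MachineInstr *MI, unsigned &SBase,
                                       unsigned &RegOffset,
                                       uint32_t &ImmOffset) const {
    SBase = getNamedOperand(*MI, AMDGPU::OpName::sbase)->getReg();
    const MachineOperand *Off = getNamedOperand(*MI, AMDGPU::OpName::offset);
    RegOffset = AMDGPU::NoRegister;
    ImmOffset = 0;
    if (Off->isReg()) {
      RegOffset = Off->getReg();
    } else {
      // SI/CI encode the SMRD immediate offset in dwords; VI uses bytes.
      ImmOffset = Off->getImm();
      if (ST.getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS)
        ImmOffset <<= 2;
    }
  }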
================
Comment at: test/CodeGen/AMDGPU/salu-to-valu.ll:55-60
@@ -54,7 +54,7 @@
; GCN-LABEL: {{^}}smrd_valu:
-; FIXME: We should be using flat load for HSA.
-; GCN: buffer_load_dword [[OUT:v[0-9]+]]
+; GCN-NOHSA: buffer_load_dword [[OUT:v[0-9]+]]
+; GCN-HSA: flat_load_dword [[OUT:v[0-9]+]]
; GCN-NOHSA: buffer_store_dword [[OUT]]
; GCN-HSA: flat_store_dword [[OUT]]
define void @smrd_valu(i32 addrspace(2)* addrspace(1)* %in, i32 %a, i32 %b, i32 addrspace(1)* %out) #1 {
----------------
I think you should add more test cases covering the edge cases for the immediate offsets, e.g. the largest immediate offset each target can encode, one past it, and a nonzero offset on VI where the encoding is in bytes rather than dwords.
http://reviews.llvm.org/D17042