[PATCH] D61494: AMDGPU: Write LDS objects out as global symbols in code generation

Wed May 8 05:46:28 PDT 2019

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:310
+    const DataLayout &DL = GV->getParent()->getDataLayout();
+    uint64_t Size = DL.getTypeAllocSize(GV->getType()->getElementType());
+    unsigned Align = GV->getAlignment();
----------------
This uses the deprecated pointer type method. This should use getValueType() on the global instead

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:4599-4603
+    SDValue GA = DAG.getTargetGlobalAddress(GV, DL, MVT::i32, GSD->getOffset(),
+                                            SIInstrInfo::MO_ABS32_LO);
+    GA = DAG.getNode(ISD::AssertZext, DL, MVT::i32, GA,
+                     DAG.getValueType(MVT::i16));
+    return {DAG.getMachineNode(AMDGPU::S_MOV_B32, DL, MVT::i32, GA), 0};
----------------
Shouldn't introduce machine nodes here. What is the problem with selecting the GlobalAddress directly in a tablegen pattern or manually in AMDGPUISelDAGToDAG?

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:5571-5572
+    Triple::OSType OS = getTargetMachine().getTargetTriple().getOS();
+    if (OS == Triple::AMDHSA || OS == Triple::AMDPAL)
+      return Op;
+
----------------
Maybe we can define the intrinsic to return -1 if the size can't be determined, similar to how llvm.objectsize works?

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:9994-10007
+  if (Op->isMachineOpcode()) {
+    switch (Op->getMachineOpcode()) {
+    case AMDGPU::S_MOV_B32:
+    case AMDGPU::V_MOV_B32_e32:
+      Known = DAG.computeKnownBits(Op->getOperand(0), DemandedElts, Depth + 1);
+      break;
+    default:
----------------
I don't think there should be a need to handle machine nodes here, particularly moves. The GlobalAddress should be legal, and the select pattern should be inserting the move?

================
Comment at: test/CodeGen/AMDGPU/constant-fold-mi-operands.ll:4-6
-; GCN-LABEL: {{^}}fold_mi_v_and_0:
-; GCN: v_mov_b32_e32 [[RESULT:v[0-9]+]], 0{{$}}
-; GCN-NOT: [[RESULT]]
----------------
I'm not sure these tests have been entirely replaced with MIR yet. Can you just set the triple to preserve them?

================
Comment at: test/CodeGen/AMDGPU/sopk-compares.ll:4-8
-; Since this intrinsic is exposed as a constant after isel, use it to
-; defeat the DAG's compare with constant canonicalizations.
-declare i32 @llvm.amdgcn.groupstaticsize() #1
-
- at lds = addrspace(3) global [512 x i32] undef, align 4
----------------
I'm 99% sure we don't have replacements for any of these tests either. The point of these isn't to use groupstaticsize, but to get some constant that isn't visible during selection. These shouldn't be removed

================
Comment at: test/CodeGen/AMDGPU/sopk-compares.ll:367-369
-; GCN-LABEL: {{^}}br_scc_ule_i32:
-; GCN: s_cmpk_le_u32 s{{[0-9]+}}, 0x800{{$}}
-define amdgpu_kernel void @br_scc_ule_i32(i32 %cond, i32 addrspace(1)* %out) #0 {
----------------
Ditto

================
Comment at: test/CodeGen/AMDGPU/sub.i16.ll:147

- at lds = addrspace(3) global [512 x i32] undef, align 4
-
----------------
Ditto

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61494/new/

https://reviews.llvm.org/D61494