[PATCH] D120265: AMDGPU: Use the implicit kernargs for code object version 5

Mon Feb 21 17:40:43 PST 2022

arsenm added inline comments.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16252-16254
+  const unsigned XOffset = IsCOV_5 ? 12 : 4;
+  auto *DP =
+      IsCOV_5 ? EmitAMDGPUImplicitArgPtr(CGF) : EmitAMDGPUDispatchPtr(CGF);
----------------
Given that it's an offset from a different base, I think it would be cleaner to just branch around the two cases

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:410-412
+      // We are going to use the implicit kernarg for V5.
+      if (AMDGPU::getAmdhsaCodeObjectVersion() == 5)
+        removeAssumedBits(IMPLICIT_ARG_PTR);
----------------
This isn't covered by any test changes

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h:326-328
+    PRIVATE_BASE,
+    SHARED_BASE,
+    QUEUE_PTR,
----------------
You shouldn't merge these into the same enum. This enum should be renamed, this is for a different clover ABI

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:4876
+    Register LoadAddr;
+    B.materializePtrAdd(LoadAddr, KernargPtrReg, LLT::scalar(64), Offset);
+    // Load address
----------------
You're repeating this long sequence to get the queue pointer in two places, should common these into a function to get the queue pointer. Alternatively, emit the intrinsic and move this expansion into a lowering of the queue pointer intrinsic

================
Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp:155

+unsigned getAmdhsaCodeObjectVersion() { return AmdhsaCodeObjectVersion; }
+
----------------
The code object version should probably come from the IR, not a global opt

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120265/new/

https://reviews.llvm.org/D120265