[PATCH] D153761: [AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic (WIP)

Fri Jun 30 07:26:16 PDT 2023

foad added a comment.

Looks pretty good to me overall. I don't know enough about call lowering to offer an opinion on the alternative ways of implementing this that you have been discussing with @arsenm.

> The @llvm.amdgcn.cs.chain intrinsic is essentially a call.

It's a tail call, isn't it? Does it have the same restriction as an LLVM IR tail call, that allocas in the caller cannot be accessed by the callee?

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7392
+
+    for (unsigned i : {2, 3}) {
+      TargetLowering::ArgListEntry Arg;
----------------
Capitalize variable names.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7401
+    Args[0].IsInReg = true; // Make sure these go in the SGPRs.
+    assert(!Args[1].IsInReg && "VGPR args should not be marked inreg");
+
----------------
Is this something you could check in the IR verifier? Failing an assertion here is not a nice diagnostic.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7404
+    // We're also going to pass the EXEC mask as the last argument, but it will
+    // be handled differently from the other args.
+    TargetLowering::ArgListEntry Arg;
----------------
Not sure what "it will be handled differently" is telling me?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp:963
+  // For calls to amdgpu_cs_chain functions, the address is known to be uniform.
+  assert((AMDGPU::isChainCC(CC) || !(IsIndirect && IsTailCall)) &&
+         "Indirect calls can't be tail calls, "
----------------

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp:1304-1305
+
+    if ((IsW32 && !ExecArg.Ty->isIntegerTy(32)) ||
+        (!IsW32 && !ExecArg.Ty->isIntegerTy(64)))
+      return false;
----------------

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp:1309
+    unsigned MovOpc = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
+    MCRegister Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
+    auto SetExec =
----------------
This is `TRI->getExec()`

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3174-3175
   SelectionDAG &DAG = CLI.DAG;
+  MachineFunction &MF = DAG.getMachineFunction();
+  bool IsWave32 = MF.getSubtarget<GCNSubtarget>().isWave32();
+
----------------
Should not need these lines. We already have access to `Subtarget`.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3185-3186
+
+    if ((IsWave32 && !RequestedExec.Ty->isIntegerTy(32)) ||
+        (!IsWave32 && !RequestedExec.Ty->isIntegerTy(64)))
+      return lowerUnhandledCall(CLI, InVals, "Invalid value for EXEC");
----------------

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3430
+    // Set EXEC right before the call.
+    MCRegister ExecReg = IsWave32 ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
+    Chain = DAG.getCopyToReg(Chain, DL, ExecReg, RequestedExec.Node, InGlue);
----------------
`TRI->getExec()`, if you move the definition of `TRI` up from 33 lines below.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153761/new/

https://reviews.llvm.org/D153761