[llvm] [AMDGPU] ISel for @llvm.amdgcn.cs.chain intrinsic (PR #68186)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 25 00:55:23 PDT 2023
================
@@ -1283,6 +1293,23 @@ bool AMDGPUCallLowering::lowerTailCall(
     MIRBuilder.buildInstr(AMDGPU::ADJCALLSTACKDOWN).addImm(NumBytes).addImm(0);
   }
+  // If this is a chain call, we need to set EXEC right before the call.
+  if (AMDGPU::isChainCC(Info.CallConv)) {
+    ArgInfo ExecArg = Info.OrigArgs[1];
+    assert(ExecArg.Regs.size() == 1 && "Too many regs for EXEC");
+
+    if (!ExecArg.Ty->isIntegerTy(ST.getWavefrontSize()))
+      return false;
+
+    unsigned MovOpc = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
+    MCRegister Exec = TRI->getExec();
+    auto SetExec =
+        MIRBuilder.buildInstr(MovOpc).addDef(Exec).addReg(ExecArg.Regs[0]);
----------------
rovka wrote:
Good catch, I do have a test for this in one of the follow-up patches and it's misbehaving. I'll refactor this as a pseudo.
I'm not sure about the best place to expand it. Is there any reason to go with `SILateBranchLowering` as opposed to `PostRAPseudos` or `AMDGPUMCInstLower`? I'm tempted to go with "as late as possible" because (IMO) it makes things easier to understand and minimizes the output of `-print-changed`. `AMDGPUMCInstLower` is also where `SI_CALL`, `SI_TCRETURN`, etc. are expanded, so I'd go with that unless prompted otherwise.
https://github.com/llvm/llvm-project/pull/68186