[llvm] [x86] Enable indirect tail calls with more arguments (PR #137643)
Hans Wennborg via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 30 06:12:13 PDT 2025
================
@@ -1353,6 +1376,22 @@ void X86DAGToDAGISel::PreprocessISelDAG() {
          (N->getOpcode() == X86ISD::TC_RETURN &&
           (Subtarget->is64Bit() ||
            !getTargetMachine().isPositionIndependent())))) {
+
+      if (N->getOpcode() == X86ISD::TC_RETURN) {
+        // There needs to be enough non-callee-saved GPRs available to compute
+        // the load address if folded into the tailcall. See how the
+        // X86tcret_6regs and X86tcret_1reg classes are used and defined.
+        unsigned NumRegs = 0;
+        for (unsigned I = 3, E = N->getNumOperands(); I != E; ++I) {
+          if (isa<RegisterSDNode>(N->getOperand(I)))
+            ++NumRegs;
+        }
+        if (!Subtarget->is64Bit() && NumRegs > 1)
+          continue;
+        if (NumRegs > 6)
----------------
zmodem wrote:
You're right, Win64 has one fewer. I took my code from the `X86tcret_6regs` fragment: https://github.com/llvm/llvm-project/blob/e58d227b09d533e2df644f827cedff8e206e0bfc/llvm/lib/Target/X86/X86InstrFragments.td#L676-L684
which is what the folding pattern for `TCRETURNmi64` uses: https://github.com/llvm/llvm-project/blob/e58d227b09d533e2df644f827cedff8e206e0bfc/llvm/lib/Target/X86/X86InstrCompiler.td#L1345-L1349
So that seems wrong for Win64.
I think the source of truth here is the register class that the folded instruction actually uses, namely `ptr_rc_tailcall`, which is defined by `X86RegisterInfo::getGPRsForTailCall`: https://github.com/llvm/llvm-project/blob/a2c1ff10eb930dd56be306dc0818d6ff31fff546/llvm/lib/Target/X86/X86RegisterInfo.cpp#L227-L239
That one seems to handle Win64 correctly, and also takes the calling convention into account in general.
---
So I think `X86tcret_6regs` should not hard-code 6, but check the `ptr_rc_tailcall` register class, and we should extract the code into a function that we can also use when moving the load.
And we should do the same for `X86tcret_1reg`, which is similar but has some differences:
https://github.com/llvm/llvm-project/blob/e58d227b09d533e2df644f827cedff8e206e0bfc/llvm/lib/Target/X86/X86InstrFragments.td#L686-L699
It checks whether the load uses a frame slot or a global, in which case it assumes the address computation doesn't use up any extra registers. I'm not 100% convinced that's true for the global case? And shouldn't we do the same check in `X86tcret_6regs`?
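
To make the proposed refactoring concrete, here is a rough standalone sketch of what a shared check might look like. It is not LLVM code: `OperandKind`, `countRegisterOperands` and `fitsTailCallRegBudget` are hypothetical names, the operand list stands in for the `TC_RETURN` node's operands, and the class size is passed in as a plain number instead of being queried from `ptr_rc_tailcall`. The assumption is that the fold is safe when the argument registers plus the registers needed for the load address fit in the tail-call register class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a DAG operand. In LLVM the equivalent test would
// be isa<RegisterSDNode>(N->getOperand(I)).
enum class OperandKind { Register, Other };

// Counts the register operands from FirstArgIdx onward, mirroring the loop in
// the patch (which starts at operand index 3).
unsigned countRegisterOperands(const std::vector<OperandKind> &Ops,
                               std::size_t FirstArgIdx) {
  assert(FirstArgIdx <= Ops.size() && "first argument index out of range");
  unsigned NumRegs = 0;
  for (std::size_t I = FirstArgIdx, E = Ops.size(); I != E; ++I)
    if (Ops[I] == OperandKind::Register)
      ++NumRegs;
  return NumRegs;
}

// Assumed rule, not taken from LLVM: the load can be folded into the tail
// call only if the argument registers and the registers needed to form the
// address all fit in the tail-call register class.
// TailCallClassSize is an illustrative number supplied by the caller; real
// code would derive it from the register class for the current function.
// AddrRegsNeeded would be 0 for an address that needs no extra registers
// (e.g. a frame slot) and up to 2 for base + index.
bool fitsTailCallRegBudget(unsigned NumArgRegs, unsigned TailCallClassSize,
                           unsigned AddrRegsNeeded) {
  return NumArgRegs + AddrRegsNeeded <= TailCallClassSize;
}
```

With a helper of this shape, both pattern fragments and the load-moving code in `PreprocessISelDAG` could share one check, and the frame-slot/global special case becomes a matter of what the caller passes as the address register count.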
https://github.com/llvm/llvm-project/pull/137643