[llvm] [x86] Enable indirect tail calls with more arguments (PR #137643)

Hans Wennborg via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 30 06:12:13 PDT 2025


================
@@ -1353,6 +1376,22 @@ void X86DAGToDAGISel::PreprocessISelDAG() {
          (N->getOpcode() == X86ISD::TC_RETURN &&
           (Subtarget->is64Bit() ||
            !getTargetMachine().isPositionIndependent())))) {
+
+      if (N->getOpcode() == X86ISD::TC_RETURN) {
+        // There needs to be enough non-callee-saved GPRs available to compute
+        // the load address if folded into the tailcall. See how the
+        // X86tcret_6regs and X86tcret_1reg classes are used and defined.
+        unsigned NumRegs = 0;
+        for (unsigned I = 3, E = N->getNumOperands(); I != E; ++I) {
+          if (isa<RegisterSDNode>(N->getOperand(I)))
+            ++NumRegs;
+        }
+        if (!Subtarget->is64Bit() && NumRegs > 1)
+          continue;
+        if (NumRegs > 6)
----------------
zmodem wrote:

You're right, Win64 has one less. I stole my code from the `X86tcret_6regs` fragment: https://github.com/llvm/llvm-project/blob/e58d227b09d533e2df644f827cedff8e206e0bfc/llvm/lib/Target/X86/X86InstrFragments.td#L676-L684

which is what the folding pattern for `TCRETURNmi64` uses: https://github.com/llvm/llvm-project/blob/e58d227b09d533e2df644f827cedff8e206e0bfc/llvm/lib/Target/X86/X86InstrCompiler.td#L1345-L1349

So the hard-coded limit of 6 in `X86tcret_6regs` seems wrong for Win64.

I think the source of truth here is the register class that the folded instruction actually uses, namely `ptr_rc_tailcall`, which is defined by `X86RegisterInfo::getGPRsForTailCall`: https://github.com/llvm/llvm-project/blob/a2c1ff10eb930dd56be306dc0818d6ff31fff546/llvm/lib/Target/X86/X86RegisterInfo.cpp#L227-L239

That one seems to handle Win64 correctly, and also takes the calling convention into account in general.

---

So I think `X86tcret_6regs` should not hard-code 6, but check the `ptr_rc_tailcall` register class, and we should extract the code into a function that we can also use when moving the load.
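
To make the idea concrete, here is a rough standalone sketch of what such a shared helper could look like. It is only a model, not LLVM code: `OperandKind`, `countArgRegs`, `canFoldLoadIntoTailCall`, and the `MaxAddrRegs` default of 2 (base plus index) are all hypothetical names and assumptions. In the real thing, the register count would come from the `ptr_rc_tailcall` class rather than being passed in.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the operand kinds of a TC_RETURN node.
// These are NOT LLVM types; they only model the shape of the check.
enum class OperandKind { Chain, Callee, StackAdjust, Register, RegMask, Glue };

// Count the register operands. Operands 0..2 are modeled as
// (chain, callee, stack adjustment), so scanning starts at index 3,
// mirroring the loop in the patch.
unsigned countArgRegs(const std::vector<OperandKind> &Ops) {
  unsigned NumRegs = 0;
  for (std::size_t I = 3, E = Ops.size(); I < E; ++I)
    if (Ops[I] == OperandKind::Register)
      ++NumRegs;
  return NumRegs;
}

// Decide whether the load may be folded into the tail call.
// NumTailCallGPRs models the size of the register class the folded
// instruction uses; MaxAddrRegs is an assumed worst case for the number of
// registers needed to form the load address (base + index).
bool canFoldLoadIntoTailCall(const std::vector<OperandKind> &Ops,
                             unsigned NumTailCallGPRs,
                             unsigned MaxAddrRegs = 2) {
  return countArgRegs(Ops) + MaxAddrRegs <= NumTailCallGPRs;
}
```

With six register operands, `canFoldLoadIntoTailCall(Ops, 8)` accepts the fold and `canFoldLoadIntoTailCall(Ops, 7)` rejects it. The 8 and 7 are illustrative sizes only, but they show why a class that is one register smaller cannot share a hard-coded limit of 6.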

And we should do the same for `X86tcret_1reg`, which is similar but has some differences: 

https://github.com/llvm/llvm-project/blob/e58d227b09d533e2df644f827cedff8e206e0bfc/llvm/lib/Target/X86/X86InstrFragments.td#L686-L699

It checks whether the load address is a frame slot or a global, in which case it assumes the address doesn't use up any extra registers. I'm not 100% convinced that's true for the global case. And shouldn't we do the same check in `X86tcret_6regs`?

https://github.com/llvm/llvm-project/pull/137643

