[PATCH] D99038: AMDGPU/GlobalISel: Implement tail calls

Mon Mar 22 07:25:01 PDT 2021

Flakebi added a comment.

It would be nice if we can put the common code with AArch64 into some generic place instead of duplicating it.
Other than that, looks good.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp:967-970
+  auto TRI = ST.getRegisterInfo();
+  const uint32_t *CallerPreserved = TRI->getCallPreservedMask(MF, CallerCC);
+  const uint32_t *CalleePreserved = TRI->getCallPreservedMask(MF, CalleeCC);
+  return TRI->regmaskSubsetEqual(CallerPreserved, CalleePreserved);
----------------
Checking the register mask seems pretty cheap, while checking argument assignment looks more expensive.
Does it make sense to do this check first? That should bail out faster if the calling conventions don’t match.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp:1224
+    // keep it 16-byte aligned.
+    NumBytes = alignTo(OutInfo.getNextStackOffset(), 16);
+
----------------
Can this use `getStackAlignment()` instead of hardcoding 16?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp:1236
+    // satisfy the same constraint.
+    assert(FPDiff % 16 == 0 && "unaligned stack on tail call");
+  }
----------------
Same about `getStackAlignment()`

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll:371-373
+; Tail call disallowed with byval in parent, not callee. The stack
+; usage of incoming arguments must be <= the outgoing stack
+; arguments.
----------------
The parent does not use byval. Is this comment correct?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99038/new/

https://reviews.llvm.org/D99038