[PATCH] D36615: [XRay][CodeGen] Use PIC-friendly code in XRay sleds; remove synthetic references in .text

Dean Michael Berris via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 21 22:52:20 PDT 2017


dberris added inline comments.


================
Comment at: lib/Target/X86/X86MCInstLower.cpp:1100
+          EmitAndCountInstruction(
+              MCInstBuilder(X86::PUSH64r).addReg(UsedRegs[I]));
           EmitAndCountInstruction(MCInstBuilder(X86::MOV64rr)
----------------
pcc wrote:
> dberris wrote:
> > pcc wrote:
> > > Here you are pushing rdi and rsi onto the stack whereas you weren't doing that before.
> > > 1) I don't understand why the pushes are necessary now when they weren't before.
> > > 2) Since you are pushing one more value than before, is the stack still aligned correctly?
> > Because this is an intrinsic being lowered, defined in `include/llvm/IR/Instrinsics.td`, we aren't quite able to provide a mask of the registers that will be clobbered/used by this routine. While we haven't quite run into bugs without stashing the `%rsi` and `%rdi` registers here (we do actually do the right thing in the trampoline), We also haven't been defining the calling convention for calls to this intrinsic (which we couldn't define for the intrinsic too). I think I'm just being a bit cautious here.
> > 
> > The alignment question is an interesting one. Because it's an intrinsic being lowered, and that we're already past the stack adjustment phase (and side-stepping the call-lowering code for normal call instructions), we haven't quite been tracking the stack adjustments here. What we do assume is that if the stack was aligned already, pushing two 64-bit values onto the stack will keep it aligned -- then the trampoline starts running, assuming that the stack is set up properly.
> > 
> > Do you have suggestions on how else to do/test this?
> Note that the callee is allowed to clobber more registers than just `%rsi` and `%rdi`, see the full list here: https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf
> 
> If the idea is that the `__xray_CustomEvent` function has a custom "callee-save almost all registers" calling convention, I'd expect you to need to save all registers in `__xray_CustomEvent` not marked as "preserved across function calls". It does at least seem to be necessary to save `%rdi` and `%rsi` in the function itself though.
> 
> For stack alignment, I don't think you can always assume `%rsp` to be properly aligned at the point where you emit the `__xray_CustomEvent` call. Here's a quick counterexample that shows that we can either align the stack correctly (in the case of foo) or incorrectly (in the case of bar) when calling the handler.
> ```
> #include <xmmintrin.h>
> 
> __attribute__((weak)) __m128 f(__m128 *i) {
>   return *i;
> }
> 
> void foo() {
>   __xray_customevent(0, 0);
>   __m128 v = {};
>   f(&v);
> }
> 
> void bar() {
>   __xray_customevent(0, 0);
> }
> ```
> 
> If you compile this you should see that `foo` subtracts 24 from `%rsp` before calling `__xray_CustomEvent`, while `bar` does not subtract anything.
> 
> I'm no backend expert, so I don't know how to fix this properly, but I imagine that you could always realign the stack in the custom event handler function.
> 
> In terms of testing, the function `foo` above is an example of a function that requires the stack to be aligned, so you could try writing an integration test that uses something like that as a custom event handler function.
Cool stuff!

I made the changes in D36816 to stash more registers in __xray_CustomEvent. We already have a special calling convention as you suggest (which is documented in the trampoline implementation). I'm not sure though whether the alignment in the case of the empty function does cause issues -- we use call instructions and already have a special calling convention for the trampolines.


https://reviews.llvm.org/D36615





More information about the llvm-commits mailing list