<div dir="ltr">Hi Dean,<div><br></div><div>I looked at XRay. I had also been thinking along similar lines: adding assembly instructions as auxiliary template code and jumping to it. However, that may still misalign the stack. I have to think about it. But your XRay code does give me the courage to think about this seriously.</div><div><br></div><div>Thank you for your help. I also figured out that we can access certain CodeGen features right from the IR level, as you explained when describing your tussle with handling IR and CodeGen together. Hopefully I can work out a convenient way.</div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr">Regards,<div>Soham Sinha</div><div>PhD Student, Department of Computer Science</div><div>Boston University</div></div></div></div></div></div></div></div>

<br><div class="gmail_quote">On Mon, May 7, 2018 at 8:38 PM, Dean Michael Berris <span dir="ltr"><<a href="mailto:dean.berris@gmail.com" target="_blank">dean.berris@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Tue, May 8, 2018 at 4:06 AM Soham Sinha <<a href="mailto:soham1@bu.edu">soham1@bu.edu</a>> wrote:<br>

<br>

> Hello Dean,<br>

<br>

> I looked at the XRay Instrumentation. That's a nice engineering effort. I<br>

am sure you had your motivation to do this in CodeGen just like I wanted to<br>

do. I don't understand all of your code but I get the idea that you are<br>

adjusting the alignment with explicit bytes and no-op instructions. My<br>

problem is also very much related to yours where my stack pointer ($rsp)<br>

alignment breaks in printf.<br>

<br>

> Having said that, I am not sure whether I need the engineering effort<br>

that you have pursued. I am trying to add function calls in some places of<br>

the machine code. I followed X86_64 calling convention to do so. I saved<br>

(pushed into stack) all the necessary registers (also tried saving all the<br>

16 registers), then filled in 3 arguments in rdi, rsi, and rdx, and then called<br>

the desired function (and then popped the registers). Mathematically, saving<br>

the 16 registers should not break the alignment of the stack pointer. But<br>

when I am trying to debug with gdb, I see that the alignment breaks<br>

sometimes during the push operations of 16 registers, and it comes as<br>

broken alignment in the printf function. I am very confused about what can go<br>

wrong here. This is why I was trying to rely on LLVM to maintain the<br>

alignment.<br>

<br>

> Interestingly, at the start of the runOnMachineFunction, I check the<br>

alignment of the function and also at the end of the runOnMachineFunction<br>

(after my push, call function and pop). The alignment stays same as 4 (16<br>

bytes). Therefore, I guess, the BuildMI function doesn't maintain the<br>

alignment and doesn't even report the broken alignment through the<br>

alignment variable of MachineFunction. I access the alignment through the<br>

function, getAlignment. I think BuildMI should have taken care of the alignment<br>

or at least updated the alignment value.<br>

<br>

<br>

</div></div>IIRC, getAlignment() tells you the function's *code* alignment, not whether<br>

the stack is aligned to a certain boundary at a given point. I don't know<br>

whether that information is maintained per MachineBasicBlock, because the<br>

decision on whether to spill variables onto the stack is done on a<br>

per-function-call basis -- you may need to look at the way functions are<br>

lowered specifically in X86 to see the (complicated) logic to figure out<br>

whether/how to spill which registers onto the stack and how to lay out the<br>

stack.<br>
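<br>
To make that distinction concrete, here is a rough sketch that models the two notions separately. The helper names (log2AlignToBytes, rspAfterPushes, isAligned16) and the example addresses are made up for illustration and are not LLVM APIs. It assumes the alignment value is a log2 (so the 4 you saw means 16 bytes of code alignment) and that each 64-bit push moves rsp down by 8. The point is that 16 pushes preserve whatever alignment rsp had at the insertion point, including a misalignment that was already there:<br>

```cpp
#include <cstdint>

// Hypothetical helper: turn a log2 code alignment (e.g. 4) into bytes.
constexpr std::uint64_t log2AlignToBytes(unsigned Log2Align) {
  return std::uint64_t{1} << Log2Align;
}

// Hypothetical model of rsp: every 64-bit push subtracts 8 bytes.
constexpr std::uint64_t rspAfterPushes(std::uint64_t Rsp, unsigned NumPushes) {
  return Rsp - std::uint64_t{8} * NumPushes;
}

// True when rsp sits on a 16-byte boundary, as the x86-64 System V ABI
// expects at the point of a call instruction.
constexpr bool isAligned16(std::uint64_t Rsp) { return Rsp % 16 == 0; }

// Code alignment: 4 means 2^4 = 16 bytes. This says nothing about rsp.
static_assert(log2AlignToBytes(4) == 16, "log2 value 4 is 16 bytes");

// 16 pushes move rsp by 128 bytes, so an aligned rsp stays aligned ...
static_assert(isAligned16(rspAfterPushes(0x7ffc0000, 16)),
              "aligned before, aligned after");

// ... but an rsp that was off by 8 before the pushes is still off by 8.
static_assert(!isAligned16(rspAfterPushes(0x7ffc0008, 16)),
              "misaligned before, misaligned after");
```

So if the instrumentation point happens to sit where rsp is 8 bytes off a 16-byte boundary, saving an even number of registers will not repair that, and the callee (printf with its XMM stores, for instance) may fault.<br>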

<br>

To address this partially, we not only insert the custom event<br>

pseudo-instruction, but we dispatch to a trampoline that's defined in<br>

compiler-rt -- that code will maintain the stack alignment before making a<br>

function call. It saves all the relevant registers first, aligns the stack,<br>

then calls the function -- upon return we restore the registers from the<br>

stack. Essentially we're doing a context-switch, which might be what you're<br>

looking to do as well. That code is in compiler-rt hand-written as x86_64<br>

assembly.<br>
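<br>
As a rough illustration of the "align the stack, then call" step, the sketch below models one common way to do it: keep the old stack pointer in a callee-saved register, round rsp down to a 16-byte boundary, make the call, then restore. This is only a model of the general technique under those assumptions, not a transcription of the XRay trampoline; Frame, enterAlignedCall and leaveAlignedCall are hypothetical names:<br>

```cpp
#include <cstdint>

// Hypothetical model of:  push rbp; mov rbp, rsp; and rsp, -16
struct Frame {
  std::uint64_t SavedRsp; // value held in rbp across the call
  std::uint64_t CallRsp;  // rsp at the call instruction
};

// Round an address down to the nearest 16-byte boundary.
constexpr std::uint64_t alignDown16(std::uint64_t Rsp) {
  return Rsp & ~std::uint64_t{15};
}

// push rbp moves rsp down by 8; that value is remembered, then rsp is
// rounded down so the call always sees a 16-byte aligned stack.
constexpr Frame enterAlignedCall(std::uint64_t Rsp) {
  return Frame{Rsp - 8, alignDown16(Rsp - 8)};
}

// Hypothetical model of:  mov rsp, rbp; pop rbp
// rsp ends up exactly where it was before enterAlignedCall.
constexpr std::uint64_t leaveAlignedCall(Frame F) { return F.SavedRsp + 8; }

// Whatever rsp was after saving the registers, the call is aligned ...
static_assert(enterAlignedCall(0x7ffc0008).CallRsp % 16 == 0, "aligned");
static_assert(enterAlignedCall(0x7ffc0000).CallRsp % 16 == 0, "aligned");

// ... and the original rsp comes back afterwards, so the pops still match.
static_assert(leaveAlignedCall(enterAlignedCall(0x7ffc0008)) == 0x7ffc0008,
              "rsp restored");
```

The useful property is that the realignment does not depend on how many registers were pushed before it or on the state of rsp at the instrumentation point.<br>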

<br>

See<br>

<a href="https://github.com/llvm-mirror/compiler-rt/blob/master/lib/xray/xray_trampoline_x86_64.S#L224" rel="noreferrer" target="_blank">https://github.com/llvm-<wbr>mirror/compiler-rt/blob/<wbr>master/lib/xray/xray_<wbr>trampoline_x86_64.S#L224</a><br>

for some inspiration.<br>

<br>

The custom event instrumentation points just call into the trampoline,<br>

setting up the arguments on the spot. We've had to do some gymnastics to<br>

make that happen all the way up to the IR -- i.e. we insert the<br>

instrumentation as calls to LLVM intrinsics at the IR, and preserve those<br>

all the way down to the codegen. Doing it another way seemed much too hard,<br>

as you may be finding out. :(<br>

<span class=""><br>

> I am afraid if I follow your path of instrumentation, again I might<br>

ultimately face the same issue where I could not maintain the alignment.<br>

Your effort is quite similar to what I am trying to do, but I am just<br>

doing it in the MachineFunctionPass itself.<br>

<br>

> It's very non-trivial and tedious to change the internals of CodeGen<br>

because the LLVM MC infrastructure is very much intertwined with the<br>

Assembler. That makes compilation faster but instrumentation tougher. This<br>

is why I wrote a MachineFunctionPass so that my instrumentation stays like<br>

a module. I add my MachineFunctionPass at the end of addPreEmitPass phase<br>

of X86.<br>

<br>

> I wish LLVM provided more modular ways of instrumentation just like it<br>

provides similar instrumentation at the LLVM IR level.<br>

<br>

<br>

</span>I have the same wish -- it'd be great if we could move the XRay<br>

instrumentation to normal MachineFunctionPass implementations.<br>

<br>

Just a thought -- have you considered using XRay instrumentation as a<br>

framework instead to accomplish what you're trying to do? I mean, instead<br>

of implementing your own pass?<br>

<div class="HOEnZb"><div class="h5"><br>

> Regards,<br>

> Soham Sinha<br>

> PhD Student, Department of Computer Science<br>

> Boston University<br>

<br>

> On Mon, May 7, 2018 at 1:20 AM, Dean Michael Berris<br>

> <<a href="mailto:dean.berris@gmail.com">dean.berris@gmail.com</a>><br>

wrote:<br>

<br>

>> On Sun, May 6, 2018 at 7:26 AM Soham Sinha via llvm-dev <<br>

>> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br>

<br>

>> > Hello,<br>

<br>

>> > I want to add assembly instructions at certain points in a function.<br>

This<br>

>> is X86 specific. So I am working in the lib/Target/X86 folder. I create a<br>

>> `MachineFunctionPass` in that folder. I register it in the<br>

>> X86TargetMachine.cpp in addPreEmitPass(). I use BuildMI to insert my own<br>

>> assembly instructions in the MachineFunctionPass. This works and my<br>

>> assembly instructions are inserted at desired places. However, this breaks<br>

>> the alignment. So when I run the generated code, I get a segmentation fault<br>

>> (precisely in printf with XMM registers). Where should I add my pass?<br>

<br>

<br>

>> It sounds like you're running into stack alignment issues. If you're adding<br>

>> data to the stack, you may need to work a little harder with maintaining<br>

>> the state of the stack. This is not trivial to do especially if you're<br>

>> emitting the assembly by the time you're at a MachineFunctionPass (because<br>

>> register spilling and/or stack alignment information would have already<br>

>> been done by the time you're in machine instruction lowering). What you may<br>

>> need to do here is to either:<br>

<br>

>> - hook into the preamble and stack re-alignment code specifically in X86<br>

>> that would look at information from your pass. This is not trivial and I<br>

>> don't recommend going down this path (I tried, but I lost the patience to<br>

>> do it properly).<br>

<br>

>> - when emitting the assembly instructions that involve pushing/popping from<br>

>> the stack, make sure you're keeping track of the alignment of the stack<br>

>> variables. This is what we do with XRay, when we're lowering the custom<br>

>> event sleds.<br>

<br>

>> - use pseudo-instructions and preserving those until lowering, where the<br>

>> lowering<br>

<br>

>> > My pass depends on the MachineBasicBlock information as well.<br>

Therefore,<br>

>> I cannot add my pass too early in LLVM IR. What is the proper pass to add<br>

>> my custom MachineFunctionPass? I tried addPreRegAlloc, but it failed due to<br>

>> an insufficient register allocation error or something along those lines.<br>

<br>

>> > Can anybody please help me write a MachineFunctionPass where I can<br>

insert<br>

>> assembly instruction without breaking the alignment? I am doing this for<br>

>> X86_64.<br>

<br>

<br>

>> You can look at the XRay lowering for the PATCHABLE_EVENT_CALL lowering<br>

in<br>

>> X86AsmPrinter as a guide for the lowering, but you might also want to see<br>

>> how we're inserting these pseudo-instructions from the<br>

<br>

>> I don't remember having to specify where the pass is defined, since it's<br>

>> already in the assembly printing. So you might consider inserting these<br>

>> pseudo-instructions at the MachineFunctionPass, which get lowered<br>

>> appropriately in the assembly printer. Unfortunately I don't think<br>

there's<br>

>> a generic way of doing this (yet) with the X86 back-end. There might be a<br>

>> good case for making this easier, but right now these kinds of things<br>

>> haven't been too important to fix yet.<br>

<br>

>> Hope this helps!<br>

>> --<br>

>> Dean<br>

<br>

<br>

<br>

<br>

</div></div><span class="HOEnZb"><font color="#888888">-- <br>

Dean<br>

</font></span></blockquote></div><br></div></div>