<div dir="ltr"><div>Hi LLVM folks,</div><div><br></div><div>To properly implement pass-by-value in the Microsoft C++ ABI, we need to be able</div><div>to take the address of an outgoing call argument slot.  This is</div><div>

<a href="http://llvm.org/PR5064">http://llvm.org/PR5064</a>.</div><div><br></div><div>Problem</div><div>-------</div><div><br></div><div>On Windows, C structs are pushed right onto the stack in line with the other</div><div>

arguments.  In LLVM, we use byval to model this, and it works for C structs.</div><div>However, C++ records are also passed this way, and reusing byval for C++ records</div><div>breaks C++ object identity rules.</div><div>

<br></div><div>In order to implement the ABI properly, we need a way to get the address of the</div><div>argument slot *before* we start the call, so that we can either construct the</div><div>object in place on the stack or at least call its copy constructor.</div>
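<div><br></div><div>As a rough illustration of the object identity problem described above, here is a minimal sketch.  The SelfAware type and both helper functions are hypothetical and not part of the proposal.  The raw byte copy only simulates what a byval-style lowering does when it copies the argument into the outgoing slot without running the copy constructor:</div>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type that remembers the address it was constructed at.
struct SelfAware {
    SelfAware() : self(this) {}
    SelfAware(const SelfAware &) : self(this) {}  // a real copy re-anchors 'self'
    bool identity_ok() const { return self == this; }
    const SelfAware *self;
};

// Simulates a byval-style copy: the bytes are duplicated into a new slot,
// but no constructor runs, so the stored address still names the original.
bool bitwise_copy_keeps_identity() {
    SelfAware original;
    alignas(SelfAware) unsigned char slot[sizeof(SelfAware)];
    std::memcpy(slot, static_cast<const void *>(&original), sizeof(SelfAware));

    // Read the stored pointer back out of the raw bytes and compare it
    // with the address of the slot the bytes now live in.
    const SelfAware *stored = nullptr;
    std::memcpy(&stored, slot + offsetof(SelfAware, self), sizeof(stored));
    return stored == reinterpret_cast<const SelfAware *>(slot);
}

// Copying through the copy constructor keeps the invariant intact.
bool copy_ctor_keeps_identity() {
    SelfAware original;
    SelfAware copy(original);
    return copy.identity_ok();
}
```

<div>In this sketch bitwise_copy_keeps_identity() returns false and copy_ctor_keeps_identity() returns true, which is why the argument has to be constructed (or copy constructed) directly in its final slot.</div>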

<div><br></div><div>This is further complicated by the possibility of nested calls passing arguments by</div><div>value.  A good general case to think about is a binary tree of calls that take</div><div>two arguments by value and return by value:</div>

<div><br></div><div>  struct A { int a; };</div><div>  A foo(A, A);</div><div>  foo(foo(A(), A()), foo(A(), A()));</div><div><br></div><div>To complete the outer call to foo, we have to adjust the stack for its outgoing</div>

<div>arguments before the inner calls to foo, and arrange for the sret pointers to</div><div>point to those slots.</div><div><br></div><div>To make this even more complicated, C++ methods are typically callee cleanup (thiscall), but free functions are caller cleanup (cdecl).</div>
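<div><br></div><div>To make the nested case above easier to experiment with, here is an instrumented sketch of the same call tree.  The counters, the extra constructors, and run_nested_calls() are assumptions added for illustration; only struct A and foo come from the example above:</div>

```cpp
#include <cassert>

// Hypothetical bookkeeping, added only to observe object lifetimes.
struct Counters {
    int constructed = 0;
    int copied = 0;
    int destroyed = 0;
};
inline Counters &counters() {
    static Counters c;
    return c;
}

struct A {
    int a;
    A() : a(1) { ++counters().constructed; }
    explicit A(int v) : a(v) { ++counters().constructed; }
    A(const A &o) : a(o.a) {
        ++counters().constructed;
        ++counters().copied;
    }
    ~A() { ++counters().destroyed; }
};

// Takes two arguments by value and returns by value, as in the example.
A foo(A x, A y) { return A(x.a + y.a); }

// The binary tree of calls: each inner result becomes an outer argument.
int run_nested_calls() {
    A result = foo(foo(A(), A()), foo(A(), A()));
    return result.a;
}
```

<div>Under C++17 guaranteed copy elision this creates seven A objects and no copies: each temporary and each inner return value is constructed directly as an argument of the call that consumes it.  That is the behavior the outgoing argument slots have to support.</div>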

<div><br></div><div>Features</div><div>--------</div><div><br></div><div>A few weeks ago, I sat down with some folks at Google and we came up with this</div><div>proposal, which tries to add the minimum set of LLVM IR features to make this</div>

<div>possible.</div><div><br></div><div>1. Allow alloca instructions to use llvm.stacksave values to indicate scoping.</div><div><br></div><div>This creates an SSA dependence between the alloca instruction and the</div><div>

stackrestore instruction that prevents optimizers from accidentally reordering</div><div>them in ways that fail verification.  llvm.stacksave in this case takes on a role</div><div>similar to CALLSEQ_START in the SelectionDAG.</div>

<div><br></div><div>LLVM can also apply this to dynamic allocas from inline functions to ensure that</div><div>optimizers don't move them.</div><div><br></div><div>2. Add an 'alloca' attribute for parameters.</div>

<div><br></div><div>Only an alloca value can be passed to a parameter with this attribute.  It</div><div>cannot be bitcast or GEPed.  An alloca can only be passed in this way once.</div><div>It can be passed as a normal pointer to any number of other functions.</div>

<div><br></div><div>Aside from allocas bounded by llvm.stacksave and llvm.stackrestore calls, there</div><div>can be no allocas between the creation of an alloca passed with this attribute</div><div>and its associated call.</div>

<div><br></div><div>3. Add a stackrestore field to call and invoke instructions.</div><div><br></div><div>This models calling conventions which do their own cleanup, and ensures that</div><div>even after optimizations have perturbed the IR, we don't consider the allocas to</div>

<div>be live.  For caller cleanup conventions, while the callee may have called</div><div>destructors on its arguments, the allocas can be considered live until the stack</div><div>restore.</div><div><br></div><div>Example</div>

<div>-------</div><div><br></div><div>A single call to foo, assuming it is stdcall, would be lowered to something like:</div><div><br></div><div>%res = alloca %struct.A</div><div>%base = llvm.stacksave()</div><div>%arg1 = alloca %struct.A, stackbase %base</div>

<div>%arg2 = alloca %struct.A, stackbase %base</div><div>call @A_ctor(%arg1)</div><div>call @A_ctor(%arg2)</div><div>call x86_stdcallcc @foo(%res sret, %arg1 alloca, %arg2 alloca), stackrestore %base</div><div><br></div><div>

If control does not flow through a call or invoke with a stackrestore field,</div><div>then manual calls to llvm.stackrestore must be emitted before another call or</div><div>invoke can use an 'alloca' argument.  The manual stack restore call ends the</div>

<div>lifetime of the allocas.  This is necessary to handle unwind edges from argument</div><div>expression evaluation as well as the case where foo is not callee cleanup.</div><div><br></div><div>Implementation</div><div>

--------------</div><div><br></div><div>By starting out with the stack save and restore intrinsics, we can hopefully</div><div>approach a slow but working implementation sooner rather than later.  The work</div><div>should mostly be in the verifier, the IR, its parser, and the x86 backend.</div>

<div><br></div><div>I don't plan to start working on this immediately, but over the long run this will be really important to support well.</div><div><br></div><div>---</div><div><br></div><div>That's all!  Please send feedback!  This is admittedly a really complicated</div>

<div>feature and I'm sorry for inflicting it on the LLVM community, but it's</div><div>obviously beyond my control.</div><div><br></div></div>