libc++: First cut at <dynarray>

Fri Sep 13 07:54:03 PDT 2013

See below…

On Sep 12, 2013, at 11:33 PM, Marshall Clow <mclow.lists at gmail.com> wrote:

> I've been talking to Nick and to Richard over the last week, and I think we need more than just an alloca.
> The compiler needs to decide when to put stuff on the stack.
> 
> Consider the following code:
> 	typedef std::dynarray<long> dArray;
> 
> 	{
> 	dArray arr1 ( 6 );
> 	// some code that uses arr1
> 	}
> 
> The six longs that make up dynarray really belong on the stack.
> 
> 	{
> 	return new dArray ( 6 );
> 	}
> 
> These six longs cannot be put on the stack. When would they be deallocated?

We supported the moral equivalent of these examples at Tartan Labs. The trick we used for returning a dynamic array (as in the second example) is ugly but straightforward.

When you get ready to return from the function, you reach up to the caller's saved registers in his stack frame, and add the size of the returned object (plus any necessary padding) to his stack pointer.  You also block-copy the returned object to the bytes just above the end of the caller's stack frame (this bashes a bunch of the current routine's variables in its stack frame, but that's OK because they're dead -- you do this in the function epilog).  Of course, the function's return value is the address of the returned array.

As for efficiency of the block copy…  Well, if the programmer didn't want the copy he shouldn't have returned a dynamic stack variable as a function result.  

Now, when you return, the caller's stack pointer includes the memory that is occupied by the dynarray, and all is well.  It's ugly, and it requires a few adjustments for architectures like Sparc that have hardware register windows (sigh), but it works just fine.
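
A minimal sketch of that epilog trick, written as a toy model in C++ rather than as real code generation. The "stack" is just a byte buffer that grows downward, and every name here (ToyStack, return_dynamic_array, read_long, the 64-byte local area, the 16-byte padding) is a hypothetical chosen for illustration, not anything from the Tartan compilers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy downward-growing stack. sp is the index of the lowest in-use byte.
struct ToyStack {
    std::vector<unsigned char> mem;
    std::size_t sp;
    explicit ToyStack(std::size_t bytes) : mem(bytes), sp(bytes) {}
    std::size_t push(std::size_t bytes) {
        assert(bytes <= sp);
        sp -= bytes;
        return sp;
    }
};

// Simulated callee: builds n longs in its own frame, then hands them to the
// caller by growing the caller's frame. Returns the array's new address.
std::size_t return_dynamic_array(ToyStack& s, std::size_t n) {
    const std::size_t caller_sp = s.sp;      // caller's saved stack pointer
    s.push(64);                              // callee's fixed-size locals (assumed size)
    const std::size_t bytes = n * sizeof(long);
    const std::size_t arr = s.push(bytes);   // the dynamic array, in the callee's frame
    for (std::size_t i = 0; i < n; ++i) {
        const long v = static_cast<long>(i * i);
        std::memcpy(s.mem.data() + arr + i * sizeof(long), &v, sizeof v);
    }

    // Epilog: add the object's size (plus padding) to the caller's frame,
    // then block-copy the object into that space. The copy may overwrite the
    // callee's own locals, which are dead by now, so memmove is used.
    const std::size_t padded = (bytes + 15) / 16 * 16;
    assert(padded <= caller_sp);
    const std::size_t new_caller_sp = caller_sp - padded;
    std::memmove(s.mem.data() + new_caller_sp, s.mem.data() + arr, bytes);
    s.sp = new_caller_sp;                    // caller resumes with the larger frame
    return new_caller_sp;                    // return value: address of the array
}

// Helper to read element i of an array of longs stored at base.
long read_long(const ToyStack& s, std::size_t base, std::size_t i) {
    long v;
    std::memcpy(&v, s.mem.data() + base + i * sizeof(long), sizeof v);
    return v;
}
```

After the call, the caller's stack pointer equals the returned address, so the array now sits inside the caller's frame and goes away when the caller itself returns.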

For the more general case of dynamic objects on the stack -- there's no fundamental reason why it should cause any problems for the optimizer.  There are two cases, both of which are straightforward:
(A) You use the presence of dynamic objects on the stack as a reason to skip frame-pointer elimination.  This costs you 1 register, but is otherwise no big deal.  And it's free if you were already going to skip FPE for some other reason.
(B) You go ahead and do FPE.  This simply requires that you keep track (dynamically!) of how far you've bumped your stack pointer to include the dynamic data.  It's just another offset in your addressing, using a variable created by the compiler.  The decision of whether that offset variable occupies a register is just an ordinary register allocation decision, driven by frequency of use.
(Note that both cases require that the compiler track the size of the dynamic stack allocation.  It comes with the territory.)
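
The two cases can be sketched as address arithmetic on a toy frame. This is only an illustration under assumed conventions (a downward-growing stack, locals addressed by their distance below the frame's entry point); Frame, enter_frame, alloc_dynamic, local_via_fp and local_via_sp are hypothetical names, and a real case-(B) frame would not keep fp at all:

```cpp
#include <cassert>
#include <cstddef>

struct Frame {
    std::size_t fp;         // case (A): frame pointer, fixed for the life of the frame
    std::size_t sp;         // stack pointer, moves with each dynamic allocation
    std::size_t dyn_bytes;  // case (B): compiler-created variable, total dynamic bytes
};

// Function entry: reserve the fixed-size locals below the entry stack pointer.
Frame enter_frame(std::size_t entry_sp, std::size_t fixed_bytes) {
    assert(fixed_bytes <= entry_sp);
    return Frame{entry_sp, entry_sp - fixed_bytes, 0};
}

// Allocate a dynamic object by bumping sp, and record how far it was bumped.
std::size_t alloc_dynamic(Frame& f, std::size_t bytes) {
    assert(bytes <= f.sp);
    f.sp -= bytes;
    f.dyn_bytes += bytes;
    return f.sp;
}

// Case (A): a local is found at a constant offset from the frame pointer.
std::size_t local_via_fp(const Frame& f, std::size_t off_below_fp) {
    return f.fp - off_below_fp;
}

// Case (B): no frame pointer; the same local is found from sp by adding the
// tracked dynamic offset to the usual static offset.
std::size_t local_via_sp(const Frame& f, std::size_t fixed_bytes,
                         std::size_t off_below_fp) {
    return f.sp + f.dyn_bytes + (fixed_bytes - off_below_fp);
}
```

Both functions give the same address for a local before and after a dynamic allocation; the only difference is whether a register is spent on fp or on (possibly) dyn_bytes.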

In our experience, there was no reason for either of these cases to cause any restriction on inlining.  Just apply your usual heuristics.

None of this is particularly difficult, nor is it rocket science.  Finally, there's no reason for any of it to be anywhere near as slow as a call to a heap allocator.  Really.  Go track down one of the folks from one of the Ada vendors whose compilers do/did reasonable optimization (they've been optimizing stuff like this since the early '80s).  It really isn't difficult.  It's only a bit tedious.  We implemented features like this on machines ranging from 68K to Sparc, to i960, to several different DSPs.  It even works on the MIL-STD-1750A (a 16-bit word-addressed machine with only signed arithmetic and an un-maskable interrupt on integer wrap-around -- UGH!).

Been there, done that, have many (many) T-shirts.

Dean

P.S.  I should note that the language allowed the programmer to force either stack or heap allocation, at his choice.  If he wrote "my_array: ptr_to_array = new array(…whatever...)", that forced heap allocation.  If he declared the variable as "my_array: array[0..n] of long", that forced stack allocation.  The compiler's only discretion in the matter was the "as-if" rule for optimization.