libc++: First cut at <dynarray>
chandlerc at google.com
Thu Sep 12 20:29:59 PDT 2013
On Thu, Sep 12, 2013 at 8:19 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > Speaking from both the Clang and LLVM side: I don't think we know
> > what we want to have to put things on the stack, and I am confident
> > we won't have it by Chicago. There are big, nasty, hard to answer
> > questions in the space of compiler-aided variable sized stack
> > allocation. Currently on x86 with LLVM, if the size is variable and
> > you have a reasonably fast malloc library, putting dynarray on the
> > stack will often be a pessimization. I'm not sure how often we can
> > make variable sized stack allocation the right choice, but it will
> > at the least require some reasonably significant changes to LLVM's
> > optimizer itself.
> > Even then, I currently expect that small vector, or a standard
> > allocator that pulls initially from a fixed-size stack-based pool,
> > will be significantly faster, and the only reason for having
> > dynarray at all was performance! Considering how hard it is to
> > implement, I'm inclined currently to go back to the committee with
> > the lesson "it's really hard, and it won't even be faster. can we
> > just stick with vector?"
> Not to be argumentative ;) -- but I strongly disagree. Good pool
> allocators are very fast, granted, but tend to add extra memory overhead
> which, in some environments, is hard to justify (I'm speaking from personal
> experience here).
But I'm not speaking of pool allocators. I'm speaking of using a fixed
maximum amount of stack space, and then falling back to the heap when
growing past that bound. Having the fixed bound allows the stack
reservation to be non-dynamic.
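
To make the shape of that concrete, here is a minimal sketch of a buffer with a fixed inline capacity that falls back to the heap past that bound. The name StackOrHeapBuffer and its members are hypothetical, made up for illustration; this is not libc++ or LLVM code, and it assumes T is default-constructible.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical illustration of "fixed stack bound, heap fallback".
// The inline array has a compile-time size, so the stack reservation
// for an object of this type is static, never dynamic.
template <typename T, std::size_t InlineCapacity>
class StackOrHeapBuffer {
public:
    explicit StackOrHeapBuffer(std::size_t n) : size_(n) {
        if (n > InlineCapacity) {
            // Past the fixed bound: fall back to a heap allocation.
            heap_.reset(new T[n]());
            data_ = heap_.get();
        } else {
            // Within the bound: use the storage embedded in the object.
            data_ = inline_;
        }
    }

    // Copying would leave data_ pointing into another object's storage.
    StackOrHeapBuffer(const StackOrHeapBuffer&) = delete;
    StackOrHeapBuffer& operator=(const StackOrHeapBuffer&) = delete;

    T* data() { return data_; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return heap_ != nullptr; }
    T& operator[](std::size_t i) { return data_[i]; }

private:
    T inline_[InlineCapacity] = {};  // fixed-size storage, lives in the object
    std::unique_ptr<T[]> heap_;      // only populated past the bound
    T* data_ = nullptr;
    std::size_t size_;
};
```

A local such as StackOrHeapBuffer<int, 8> buf(n) then stays entirely on the stack for n <= 8 and only touches the allocator for larger n.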
> Having dynamic stack allocation in C++ would be a great step forward.
I don't think experience with pool-based allocators motivates this, and I
think you need actual benchmark numbers based on LLVM, Clang, and modern
C++ code to motivate it.
> Furthermore, I don't see why this would not be trivial to implement:
> 1. We add an intrinsic, like alloca, called alloca_in_caller. This is
> only allowed in functions marked always_inline.
> 2. To the analysis passes, alloca_in_caller looks like malloc
> 3. When the inliner inlines an alloca_in_caller call, it transforms it
> into an alloca call
> (and that's it I think). Perhaps I'm way off base, but it seems fairly
> simple. Is there anything else that's needed?
You don't need any of the inline stuff. You just need some builtin which
directly produces a dynamically sized alloca instruction and the
stacksave/stackrestore pair to manage its lifetime.
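
As a rough sketch of what such a builtin looks like from the source side, GCC and Clang already provide __builtin_alloca, which Clang lowers to a dynamically sized alloca instruction. The function name sum_of_squares below is made up for illustration. Note that memory from __builtin_alloca is only released when the function returns; scoping it more tightly is what the stacksave/stackrestore pair would be for.

```cpp
#include <cstddef>

// Illustrative only: uses dynamically sized stack scratch space.
// Requires a compiler that provides __builtin_alloca (GCC, Clang).
long sum_of_squares(std::size_t n) {
    // The size is only known at run time, so this is a dynamic alloca.
    // A large n can overflow the stack; real code would need a bound.
    long* scratch = static_cast<long*>(__builtin_alloca(n * sizeof(long)));

    for (std::size_t i = 0; i < n; ++i)
        scratch[i] = static_cast<long>(i) * static_cast<long>(i);

    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += scratch[i];

    // The scratch space is reclaimed when the function returns.
    return total;
}
```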
But the consequences of using this are pretty significant. The existence of
a dynamic alloca impacts the register set available and has other
unfortunate side effects. As a consequence of that, the inliner refuses to
inline a function with a dynamic alloca into a caller which currently has
no dynamic alloca. Solving these fundamental problems is actually somewhat
hard, and they all go away if you remove the *dynamic* aspect to the stack
> Thanks again,
More information about the cfe-commits