[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

Yaron Keren yaron.keren at gmail.com
Mon Apr 20 08:43:38 PDT 2015


Even if the memory overhead is small, the two sets of pointers do introduce
code complexity in coordinating the SmallString and raw_svector_ostream, and
considerable runtime cost:

raw_ostream::write() calls raw_svector_ostream::write_impl()
and raw_ostream::copy_to_buffer().
raw_svector_ostream::write_impl() calls OS.reserve and SetBuffer.
SetBuffer calls SetBufferAndMode(), testing the BufferMode every time
and writing the three buffer pointers.

Every function has several code paths, adding to complexity and runtime
cost. Due to this complexity, there are *additional* tests in
raw_ostream::write() callers trying to take shortcuts to avoid calling
write_impl()... these tests also take time.

Essentially, what we are trying to achieve with SmallString +
raw_svector_ostream should be much, much simpler than what we have now, along
the lines Rafael suggested:

if there is enough space write the string
else reallocate, copy and write the string.

That's simpler and shorter code by an order of magnitude than the combination
we have now, and as a bonus it has a smaller memory footprint. What more can
we ask for?
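
To make the two-line algorithm above concrete, here is a minimal sketch of
such a stream with a single set of bookkeeping fields. It is not LLVM code and
not a proposed patch: the class name InlineBufferStream, its members and the
doubling growth policy are all assumptions made for illustration, and it does
not derive from raw_ostream.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical illustration only; this is not an LLVM class.
// One pointer plus size/capacity, instead of two coordinated pointer sets.
template <std::size_t N>
class InlineBufferStream {
  static_assert(N > 0, "inline buffer must not be empty");

  char Inline[N];           // built-in storage, used until it overflows
  char *Begin = Inline;     // start of the current buffer
  std::size_t Size = 0;     // bytes written so far
  std::size_t Capacity = N; // bytes available in the current buffer

  // Reallocate to at least MinCapacity bytes and copy the old contents.
  void grow(std::size_t MinCapacity) {
    std::size_t NewCapacity = Capacity * 2; // assumed growth policy
    if (NewCapacity < MinCapacity)
      NewCapacity = MinCapacity;
    char *NewBuf = static_cast<char *>(std::malloc(NewCapacity));
    if (!NewBuf)
      std::abort();
    std::memcpy(NewBuf, Begin, Size);
    if (Begin != Inline)
      std::free(Begin);
    Begin = NewBuf;
    Capacity = NewCapacity;
  }

public:
  InlineBufferStream() = default;
  InlineBufferStream(const InlineBufferStream &) = delete;
  InlineBufferStream &operator=(const InlineBufferStream &) = delete;
  ~InlineBufferStream() {
    if (Begin != Inline)
      std::free(Begin);
  }

  // "if there is enough space write the string,
  //  else reallocate, copy and write the string."
  // Ptr must not point into this stream's own buffer.
  InlineBufferStream &write(const char *Ptr, std::size_t Len) {
    if (Len > Capacity - Size)
      grow(Size + Len);
    std::memcpy(Begin + Size, Ptr, Len);
    Size += Len;
    return *this;
  }

  InlineBufferStream &operator<<(const char *Str) {
    return write(Str, std::strlen(Str));
  }

  std::string str() const { return std::string(Begin, Size); }
  std::size_t size() const { return Size; }
  bool isInline() const { return Begin == Inline; }
};
```

With something like InlineBufferStream<128> OS; OS << "..."; the write path is
one capacity check and one memcpy in the common case, and there is no second
object that has to be kept in sync.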



2015-04-20 18:19 GMT+03:00 David Blaikie <dblaikie at gmail.com>:

> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren <yaron.keren at gmail.com>
> wrote:
> > A very common code pattern in LLVM is
> >
> >  SmallString<128> S;
> >  raw_svector_ostream OS(S);
> >  OS<< ...
> >  Use OS.str()
> >
> > While raw_svector_ostream is smart to share the text buffer itself, it's
> > inefficient keeping two sets of pointers to the same buffer:
> >
> >  In SmallString: void *BeginX, *EndX, *CapacityX
> >  In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur
>
> Any reason to believe this inefficiency is significant/important?
> Given that these are never in long-lived containers, but generally
> just on the stack, it doesn't seem like the extra 3 pointers would be
> very costly in terms of overall performance.
>
> >
> > Moreover, at runtime the two sets of pointers need to be coordinated
> > between the SmallString and raw_svector_ostream using
> > raw_svector_ostream::init, raw_svector_ostream::pwrite,
> > raw_svector_ostream::resync and raw_svector_ostream::write_impl.
> > All these functions have non-inlined implementations in raw_ostream.cpp.
> >
> > Finally, this may cause subtle bugs if S is modified without calling
> > OS::resync(). This is too easy to do by mistake.
> >
> > In this frequent use case the client does not really care about S being
> > a SmallString with its many useful string helper functions. It's just
> > boilerplate code for raw_svector_ostream. But it does cost three extra
> > pointers, some runtime performance and possible bugs.
> >
> > To solve all three issues, would it make sense to have a
> > raw_ostream-derived container with its own SmallString-like,
> > templated-size built-in buffer?