[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

Sean Silva chisophugis at gmail.com
Mon Apr 20 12:17:15 PDT 2015


On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren <yaron.keren at gmail.com> wrote:

> A very common code pattern in LLVM is
>
>  SmallString<128> S;
>  raw_svector_ostream OS(S);
>  OS<< ...
>  Use OS.str()
>
> While raw_svector_ostream is smart about sharing the text buffer itself,
> it is inefficient to keep two sets of pointers to the same buffer:
>
>  In SmallString: void *BeginX, *EndX, *CapacityX
>  In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur
>
> Moreover, at runtime the two sets of pointers need to be coordinated
> between the SmallString and raw_svector_ostream using
> raw_svector_ostream::init, raw_svector_ostream::pwrite, raw_svector_ostream::resync
> and raw_svector_ostream::write_impl.
> All these functions have non-inlined implementations in raw_ostream.cpp.
>
> Finally, this may cause subtle bugs if S is modified without calling
> OS.resync(). This is too easy to do by mistake.
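>
> For instance, a hypothetical sketch of the mistake (illustrative only,
> not code from any actual caller):
>
>  SmallString<128> S;
>  raw_svector_ostream OS(S);
>  OS << "prefix";
>  S.clear();       // S is modified directly, behind the stream's back
>  // OS.resync();  // easy to forget; without it the stream's cached
>  OS << "value";   //   pointers no longer match S's size and capacity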
>
> In this frequent usage the client does not really care about S being
> a SmallString with its many useful string helper functions. It's just
> boilerplate code for raw_svector_ostream. But it does cost three extra
> pointers, some runtime performance and possible bugs.
>

I agree the bugs are real (Alp proposed something a while back regarding
this?), but you will need to provide measurements to justify the cost in
runtime performance. One technique I have used in the past to measure these
sorts of things is what I call "stuffing": take the operation that you want
to measure, then essentially change the logic so that you pay the cost 2
times, 3 times, etc. You can then look at the trend in performance as N
varies and extrapolate back to the case where N = 0 (i.e. you don't pay the
cost).
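
To make that concrete, a minimal sketch of the idea (the wrapper is
hypothetical, nothing in-tree): run the operation under test N times, then
plot total time against N and extrapolate the trend back to N = 0.

  // Hypothetical "stuffing" wrapper: repeat the operation N >= 1 times so
  // the slope of (total time vs. N) estimates the per-call cost.
  template <typename Fn>
  auto stuffed(unsigned N, Fn Op) -> decltype(Op()) {
    for (unsigned I = 1; I < N; ++I)
      (void)Op();   // extra repetitions; results deliberately discarded
    return Op();    // the one "real" call whose result is actually used
  }

  // e.g. auto EC = stuffed(3, [&] { return sys::fs::status(Path, Status); });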

For example, in one situation where I used this method it was to measure
the cost of stat'ing files (sys::fs::status) across a holistic build, using
only "time" on the command line (it was on Windows and I didn't have any
tools like DTrace available that can directly measure this). In order to do
this, I changed sys::fs::status to call stat N times instead of 1, and
measured with N=1, N=2, N=3, etc. The result was that the difference between
the N and N+1 versions was about 1-2% across N=1..10 (or whatever I
measured). In order to negate caching and other confounding effects, it is
important to try different distributions of stats: e.g. the extra stats hit
the same file as the "real" stat, vs. nonexistent files in the same
directory as the "real" file, vs. parent directories of the "real" file. If
these match up fairly well (they did), then you have some indication that
the "stuffing" is measuring what you want to measure.
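
A rough sketch of what those extra-stat distributions could look like (the
file name and the helper function are made up for illustration):

  #include "llvm/ADT/SmallString.h"
  #include "llvm/Support/FileSystem.h"
  #include "llvm/Support/Path.h"
  using namespace llvm;

  // Hypothetical extra stats, one per distribution described above; if
  // their measured costs agree, caching is probably not skewing the result.
  void extraStats(StringRef RealPath) {
    sys::fs::file_status St;
    (void)sys::fs::status(RealPath, St);              // same file as the real stat
    SmallString<256> Missing(RealPath);
    sys::path::remove_filename(Missing);
    sys::path::append(Missing, "does-not-exist.tmp"); // hypothetical name
    (void)sys::fs::status(Missing, St);               // missing file, same directory
    (void)sys::fs::status(sys::path::parent_path(RealPath), St); // parent dir
  }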

So e.g. if you think the cost of 3 extra pointers is significant, then
"stuff" the struct with 3, 6, 9, ... extra pointers and measure the
difference in performance (e.g. measure the time building a real project).
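
For instance, one way to realize that stuffing (a hypothetical sketch; for a
real measurement you would add the padding to raw_svector_ostream itself and
rebuild, so that every existing user pays for it):

  #include "llvm/ADT/SmallString.h"
  #include "llvm/Support/raw_ostream.h"
  using namespace llvm;

  // Hypothetical padded stream: the extra pointers are never used, they
  // only grow the object so the cost of the added state can be measured
  // at ExtraPtrs = 3, 6, 9, ...
  template <unsigned ExtraPtrs>
  struct StuffedSVectorOStream : raw_svector_ostream {
    using raw_svector_ostream::raw_svector_ostream;
    void *Padding[ExtraPtrs];
  };

  static_assert(sizeof(StuffedSVectorOStream<3>) >=
                    sizeof(raw_svector_ostream) + 3 * sizeof(void *),
                "padding grows the object as intended");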

-- Sean Silva


>
> To solve all three issues, would it make sense to have a
> raw_ostream-derived container with its own SmallString-like,
> templated-size built-in buffer?
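>
> One possible shape for such a class (a hypothetical sketch with made-up
> names, using the base-from-member idiom so the buffer is constructed
> before the stream that points into it):
>
>  #include "llvm/ADT/SmallString.h"
>  #include "llvm/Support/raw_ostream.h"
>
>  template <unsigned InternalLen>
>  struct small_string_buffer {
>    llvm::SmallString<InternalLen> Buffer;
>  };
>
>  // Owns its buffer, so there is no external SmallString to keep in sync
>  // and no resync() to forget; removing the duplicated pointers themselves
>  // would still need a deeper raw_ostream change.
>  template <unsigned InternalLen>
>  class small_string_ostream : private small_string_buffer<InternalLen>,
>                               public llvm::raw_svector_ostream {
>  public:
>    small_string_ostream() : raw_svector_ostream(this->Buffer) {}
>  };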
>
>