[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient
Sean Silva
chisophugis at gmail.com
Fri May 1 18:39:27 PDT 2015
Could you dig into why:
+ raw_ostream &write(unsigned char C) override {
+ grow(1);
+ *OutBufCur++ = C;
+ return *this;
+ }
Is 3 times as fast as raw_svector_ostream? I don't see a good reason why
that should be any faster than:
raw_ostream &operator<<(char C) {
if (OutBufCur >= OutBufEnd)
return write(C);
*OutBufCur++ = C;
return *this;
}
You might just be seeing the difference between having write inlined vs.
non-inlined, in which case your patch might be a complicated way to pull
the likely cases of some raw_ostream methods into the header.
-- Sean Silva
On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <yaron.keren at gmail.com> wrote:
> Yes, we should do without virtual flush. The problem was raw_char_ostream
> should not be flushed - it's always current - so I overloaded flush to a
> no-op. A cleaner solution is attached, adding a DirectBuffer mode to the
> raw_ostream.
>
> Also added a simple performance comparison project
> between raw_char_ostream and raw_svector_ostream. On Window 7 x64
> machine, raw_char_ostream was three times faster than raw_svector_ostream
> when the provided buffer size is large enough and two times as fast with
> one dynamic allocation resize.
>
>
> 2015-04-30 22:48 GMT+03:00 Rafael EspĂndola <rafael.espindola at gmail.com>:
>
>> I don't think we should make flush virtual. Why do you need to do it?
>> Can't you set up the base class to write to use the tail of the memory
>> region as the buffer?
>>
>>
>>
>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at gmail.com> wrote:
>> > Hi,
>> >
>> > Is this what you're thinking about?
>> > The code is not tested yet, I'd like to know if the overall direction
>> and
>> > design are correct.
>> >
>> > Yaron
>> >
>> >
>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
>> >>
>> >> Sean, thanks for reminding this, Alp did commit a class derived from
>> >> raw_svector_ostream conatining an internal SmallString he called
>> >> small_string_ostream. The commit was reverted after a day due to a
>> >> disagreement about the commit approval and apparently abandoned.
>> >>
>> >>
>> >>
>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html
>> >>
>> >>
>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html
>> >>
>> >>
>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html
>> >>
>> >> The class Alp comitted did solve the possible mismatch between the
>> >> SmallString and the stream by making the SmallString private to the
>> class.
>> >> This however did not treat the root problem, the duplication of the
>> >> information about the buffer between SmallString and the stream.
>> >>
>> >> I can make performance measurements but even if the performance
>> difference
>> >> is neglible (and looking at all the code paths and conditionals of
>> >> non-inlined functions at raw_ostream.cpp that's hard to believe), the
>> >> current SmallString-raw_svector_ostream is simply wrong.
>> >>
>> >> Personally, after the previous "Alp" discussion I found and fixed
>> several
>> >> such issues in my out-of-tree code only to make new similar new error
>> >> earlier this year, which I caught only months after, when Rafael
>> committed
>> >> the pwrite which reminded me the issue. Due ot the information
>> duplication
>> >> Rafael commit also had a bug and Mehdi reports similar experience.
>> Back then
>> >> Alp reported similar problems he found in LLVM code which were
>> hopefully
>> >> fixed otherwise.
>> >>
>> >> In addition to the information duplication, the design is quite
>> confusing
>> >> to use
>> >>
>> >> - Should one use the stream.str() function or the SmallString itself?
>> >> - flush or str?
>> >> - How do you properly clear the combination for reuse (that was my
>> >> mistake, I forgot to resync after OS.clear())?
>> >>
>> >> It's safe to say this design pattern is very easy to get wrong one way
>> or
>> >> another, we got burned by it multiple times and it should be replaced.
>> >>
>> >> Alp suggested a derived class containing its own SmallString. That's a
>> >> simple and effective approach to avoid the bugs mentioned but does not
>> fix
>> >> the memory or runtime issues. Small as they may be, why have them at a
>> >> fundemental data structure?
>> >>
>> >> I was thinking about going further avoiding all duplication with a
>> >> templated class derived with its own internal buffer storage.
>> >>
>> >> Rafael suggested a more modular approach, a derived adpater class
>> adapter
>> >> to a *simple* buffer (or nullptr for fully-dynamic operation) which
>> also
>> >> won't duplicate any information the buffer is dumb and has no state.
>> >>
>> >> That solution sounds simple, efficient and safe to use. The
>> implementation
>> >> would be actually simpler then raw_svector_ostream since all the
>> >> coordination logic is not required anymore.
>> >>
>> >>
>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at gmail.com>:
>> >>>
>> >>>
>> >>>
>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren <yaron.keren at gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> A very common code pattern in LLVM is
>> >>>>
>> >>>> SmallString<128> S;
>> >>>> raw_svector_ostream OS(S);
>> >>>> OS<< ...
>> >>>> Use OS.str()
>> >>>>
>> >>>> While raw_svector_ostream is smart to share the text buffer itself,
>> it's
>> >>>> inefficient keeping two sets of pointers to the same buffer:
>> >>>>
>> >>>> In SmallString: void *BeginX, *EndX, *CapacityX
>> >>>> In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur
>> >>>>
>> >>>> Moreover, at runtime the two sets of pointers need to be coordinated
>> >>>> between the SmallString and raw_svector_ostream using
>> >>>> raw_svector_ostream::init, raw_svector_ostream::pwrite,
>> >>>> raw_svector_ostream::resync and raw_svector_ostream::write_impl.
>> >>>> All these functions have non-inlined implementations in
>> raw_ostream.cpp.
>> >>>>
>> >>>> Finally, this may cause subtle bugs if S is modified without calling
>> >>>> OS::resync(). This is too easy to do by mistake.
>> >>>>
>> >>>> In this frequent case usage the client does not really care about S
>> >>>> being a SmallString with its many useful string helper function.
>> It's just
>> >>>> boilerplate code for raw_svector_ostream. But it does cost three
>> extra
>> >>>> pointers, some runtime performance and possible bugs.
>> >>>
>> >>>
>> >>> I agree the bugs are real (Alp proposed something a while back
>> regarding
>> >>> this?), but you will need to provide measurements to justify the cost
>> in
>> >>> runtime performance. One technique I have used in the past to measure
>> these
>> >>> sorts of things I call "stuffing": take the operation that you want to
>> >>> measure, then essentially change the logic so that you pay the cost 2
>> times,
>> >>> 3 times, etc. You can then look at the trend in performance as N
>> varies and
>> >>> extrapolate back to the case where N = 0 (i.e. you don't pay the
>> cost).
>> >>>
>> >>> For example, in one situation where I used this method it was to
>> measure
>> >>> the cost of stat'ing files (sys::fs::status) across a holistic build,
>> using
>> >>> only "time" on the command line (it was on Windows and I didn't have
>> any
>> >>> tools like DTrace available that can directly measure this). In order
>> to do
>> >>> this, I changed sys::fs::status to call stat N times instead of 1, and
>> >>> measured with N=1 N=2 N=3 etc. The result was that the difference
>> between
>> >>> the N and N+1 versions was about 1-2% across N=1..10 (or whatever I
>> >>> measured). In order to negate caching and other confounding effects,
>> it is
>> >>> important to try different distributions of stats; e.g. the extra
>> stats are
>> >>> on the same file as the "real" stat vs. the extra stats are on
>> nonexistent
>> >>> files in the same directory as the "real" file vs. parent directories
>> of the
>> >>> "real" file; if these match up fairly well (they did), then you have
>> some
>> >>> indication that the "stuffing" is measuring what you want to measure.
>> >>>
>> >>> So e.g. if you think the cost of 3 extra pointers is significant, then
>> >>> "stuff" the struct with 3, 6, 9, ... extra pointers and measure the
>> >>> difference in performance (e.g. measure the time building a real
>> project).
>> >>>
>> >>> -- Sean Silva
>> >>>
>> >>>>
>> >>>>
>> >>>> To solve all three issues, would it make sense to have
>> >>>> raw_ostream-derived container with a its own SmallString like
>> templated-size
>> >>>> built-in buffer?
>> >>>>
>> >>>
>> >
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150501/d6e980d6/attachment.html>
More information about the llvm-dev
mailing list