[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient
Yaron Keren
yaron.keren at gmail.com
Sat May 2 11:31:54 PDT 2015
Following a hint from Duncan in http://llvm.org/pr23395, here is a revised
patch. Rather then introduce the raw_char_ostream adapter, this version
improves raw_svector_ostream.
raw_svector_ostream is now a very lightweight adapter, running without a
buffer and fowarding almost anything to the SmallString. This solves both
the performance issue and the information duplication issue. The
SmallString is always up-to-date and can be used at any time. flush() does
nothing and is not required. resync() was kept in this patch as a no-op and
will be removed with its users in a follow-up patch. I'll also try to
remove the superfluous flush calls() along the way.
2015-05-02 7:41 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
> +update diff
>
>
> 2015-05-02 7:38 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
>
>> I outlined (is that the right
>> word?) raw_char_ostream::grow, raw_char_ostream::write (both) into
>> raw_ostream.cpp with less than 10% difference in performance.
>>
>> Profiling reveals that the real culprit is the code line
>>
>> OS.reserve(OS.size() + 64);
>>
>> called from raw_svector_ostream::write_impl called from
>> raw_svector_ostream::~raw_svector_ostream.
>> The issue is already documented:
>>
>> raw_svector_ostream::~raw_svector_ostream() {
>> // FIXME: Prevent resizing during this flush().
>> flush();
>> }
>>
>> And a solution is provided in raw_svector_ostream::init():
>>
>> // Set up the initial external buffer. We make sure that the buffer has
>> at
>> // least 128 bytes free; raw_ostream itself only requires 64, but we
>> want to
>> // make sure that we don't grow the buffer unnecessarily on destruction
>> (when
>> // the data is flushed). See the FIXME below.
>> OS.reserve(OS.size() + 128);
>>
>> This solution may be worse than the problem. In total:
>>
>> * If the SmallString was less than 128 bytes init() will always(!) heap
>> allocate.
>> * If it's more or equal to 128 bytes but upon destruction less than 64
>> bytes are left unused it will heap allocate for no reason.
>>
>> A careful programmer who did size the SmallString to match its use +
>> small reserve will find either of these heap allocations surprising,
>> negating the SmallString advantage and then paying memcpy cost.
>>
>> I filed http://llvm.org/pr23395 about this. To temporarly avoid this
>> issue, in addition to the outline methods change I commented the OS.reserve
>> line in flush(). (This change of course requires further review.)
>>
>> With reserve being out of the way, the gap now is smaller but still
>> significant: raw_char_ostream is about 20% faster than
>> raw_svector_ostream in Release=optimized configuration.
>>
>>
>> 2015-05-02 4:39 GMT+03:00 Sean Silva <chisophugis at gmail.com>:
>>
>>>
>>> Could you dig into why:
>>> + raw_ostream &write(unsigned char C) override {
>>> + grow(1);
>>> + *OutBufCur++ = C;
>>> + return *this;
>>> + }
>>>
>>> Is 3 times as fast as raw_svector_ostream? I don't see a good reason why
>>> that should be any faster than:
>>>
>>> raw_ostream &operator<<(char C) {
>>> if (OutBufCur >= OutBufEnd)
>>> return write(C);
>>> *OutBufCur++ = C;
>>> return *this;
>>> }
>>>
>>>
>>> You might just be seeing the difference between having write inlined vs.
>>> non-inlined, in which case your patch might be a complicated way to pull
>>> the likely cases of some raw_ostream methods into the header.
>>>
>>>
>>> -- Sean Silva
>>>
>>
>> On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <yaron.keren at gmail.com>
>> wrote:
>>
>>> Yes, we should do without virtual flush. The problem
>>> was raw_char_ostream should not be flushed - it's always current - so I
>>> overloaded flush to a no-op. A cleaner solution is attached, adding
>>> a DirectBuffer mode to the raw_ostream.
>>>
>>> Also added a simple performance comparison project
>>> between raw_char_ostream and raw_svector_ostream. On Window 7 x64
>>> machine, raw_char_ostream was three times faster than raw_svector_ostream
>>> when the provided buffer size is large enough and two times as fast with
>>> one dynamic allocation resize.
>>>
>>>
>>> 2015-04-30 22:48 GMT+03:00 Rafael EspĂndola <rafael.espindola at gmail.com>
>>> :
>>>
>>>> I don't think we should make flush virtual. Why do you need to do it?
>>>> Can't you set up the base class to write to use the tail of the memory
>>>> region as the buffer?
>>>>
>>>>
>>>>
>>>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at gmail.com> wrote:
>>>> > Hi,
>>>> >
>>>> > Is this what you're thinking about?
>>>> > The code is not tested yet, I'd like to know if the overall direction
>>>> and
>>>> > design are correct.
>>>> >
>>>> > Yaron
>>>> >
>>>> >
>>>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
>>>> >>
>>>> >> Sean, thanks for reminding this, Alp did commit a class derived from
>>>> >> raw_svector_ostream conatining an internal SmallString he called
>>>> >> small_string_ostream. The commit was reverted after a day due to a
>>>> >> disagreement about the commit approval and apparently abandoned.
>>>> >>
>>>> >>
>>>> >>
>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html
>>>> >>
>>>> >>
>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html
>>>> >>
>>>> >>
>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html
>>>> >>
>>>> >> The class Alp comitted did solve the possible mismatch between the
>>>> >> SmallString and the stream by making the SmallString private to the
>>>> class.
>>>> >> This however did not treat the root problem, the duplication of the
>>>> >> information about the buffer between SmallString and the stream.
>>>> >>
>>>> >> I can make performance measurements but even if the performance
>>>> difference
>>>> >> is neglible (and looking at all the code paths and conditionals of
>>>> >> non-inlined functions at raw_ostream.cpp that's hard to believe), the
>>>> >> current SmallString-raw_svector_ostream is simply wrong.
>>>> >>
>>>> >> Personally, after the previous "Alp" discussion I found and fixed
>>>> several
>>>> >> such issues in my out-of-tree code only to make new similar new error
>>>> >> earlier this year, which I caught only months after, when Rafael
>>>> committed
>>>> >> the pwrite which reminded me the issue. Due ot the information
>>>> duplication
>>>> >> Rafael commit also had a bug and Mehdi reports similar experience.
>>>> Back then
>>>> >> Alp reported similar problems he found in LLVM code which were
>>>> hopefully
>>>> >> fixed otherwise.
>>>> >>
>>>> >> In addition to the information duplication, the design is quite
>>>> confusing
>>>> >> to use
>>>> >>
>>>> >> - Should one use the stream.str() function or the SmallString itself?
>>>> >> - flush or str?
>>>> >> - How do you properly clear the combination for reuse (that was my
>>>> >> mistake, I forgot to resync after OS.clear())?
>>>> >>
>>>> >> It's safe to say this design pattern is very easy to get wrong one
>>>> way or
>>>> >> another, we got burned by it multiple times and it should be
>>>> replaced.
>>>> >>
>>>> >> Alp suggested a derived class containing its own SmallString. That's
>>>> a
>>>> >> simple and effective approach to avoid the bugs mentioned but does
>>>> not fix
>>>> >> the memory or runtime issues. Small as they may be, why have them at
>>>> a
>>>> >> fundemental data structure?
>>>> >>
>>>> >> I was thinking about going further avoiding all duplication with a
>>>> >> templated class derived with its own internal buffer storage.
>>>> >>
>>>> >> Rafael suggested a more modular approach, a derived adpater class
>>>> adapter
>>>> >> to a *simple* buffer (or nullptr for fully-dynamic operation) which
>>>> also
>>>> >> won't duplicate any information the buffer is dumb and has no state.
>>>> >>
>>>> >> That solution sounds simple, efficient and safe to use. The
>>>> implementation
>>>> >> would be actually simpler then raw_svector_ostream since all the
>>>> >> coordination logic is not required anymore.
>>>> >>
>>>> >>
>>>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at gmail.com>:
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren <yaron.keren at gmail.com
>>>> >
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> A very common code pattern in LLVM is
>>>> >>>>
>>>> >>>> SmallString<128> S;
>>>> >>>> raw_svector_ostream OS(S);
>>>> >>>> OS<< ...
>>>> >>>> Use OS.str()
>>>> >>>>
>>>> >>>> While raw_svector_ostream is smart to share the text buffer
>>>> itself, it's
>>>> >>>> inefficient keeping two sets of pointers to the same buffer:
>>>> >>>>
>>>> >>>> In SmallString: void *BeginX, *EndX, *CapacityX
>>>> >>>> In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur
>>>> >>>>
>>>> >>>> Moreover, at runtime the two sets of pointers need to be
>>>> coordinated
>>>> >>>> between the SmallString and raw_svector_ostream using
>>>> >>>> raw_svector_ostream::init, raw_svector_ostream::pwrite,
>>>> >>>> raw_svector_ostream::resync and raw_svector_ostream::write_impl.
>>>> >>>> All these functions have non-inlined implementations in
>>>> raw_ostream.cpp.
>>>> >>>>
>>>> >>>> Finally, this may cause subtle bugs if S is modified without
>>>> calling
>>>> >>>> OS::resync(). This is too easy to do by mistake.
>>>> >>>>
>>>> >>>> In this frequent case usage the client does not really care about S
>>>> >>>> being a SmallString with its many useful string helper function.
>>>> It's just
>>>> >>>> boilerplate code for raw_svector_ostream. But it does cost three
>>>> extra
>>>> >>>> pointers, some runtime performance and possible bugs.
>>>> >>>
>>>> >>>
>>>> >>> I agree the bugs are real (Alp proposed something a while back
>>>> regarding
>>>> >>> this?), but you will need to provide measurements to justify the
>>>> cost in
>>>> >>> runtime performance. One technique I have used in the past to
>>>> measure these
>>>> >>> sorts of things I call "stuffing": take the operation that you want
>>>> to
>>>> >>> measure, then essentially change the logic so that you pay the cost
>>>> 2 times,
>>>> >>> 3 times, etc. You can then look at the trend in performance as N
>>>> varies and
>>>> >>> extrapolate back to the case where N = 0 (i.e. you don't pay the
>>>> cost).
>>>> >>>
>>>> >>> For example, in one situation where I used this method it was to
>>>> measure
>>>> >>> the cost of stat'ing files (sys::fs::status) across a holistic
>>>> build, using
>>>> >>> only "time" on the command line (it was on Windows and I didn't
>>>> have any
>>>> >>> tools like DTrace available that can directly measure this). In
>>>> order to do
>>>> >>> this, I changed sys::fs::status to call stat N times instead of 1,
>>>> and
>>>> >>> measured with N=1 N=2 N=3 etc. The result was that the difference
>>>> between
>>>> >>> the N and N+1 versions was about 1-2% across N=1..10 (or whatever I
>>>> >>> measured). In order to negate caching and other confounding
>>>> effects, it is
>>>> >>> important to try different distributions of stats; e.g. the extra
>>>> stats are
>>>> >>> on the same file as the "real" stat vs. the extra stats are on
>>>> nonexistent
>>>> >>> files in the same directory as the "real" file vs. parent
>>>> directories of the
>>>> >>> "real" file; if these match up fairly well (they did), then you
>>>> have some
>>>> >>> indication that the "stuffing" is measuring what you want to
>>>> measure.
>>>> >>>
>>>> >>> So e.g. if you think the cost of 3 extra pointers is significant,
>>>> then
>>>> >>> "stuff" the struct with 3, 6, 9, ... extra pointers and measure the
>>>> >>> difference in performance (e.g. measure the time building a real
>>>> project).
>>>> >>>
>>>> >>> -- Sean Silva
>>>> >>>
>>>> >>>>
>>>> >>>>
>>>> >>>> To solve all three issues, would it make sense to have
>>>> >>>> raw_ostream-derived container with a its own SmallString like
>>>> templated-size
>>>> >>>> built-in buffer?
>>>> >>>>
>>>> >>>
>>>> >
>>>>
>>>
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150502/f543d9cb/attachment.html>
-------------- next part --------------
Index: include/llvm/Support/raw_ostream.h
===================================================================
--- include/llvm/Support/raw_ostream.h (revision 236389)
+++ include/llvm/Support/raw_ostream.h (working copy)
@@ -208,8 +208,8 @@
/// satisfy std::isprint into an escape sequence.
raw_ostream &write_escaped(StringRef Str, bool UseHexEscapes = false);
- raw_ostream &write(unsigned char C);
- raw_ostream &write(const char *Ptr, size_t Size);
+ virtual raw_ostream &write(unsigned char C);
+ virtual raw_ostream &write(const char *Ptr, size_t Size);
// Formatted output, see the format() function in Support/Format.h.
raw_ostream &operator<<(const format_object_base &Fmt);
@@ -483,40 +483,42 @@
/// A raw_ostream that writes to an SmallVector or SmallString. This is a
/// simple adaptor class. This class does not encounter output errors.
+/// raw_svector_ostream operates without a buffer, delegating all memory
+/// management to the SmallString. Thus the SmallString is always up-to-date,
+/// may be used directly and there is no need to call flush().
class raw_svector_ostream : public raw_pwrite_stream {
SmallVectorImpl<char> &OS;
/// See raw_ostream::write_impl.
- void write_impl(const char *Ptr, size_t Size) override;
+ void write_impl(const char *Ptr, size_t Size) override{};
+ void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override {
+ memcpy(OS.data() + Offset, Ptr, Size);
+ }
- void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override;
-
/// Return the current position within the stream, not counting the bytes
/// currently in the buffer.
- uint64_t current_pos() const override;
+ uint64_t current_pos() const override { return OS.size(); }
-protected:
- // Like the regular constructor, but doesn't call init.
- explicit raw_svector_ostream(SmallVectorImpl<char> &O, unsigned);
- void init();
-
public:
/// Construct a new raw_svector_ostream.
///
- /// \param O The vector to write to; this should generally have at least 128
- /// bytes free to avoid any extraneous memory overhead.
- explicit raw_svector_ostream(SmallVectorImpl<char> &O);
- ~raw_svector_ostream() override;
+ explicit raw_svector_ostream(SmallVectorImpl<char> &O) : OS(O) {}
+ ~raw_svector_ostream() override {}
+ raw_ostream &write(unsigned char C) override {
+ OS.push_back(C);
+ return *this;
+ }
+ raw_ostream &write(const char *Ptr, size_t Size) override {
+ OS.append(Ptr, Ptr + Size);
+ return *this;
+ }
- /// This is called when the SmallVector we're appending to is changed outside
- /// of the raw_svector_ostream's control. It is only safe to do this if the
- /// raw_svector_ostream has previously been flushed.
- void resync();
+ // FIXME: remove resync() and users.
+ void resync() {}
- /// Flushes the stream contents to the target vector and return a StringRef
- /// for the vector contents.
- StringRef str();
+ /// Return a StringRef for the vector contents.
+ StringRef str() { return StringRef(OS.data(), OS.size()); }
};
/// A raw_ostream that discards all output.
@@ -539,9 +541,7 @@
SmallVector<char, 0> Buffer;
public:
- buffer_ostream(raw_ostream &OS) : raw_svector_ostream(Buffer, 0), OS(OS) {
- init();
- }
+ buffer_ostream(raw_ostream &OS) : raw_svector_ostream(Buffer), OS(OS) {}
~buffer_ostream() { OS << str(); }
};
Index: lib/Support/raw_ostream.cpp
===================================================================
--- lib/Support/raw_ostream.cpp (revision 236389)
+++ lib/Support/raw_ostream.cpp (working copy)
@@ -752,78 +752,6 @@
}
//===----------------------------------------------------------------------===//
-// raw_svector_ostream
-//===----------------------------------------------------------------------===//
-
-// The raw_svector_ostream implementation uses the SmallVector itself as the
-// buffer for the raw_ostream. We guarantee that the raw_ostream buffer is
-// always pointing past the end of the vector, but within the vector
-// capacity. This allows raw_ostream to write directly into the correct place,
-// and we only need to set the vector size when the data is flushed.
-
-raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O, unsigned)
- : OS(O) {}
-
-raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O) : OS(O) {
- init();
-}
-
-void raw_svector_ostream::init() {
- // Set up the initial external buffer. We make sure that the buffer has at
- // least 128 bytes free; raw_ostream itself only requires 64, but we want to
- // make sure that we don't grow the buffer unnecessarily on destruction (when
- // the data is flushed). See the FIXME below.
- OS.reserve(OS.size() + 128);
- SetBuffer(OS.end(), OS.capacity() - OS.size());
-}
-
-raw_svector_ostream::~raw_svector_ostream() {
- // FIXME: Prevent resizing during this flush().
- flush();
-}
-
-void raw_svector_ostream::pwrite_impl(const char *Ptr, size_t Size,
- uint64_t Offset) {
- flush();
- memcpy(OS.begin() + Offset, Ptr, Size);
-}
-
-/// resync - This is called when the SmallVector we're appending to is changed
-/// outside of the raw_svector_ostream's control. It is only safe to do this
-/// if the raw_svector_ostream has previously been flushed.
-void raw_svector_ostream::resync() {
- assert(GetNumBytesInBuffer() == 0 && "Didn't flush before mutating vector");
-
- if (OS.capacity() - OS.size() < 64)
- OS.reserve(OS.capacity() * 2);
- SetBuffer(OS.end(), OS.capacity() - OS.size());
-}
-
-void raw_svector_ostream::write_impl(const char *Ptr, size_t Size) {
- if (Ptr == OS.end()) {
- // Grow the buffer to include the scratch area without copying.
- size_t NewSize = OS.size() + Size;
- assert(NewSize <= OS.capacity() && "Invalid write_impl() call!");
- OS.set_size(NewSize);
- } else {
- assert(!GetNumBytesInBuffer());
- OS.append(Ptr, Ptr + Size);
- }
-
- OS.reserve(OS.size() + 64);
- SetBuffer(OS.end(), OS.capacity() - OS.size());
-}
-
-uint64_t raw_svector_ostream::current_pos() const {
- return OS.size();
-}
-
-StringRef raw_svector_ostream::str() {
- flush();
- return StringRef(OS.begin(), OS.size());
-}
-
-//===----------------------------------------------------------------------===//
// raw_null_ostream
//===----------------------------------------------------------------------===//
More information about the llvm-dev
mailing list