[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

Sat May 2 11:31:54 PDT 2015

Following a hint from Duncan in http://llvm.org/pr23395, here is a revised
patch. Rather then introduce the raw_char_ostream adapter, this version
improves raw_svector_ostream.

raw_svector_ostream is now a very lightweight adapter, running without a
buffer and fowarding almost anything to the SmallString. This solves both
the performance issue and the information duplication issue. The
SmallString is always up-to-date and can be used at any time. flush() does
nothing and is not required. resync() was kept in this patch as a no-op and
will be removed with its users in a follow-up patch. I'll also try to
remove the superfluous flush calls() along the way.



2015-05-02 7:41 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:

> +update diff
>
>
> 2015-05-02 7:38 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
>
>> I outlined (is that the right
>> word?) raw_char_ostream::grow, raw_char_ostream::write (both) into
>> raw_ostream.cpp with less than 10% difference in performance.
>>
>> Profiling reveals that the real culprit is the code line
>>
>>  OS.reserve(OS.size() + 64);
>>
>> called from raw_svector_ostream::write_impl called from
>> raw_svector_ostream::~raw_svector_ostream.
>> The issue is already documented:
>>
>> raw_svector_ostream::~raw_svector_ostream() {
>>   // FIXME: Prevent resizing during this flush().
>>   flush();
>> }
>>
>> And a solution is provided in raw_svector_ostream::init():
>>
>>   // Set up the initial external buffer. We make sure that the buffer has
>> at
>>   // least 128 bytes free; raw_ostream itself only requires 64, but we
>> want to
>>   // make sure that we don't grow the buffer unnecessarily on destruction
>> (when
>>   // the data is flushed). See the FIXME below.
>>   OS.reserve(OS.size() + 128);
>>
>> This solution may be worse than the problem. In total:
>>
>> * If the SmallString was less than 128 bytes init() will always(!) heap
>> allocate.
>> * If it's more or equal to 128 bytes but upon destruction less than 64
>> bytes are left unused it will heap allocate for no reason.
>>
>> A careful programmer who did size the SmallString to match its use +
>> small reserve will find either of these heap allocations surprising,
>> negating the SmallString advantage and then paying memcpy cost.
>>
>> I filed http://llvm.org/pr23395 about this. To temporarly avoid this
>> issue, in addition to the outline methods change I commented the OS.reserve
>> line in flush(). (This change of course requires further review.)
>>
>> With reserve being out of the way, the gap now is smaller but still
>> significant: raw_char_ostream is about 20% faster than
>> raw_svector_ostream in Release=optimized configuration.
>>
>>
>> 2015-05-02 4:39 GMT+03:00 Sean Silva <chisophugis at gmail.com>:
>>
>>>
>>> Could you dig into why:
>>> +  raw_ostream &write(unsigned char C) override {
>>> +    grow(1);
>>> +    *OutBufCur++ = C;
>>> +    return *this;
>>> +  }
>>>
>>> Is 3 times as fast as raw_svector_ostream? I don't see a good reason why
>>> that should be any faster than:
>>>
>>>   raw_ostream &operator<<(char C) {
>>>     if (OutBufCur >= OutBufEnd)
>>>       return write(C);
>>>     *OutBufCur++ = C;
>>>     return *this;
>>>   }
>>>
>>>
>>> You might just be seeing the difference between having write inlined vs.
>>> non-inlined, in which case your patch might be a complicated way to pull
>>> the likely cases of some raw_ostream methods into the header.
>>>
>>>
>>> -- Sean Silva
>>>
>>
>> On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <yaron.keren at gmail.com>
>> wrote:
>>
>>> Yes, we should do without virtual flush. The problem
>>> was raw_char_ostream should not be flushed - it's always current - so I
>>> overloaded flush to a no-op. A cleaner solution is attached, adding
>>> a DirectBuffer mode to the raw_ostream.
>>>
>>> Also added a simple performance comparison project
>>> between raw_char_ostream and raw_svector_ostream. On Window 7 x64
>>> machine, raw_char_ostream was three times faster than raw_svector_ostream
>>> when the provided buffer size is large enough and two times as fast with
>>> one dynamic allocation resize.
>>>
>>>
>>> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola <rafael.espindola at gmail.com>
>>> :
>>>
>>>> I don't think we should make flush virtual. Why do you need to do it?
>>>> Can't you set up the base class to write to use the tail of the memory
>>>> region as the buffer?
>>>>
>>>>
>>>>
>>>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at gmail.com> wrote:
>>>> > Hi,
>>>> >
>>>> > Is this what you're thinking about?
>>>> > The code is not tested yet, I'd like to know if the overall direction
>>>> and
>>>> > design are correct.
>>>> >
>>>> > Yaron
>>>> >
>>>> >
>>>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
>>>> >>
>>>> >> Sean, thanks for reminding this, Alp did commit a class derived from
>>>> >> raw_svector_ostream conatining an internal SmallString he called
>>>> >> small_string_ostream. The commit was reverted after a day due to a
>>>> >> disagreement about the commit approval and apparently abandoned.
>>>> >>
>>>> >>
>>>> >>
>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html
>>>> >>
>>>> >>
>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html
>>>> >>
>>>> >>
>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html
>>>> >>
>>>> >> The class Alp comitted did solve the possible mismatch between the
>>>> >> SmallString and the stream by making the SmallString private to the
>>>> class.
>>>> >> This however did not treat the root problem, the duplication of the
>>>> >> information about the buffer between SmallString and the stream.
>>>> >>
>>>> >> I can make performance measurements but even if the performance
>>>> difference
>>>> >> is neglible (and looking at all the code paths and conditionals of
>>>> >> non-inlined functions at raw_ostream.cpp that's hard to believe), the
>>>> >> current SmallString-raw_svector_ostream is simply wrong.
>>>> >>
>>>> >> Personally, after the previous "Alp" discussion I found and fixed
>>>> several
>>>> >> such issues in my out-of-tree code only to make new similar new error
>>>> >> earlier this year, which I caught only months after, when Rafael
>>>> committed
>>>> >> the pwrite which reminded me the issue. Due ot the information
>>>> duplication
>>>> >> Rafael commit also had a bug and Mehdi reports similar experience.
>>>> Back then
>>>> >> Alp reported similar problems he found in LLVM code which were
>>>> hopefully
>>>> >> fixed otherwise.
>>>> >>
>>>> >> In addition to the information duplication, the design is quite
>>>> confusing
>>>> >> to use
>>>> >>
>>>> >> - Should one use the stream.str() function or the SmallString itself?
>>>> >> - flush or str?
>>>> >> - How do you properly clear the combination for reuse (that was my
>>>> >> mistake, I forgot to resync after OS.clear())?
>>>> >>
>>>> >> It's safe to say this design pattern is very easy to get wrong one
>>>> way or
>>>> >> another, we got burned by it multiple times and it should be
>>>> replaced.
>>>> >>
>>>> >> Alp suggested a derived class containing its own SmallString. That's
>>>> a
>>>> >> simple and effective approach to avoid the bugs mentioned but does
>>>> not fix
>>>> >> the memory or runtime issues. Small as they may be, why have them at
>>>> a
>>>> >> fundemental data structure?
>>>> >>
>>>> >> I was thinking about going further avoiding all duplication with a
>>>> >> templated class derived with its own internal buffer storage.
>>>> >>
>>>> >> Rafael suggested a more modular approach, a derived adpater class
>>>> adapter
>>>> >> to a *simple* buffer (or nullptr for fully-dynamic operation) which
>>>> also
>>>> >> won't duplicate any information the buffer is dumb and has no state.
>>>> >>
>>>> >> That solution sounds simple, efficient and safe to use. The
>>>> implementation
>>>> >> would be actually simpler then raw_svector_ostream since all the
>>>> >> coordination logic is not required anymore.
>>>> >>
>>>> >>
>>>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at gmail.com>:
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren <yaron.keren at gmail.com
>>>> >
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> A very common code pattern in LLVM is
>>>> >>>>
>>>> >>>>  SmallString<128> S;
>>>> >>>>  raw_svector_ostream OS(S);
>>>> >>>>  OS<< ...
>>>> >>>>  Use OS.str()
>>>> >>>>
>>>> >>>> While raw_svector_ostream is smart to share the text buffer
>>>> itself, it's
>>>> >>>> inefficient keeping two sets of pointers to the same buffer:
>>>> >>>>
>>>> >>>>  In SmallString: void *BeginX, *EndX, *CapacityX
>>>> >>>>  In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur
>>>> >>>>
>>>> >>>> Moreover, at runtime the two sets of pointers need to be
>>>> coordinated
>>>> >>>> between the SmallString and raw_svector_ostream using
>>>> >>>> raw_svector_ostream::init, raw_svector_ostream::pwrite,
>>>> >>>> raw_svector_ostream::resync and raw_svector_ostream::write_impl.
>>>> >>>> All these functions have non-inlined implementations in
>>>> raw_ostream.cpp.
>>>> >>>>
>>>> >>>> Finally, this may cause subtle bugs if S is modified without
>>>> calling
>>>> >>>> OS::resync(). This is too easy to do by mistake.
>>>> >>>>
>>>> >>>> In this frequent case usage the client does not really care about S
>>>> >>>> being a SmallString with its many useful string helper function.
>>>> It's just
>>>> >>>> boilerplate code for raw_svector_ostream. But it does cost three
>>>> extra
>>>> >>>> pointers, some runtime performance and possible bugs.
>>>> >>>
>>>> >>>
>>>> >>> I agree the bugs are real (Alp proposed something a while back
>>>> regarding
>>>> >>> this?), but you will need to provide measurements to justify the
>>>> cost in
>>>> >>> runtime performance. One technique I have used in the past to
>>>> measure these
>>>> >>> sorts of things I call "stuffing": take the operation that you want
>>>> to
>>>> >>> measure, then essentially change the logic so that you pay the cost
>>>> 2 times,
>>>> >>> 3 times, etc. You can then look at the trend in performance as N
>>>> varies and
>>>> >>> extrapolate back to the case where N = 0 (i.e. you don't pay the
>>>> cost).
>>>> >>>
>>>> >>> For example, in one situation where I used this method it was to
>>>> measure
>>>> >>> the cost of stat'ing files (sys::fs::status) across a holistic
>>>> build, using
>>>> >>> only "time" on the command line (it was on Windows and I didn't
>>>> have any
>>>> >>> tools like DTrace available that can directly measure this). In
>>>> order to do
>>>> >>> this, I changed sys::fs::status to call stat N times instead of 1,
>>>> and
>>>> >>> measured with N=1 N=2 N=3 etc. The result was that the difference
>>>> between
>>>> >>> the N and N+1 versions was about 1-2% across N=1..10 (or whatever I
>>>> >>> measured). In order to negate caching and other confounding
>>>> effects, it is
>>>> >>> important to try different distributions of stats; e.g. the extra
>>>> stats are
>>>> >>> on the same file as the "real" stat vs. the extra stats are on
>>>> nonexistent
>>>> >>> files in the same directory as the "real" file vs. parent
>>>> directories of the
>>>> >>> "real" file; if these match up fairly well (they did), then you
>>>> have some
>>>> >>> indication that the "stuffing" is measuring what you want to
>>>> measure.
>>>> >>>
>>>> >>> So e.g. if you think the cost of 3 extra pointers is significant,
>>>> then
>>>> >>> "stuff" the struct with 3, 6, 9, ... extra pointers and measure the
>>>> >>> difference in performance (e.g. measure the time building a real
>>>> project).
>>>> >>>
>>>> >>> -- Sean Silva
>>>> >>>
>>>> >>>>
>>>> >>>>
>>>> >>>> To solve all three issues, would it make sense to have
>>>> >>>> raw_ostream-derived container with a its own SmallString like
>>>> templated-size
>>>> >>>> built-in buffer?
>>>> >>>>
>>>> >>>
>>>> >
>>>>
>>>
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150502/f543d9cb/attachment.html>
-------------- next part --------------
Index: include/llvm/Support/raw_ostream.h
===================================================================

--- include/llvm/Support/raw_ostream.h	(revision 236389)
+++ include/llvm/Support/raw_ostream.h	(working copy)
@@ -208,8 +208,8 @@
   /// satisfy std::isprint into an escape sequence.
   raw_ostream &write_escaped(StringRef Str, bool UseHexEscapes = false);
 
-  raw_ostream &write(unsigned char C);
-  raw_ostream &write(const char *Ptr, size_t Size);
+  virtual raw_ostream &write(unsigned char C);
+  virtual raw_ostream &write(const char *Ptr, size_t Size);
 
   // Formatted output, see the format() function in Support/Format.h.
   raw_ostream &operator<<(const format_object_base &Fmt);
@@ -483,40 +483,42 @@
 
 /// A raw_ostream that writes to an SmallVector or SmallString.  This is a
 /// simple adaptor class. This class does not encounter output errors.
+/// raw_svector_ostream operates without a buffer, delegating all memory
+/// management to the SmallString. Thus the SmallString is always up-to-date,
+/// may be used directly and there is no need to call flush().
 class raw_svector_ostream : public raw_pwrite_stream {
   SmallVectorImpl<char> &OS;
 
   /// See raw_ostream::write_impl.
-  void write_impl(const char *Ptr, size_t Size) override;
+  void write_impl(const char *Ptr, size_t Size) override{};
+  void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override {
+    memcpy(OS.data() + Offset, Ptr, Size);
+  }
 
-  void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override;
-
   /// Return the current position within the stream, not counting the bytes
   /// currently in the buffer.
-  uint64_t current_pos() const override;
+  uint64_t current_pos() const override { return OS.size(); }
 
-protected:
-  // Like the regular constructor, but doesn't call init.
-  explicit raw_svector_ostream(SmallVectorImpl<char> &O, unsigned);
-  void init();
-
 public:
   /// Construct a new raw_svector_ostream.
   ///
-  /// \param O The vector to write to; this should generally have at least 128
-  /// bytes free to avoid any extraneous memory overhead.
-  explicit raw_svector_ostream(SmallVectorImpl<char> &O);
-  ~raw_svector_ostream() override;
+  explicit raw_svector_ostream(SmallVectorImpl<char> &O) : OS(O) {}
+  ~raw_svector_ostream() override {}
 
+  raw_ostream &write(unsigned char C) override {
+    OS.push_back(C);
+    return *this;
+  }
+  raw_ostream &write(const char *Ptr, size_t Size) override {
+    OS.append(Ptr, Ptr + Size);
+    return *this;
+  }
 
-  /// This is called when the SmallVector we're appending to is changed outside
-  /// of the raw_svector_ostream's control.  It is only safe to do this if the
-  /// raw_svector_ostream has previously been flushed.
-  void resync();
+  // FIXME: remove resync() and users.
+  void resync() {}
 
-  /// Flushes the stream contents to the target vector and return a StringRef
-  /// for the vector contents.
-  StringRef str();
+  /// Return a StringRef for the vector contents.
+  StringRef str() { return StringRef(OS.data(), OS.size()); }
 };
 
 /// A raw_ostream that discards all output.
@@ -539,9 +541,7 @@
   SmallVector<char, 0> Buffer;
 
 public:
-  buffer_ostream(raw_ostream &OS) : raw_svector_ostream(Buffer, 0), OS(OS) {
-    init();
-  }
+  buffer_ostream(raw_ostream &OS) : raw_svector_ostream(Buffer), OS(OS) {}
   ~buffer_ostream() { OS << str(); }
 };
 
Index: lib/Support/raw_ostream.cpp
===================================================================
--- lib/Support/raw_ostream.cpp	(revision 236389)
+++ lib/Support/raw_ostream.cpp	(working copy)
@@ -752,78 +752,6 @@
 }
 
 //===----------------------------------------------------------------------===//
-//  raw_svector_ostream
-//===----------------------------------------------------------------------===//
-
-// The raw_svector_ostream implementation uses the SmallVector itself as the
-// buffer for the raw_ostream. We guarantee that the raw_ostream buffer is
-// always pointing past the end of the vector, but within the vector
-// capacity. This allows raw_ostream to write directly into the correct place,
-// and we only need to set the vector size when the data is flushed.
-
-raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O, unsigned)
-    : OS(O) {}
-
-raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O) : OS(O) {
-  init();
-}
-
-void raw_svector_ostream::init() {
-  // Set up the initial external buffer. We make sure that the buffer has at
-  // least 128 bytes free; raw_ostream itself only requires 64, but we want to
-  // make sure that we don't grow the buffer unnecessarily on destruction (when
-  // the data is flushed). See the FIXME below.
-  OS.reserve(OS.size() + 128);
-  SetBuffer(OS.end(), OS.capacity() - OS.size());
-}
-
-raw_svector_ostream::~raw_svector_ostream() {
-  // FIXME: Prevent resizing during this flush().
-  flush();
-}
-
-void raw_svector_ostream::pwrite_impl(const char *Ptr, size_t Size,
-                                      uint64_t Offset) {
-  flush();
-  memcpy(OS.begin() + Offset, Ptr, Size);
-}
-
-/// resync - This is called when the SmallVector we're appending to is changed
-/// outside of the raw_svector_ostream's control.  It is only safe to do this
-/// if the raw_svector_ostream has previously been flushed.
-void raw_svector_ostream::resync() {
-  assert(GetNumBytesInBuffer() == 0 && "Didn't flush before mutating vector");
-
-  if (OS.capacity() - OS.size() < 64)
-    OS.reserve(OS.capacity() * 2);
-  SetBuffer(OS.end(), OS.capacity() - OS.size());
-}
-
-void raw_svector_ostream::write_impl(const char *Ptr, size_t Size) {
-  if (Ptr == OS.end()) {
-    // Grow the buffer to include the scratch area without copying.
-    size_t NewSize = OS.size() + Size;
-    assert(NewSize <= OS.capacity() && "Invalid write_impl() call!");
-    OS.set_size(NewSize);
-  } else {
-    assert(!GetNumBytesInBuffer());
-    OS.append(Ptr, Ptr + Size);
-  }
-
-  OS.reserve(OS.size() + 64);
-  SetBuffer(OS.end(), OS.capacity() - OS.size());
-}
-
-uint64_t raw_svector_ostream::current_pos() const {
-   return OS.size();
-}
-
-StringRef raw_svector_ostream::str() {
-  flush();
-  return StringRef(OS.begin(), OS.size());
-}
-
-//===----------------------------------------------------------------------===//
 //  raw_null_ostream
 //===----------------------------------------------------------------------===//