[LLVMdev] SmallString + raw_svector_ostream combination should be more efficient

Yaron Keren yaron.keren at gmail.com
Fri May 22 07:17:07 PDT 2015


Here's a performance testcase for the raw_svector_ostream patch. On my
WIndows x64 machine it runs in 1010ms with the current code and in 440ms
with the patch applies. Is this OK to commit?


2015-05-02 21:31 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:

> Following a hint from Duncan in http://llvm.org/pr23395, here is a
> revised patch. Rather then introduce the raw_char_ostream adapter, this
> version improves raw_svector_ostream.
>
> raw_svector_ostream is now a very lightweight adapter, running without a
> buffer and fowarding almost anything to the SmallString. This solves both
> the performance issue and the information duplication issue. The
> SmallString is always up-to-date and can be used at any time. flush() does
> nothing and is not required. resync() was kept in this patch as a no-op and
> will be removed with its users in a follow-up patch. I'll also try to
> remove the superfluous flush calls() along the way.
>
>
>
> 2015-05-02 7:41 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
>
>> +update diff
>>
>>
>> 2015-05-02 7:38 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
>>
>>> I outlined (is that the right
>>> word?) raw_char_ostream::grow, raw_char_ostream::write (both) into
>>> raw_ostream.cpp with less than 10% difference in performance.
>>>
>>> Profiling reveals that the real culprit is the code line
>>>
>>>  OS.reserve(OS.size() + 64);
>>>
>>> called from raw_svector_ostream::write_impl called from
>>> raw_svector_ostream::~raw_svector_ostream.
>>> The issue is already documented:
>>>
>>> raw_svector_ostream::~raw_svector_ostream() {
>>>   // FIXME: Prevent resizing during this flush().
>>>   flush();
>>> }
>>>
>>> And a solution is provided in raw_svector_ostream::init():
>>>
>>>   // Set up the initial external buffer. We make sure that the buffer
>>> has at
>>>   // least 128 bytes free; raw_ostream itself only requires 64, but we
>>> want to
>>>   // make sure that we don't grow the buffer unnecessarily on
>>> destruction (when
>>>   // the data is flushed). See the FIXME below.
>>>   OS.reserve(OS.size() + 128);
>>>
>>> This solution may be worse than the problem. In total:
>>>
>>> * If the SmallString was less than 128 bytes init() will always(!) heap
>>> allocate.
>>> * If it's more or equal to 128 bytes but upon destruction less than 64
>>> bytes are left unused it will heap allocate for no reason.
>>>
>>> A careful programmer who did size the SmallString to match its use +
>>> small reserve will find either of these heap allocations surprising,
>>> negating the SmallString advantage and then paying memcpy cost.
>>>
>>> I filed http://llvm.org/pr23395 about this. To temporarly avoid this
>>> issue, in addition to the outline methods change I commented the OS.reserve
>>> line in flush(). (This change of course requires further review.)
>>>
>>> With reserve being out of the way, the gap now is smaller but still
>>> significant: raw_char_ostream is about 20% faster than
>>> raw_svector_ostream in Release=optimized configuration.
>>>
>>>
>>> 2015-05-02 4:39 GMT+03:00 Sean Silva <chisophugis at gmail.com>:
>>>
>>>>
>>>> Could you dig into why:
>>>> +  raw_ostream &write(unsigned char C) override {
>>>> +    grow(1);
>>>> +    *OutBufCur++ = C;
>>>> +    return *this;
>>>> +  }
>>>>
>>>> Is 3 times as fast as raw_svector_ostream? I don't see a good reason
>>>> why that should be any faster than:
>>>>
>>>>   raw_ostream &operator<<(char C) {
>>>>     if (OutBufCur >= OutBufEnd)
>>>>       return write(C);
>>>>     *OutBufCur++ = C;
>>>>     return *this;
>>>>   }
>>>>
>>>>
>>>> You might just be seeing the difference between having write inlined
>>>> vs. non-inlined, in which case your patch might be a complicated way to
>>>> pull the likely cases of some raw_ostream methods into the header.
>>>>
>>>>
>>>> -- Sean Silva
>>>>
>>>
>>> On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <yaron.keren at gmail.com>
>>> wrote:
>>>
>>>> Yes, we should do without virtual flush. The problem
>>>> was raw_char_ostream should not be flushed - it's always current - so I
>>>> overloaded flush to a no-op. A cleaner solution is attached, adding
>>>> a DirectBuffer mode to the raw_ostream.
>>>>
>>>> Also added a simple performance comparison project
>>>> between raw_char_ostream and raw_svector_ostream. On Window 7 x64
>>>> machine, raw_char_ostream was three times faster than raw_svector_ostream
>>>> when the provided buffer size is large enough and two times as fast with
>>>> one dynamic allocation resize.
>>>>
>>>>
>>>> 2015-04-30 22:48 GMT+03:00 Rafael EspĂ­ndola <rafael.espindola at gmail.com
>>>> >:
>>>>
>>>>> I don't think we should make flush virtual. Why do you need to do it?
>>>>> Can't you set up the base class to write to use the tail of the memory
>>>>> region as the buffer?
>>>>>
>>>>>
>>>>>
>>>>> On 24 April 2015 at 06:46, Yaron Keren <yaron.keren at gmail.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > Is this what you're thinking about?
>>>>> > The code is not tested yet, I'd like to know if the overall
>>>>> direction and
>>>>> > design are correct.
>>>>> >
>>>>> > Yaron
>>>>> >
>>>>> >
>>>>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <yaron.keren at gmail.com>:
>>>>> >>
>>>>> >> Sean, thanks for reminding this, Alp did commit a class derived from
>>>>> >> raw_svector_ostream conatining an internal SmallString he called
>>>>> >> small_string_ostream. The commit was reverted after a day due to a
>>>>> >> disagreement about the commit approval and apparently abandoned.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html
>>>>> >>
>>>>> >>
>>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html
>>>>> >>
>>>>> >>
>>>>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html
>>>>> >>
>>>>> >> The class Alp comitted did solve the possible mismatch between the
>>>>> >> SmallString and the stream by making the SmallString private to the
>>>>> class.
>>>>> >> This however did not treat the root problem, the duplication of the
>>>>> >> information about the buffer between SmallString and the stream.
>>>>> >>
>>>>> >> I can make performance measurements but even if the performance
>>>>> difference
>>>>> >> is neglible (and looking at all the code paths and conditionals of
>>>>> >> non-inlined functions at raw_ostream.cpp that's hard to believe),
>>>>> the
>>>>> >> current SmallString-raw_svector_ostream is simply wrong.
>>>>> >>
>>>>> >> Personally, after the previous "Alp" discussion I found and fixed
>>>>> several
>>>>> >> such issues in my out-of-tree code only to make new similar new
>>>>> error
>>>>> >> earlier this year, which I caught only months after, when Rafael
>>>>> committed
>>>>> >> the pwrite which reminded me the issue. Due ot the information
>>>>> duplication
>>>>> >> Rafael commit also had a bug and Mehdi reports similar experience.
>>>>> Back then
>>>>> >> Alp reported similar problems he found in LLVM code which were
>>>>> hopefully
>>>>> >> fixed otherwise.
>>>>> >>
>>>>> >> In addition to the information duplication, the design is quite
>>>>> confusing
>>>>> >> to use
>>>>> >>
>>>>> >> - Should one use the stream.str() function or the SmallString
>>>>> itself?
>>>>> >> - flush or str?
>>>>> >> - How do you properly clear the combination for reuse (that was my
>>>>> >> mistake, I forgot to resync after OS.clear())?
>>>>> >>
>>>>> >> It's safe to say this design pattern is very easy to get wrong one
>>>>> way or
>>>>> >> another, we got burned by it multiple times and it should be
>>>>> replaced.
>>>>> >>
>>>>> >> Alp suggested a derived class containing its own SmallString.
>>>>> That's a
>>>>> >> simple and effective approach to avoid the bugs mentioned but does
>>>>> not fix
>>>>> >> the memory or runtime issues. Small as they may be, why have them
>>>>> at a
>>>>> >> fundemental data structure?
>>>>> >>
>>>>> >> I was thinking about going further avoiding all duplication with a
>>>>> >> templated class derived with its own internal buffer storage.
>>>>> >>
>>>>> >> Rafael suggested a more modular approach, a derived adpater class
>>>>> adapter
>>>>> >> to a *simple* buffer (or nullptr for fully-dynamic operation) which
>>>>> also
>>>>> >> won't duplicate any information the buffer is dumb and has no state.
>>>>> >>
>>>>> >> That solution sounds simple, efficient and safe to use. The
>>>>> implementation
>>>>> >> would be actually simpler then raw_svector_ostream since all the
>>>>> >> coordination logic is not required anymore.
>>>>> >>
>>>>> >>
>>>>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <chisophugis at gmail.com>:
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren <
>>>>> yaron.keren at gmail.com>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>> A very common code pattern in LLVM is
>>>>> >>>>
>>>>> >>>>  SmallString<128> S;
>>>>> >>>>  raw_svector_ostream OS(S);
>>>>> >>>>  OS<< ...
>>>>> >>>>  Use OS.str()
>>>>> >>>>
>>>>> >>>> While raw_svector_ostream is smart to share the text buffer
>>>>> itself, it's
>>>>> >>>> inefficient keeping two sets of pointers to the same buffer:
>>>>> >>>>
>>>>> >>>>  In SmallString: void *BeginX, *EndX, *CapacityX
>>>>> >>>>  In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur
>>>>> >>>>
>>>>> >>>> Moreover, at runtime the two sets of pointers need to be
>>>>> coordinated
>>>>> >>>> between the SmallString and raw_svector_ostream using
>>>>> >>>> raw_svector_ostream::init, raw_svector_ostream::pwrite,
>>>>> >>>> raw_svector_ostream::resync and raw_svector_ostream::write_impl.
>>>>> >>>> All these functions have non-inlined implementations in
>>>>> raw_ostream.cpp.
>>>>> >>>>
>>>>> >>>> Finally, this may cause subtle bugs if S is modified without
>>>>> calling
>>>>> >>>> OS::resync(). This is too easy to do by mistake.
>>>>> >>>>
>>>>> >>>> In this frequent case usage the client does not really care about
>>>>> S
>>>>> >>>> being a SmallString with its many useful string helper function.
>>>>> It's just
>>>>> >>>> boilerplate code for raw_svector_ostream. But it does cost three
>>>>> extra
>>>>> >>>> pointers, some runtime performance and possible bugs.
>>>>> >>>
>>>>> >>>
>>>>> >>> I agree the bugs are real (Alp proposed something a while back
>>>>> regarding
>>>>> >>> this?), but you will need to provide measurements to justify the
>>>>> cost in
>>>>> >>> runtime performance. One technique I have used in the past to
>>>>> measure these
>>>>> >>> sorts of things I call "stuffing": take the operation that you
>>>>> want to
>>>>> >>> measure, then essentially change the logic so that you pay the
>>>>> cost 2 times,
>>>>> >>> 3 times, etc. You can then look at the trend in performance as N
>>>>> varies and
>>>>> >>> extrapolate back to the case where N = 0 (i.e. you don't pay the
>>>>> cost).
>>>>> >>>
>>>>> >>> For example, in one situation where I used this method it was to
>>>>> measure
>>>>> >>> the cost of stat'ing files (sys::fs::status) across a holistic
>>>>> build, using
>>>>> >>> only "time" on the command line (it was on Windows and I didn't
>>>>> have any
>>>>> >>> tools like DTrace available that can directly measure this). In
>>>>> order to do
>>>>> >>> this, I changed sys::fs::status to call stat N times instead of 1,
>>>>> and
>>>>> >>> measured with N=1 N=2 N=3 etc. The result was that the difference
>>>>> between
>>>>> >>> the N and N+1 versions was about 1-2% across N=1..10 (or whatever I
>>>>> >>> measured). In order to negate caching and other confounding
>>>>> effects, it is
>>>>> >>> important to try different distributions of stats; e.g. the extra
>>>>> stats are
>>>>> >>> on the same file as the "real" stat vs. the extra stats are on
>>>>> nonexistent
>>>>> >>> files in the same directory as the "real" file vs. parent
>>>>> directories of the
>>>>> >>> "real" file; if these match up fairly well (they did), then you
>>>>> have some
>>>>> >>> indication that the "stuffing" is measuring what you want to
>>>>> measure.
>>>>> >>>
>>>>> >>> So e.g. if you think the cost of 3 extra pointers is significant,
>>>>> then
>>>>> >>> "stuff" the struct with 3, 6, 9, ... extra pointers and measure the
>>>>> >>> difference in performance (e.g. measure the time building a real
>>>>> project).
>>>>> >>>
>>>>> >>> -- Sean Silva
>>>>> >>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> To solve all three issues, would it make sense to have
>>>>> >>>> raw_ostream-derived container with a its own SmallString like
>>>>> templated-size
>>>>> >>>> built-in buffer?
>>>>> >>>>
>>>>> >>>
>>>>> >
>>>>>
>>>>
>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150522/ee08391a/attachment.html>
-------------- next part --------------
set(LLVM_LINK_COMPONENTS
  Support
)

add_llvm_tool(raw_svector_ostream_speed
  raw_svector_ostream_speed.cpp
)
-------------- next part --------------
Index: include/llvm/Support/raw_ostream.h
===================================================================
--- include/llvm/Support/raw_ostream.h	(revision 237261)
+++ include/llvm/Support/raw_ostream.h	(working copy)
@@ -208,8 +208,8 @@
   /// satisfy std::isprint into an escape sequence.
   raw_ostream &write_escaped(StringRef Str, bool UseHexEscapes = false);
 
-  raw_ostream &write(unsigned char C);
-  raw_ostream &write(const char *Ptr, size_t Size);
+  virtual raw_ostream &write(unsigned char C);
+  virtual raw_ostream &write(const char *Ptr, size_t Size);
 
   // Formatted output, see the format() function in Support/Format.h.
   raw_ostream &operator<<(const format_object_base &Fmt);
@@ -483,40 +483,42 @@
 
 /// A raw_ostream that writes to an SmallVector or SmallString.  This is a
 /// simple adaptor class. This class does not encounter output errors.
+/// raw_svector_ostream operates without a buffer, delegating all memory
+/// management to the SmallString. Thus the SmallString is always up-to-date,
+/// may be used directly and there is no need to call flush().
 class raw_svector_ostream : public raw_pwrite_stream {
   SmallVectorImpl<char> &OS;
 
   /// See raw_ostream::write_impl.
-  void write_impl(const char *Ptr, size_t Size) override;
+  void write_impl(const char *Ptr, size_t Size) override{};
+  void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override {
+    memcpy(OS.data() + Offset, Ptr, Size);
+  }
 
-  void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override;
-
   /// Return the current position within the stream, not counting the bytes
   /// currently in the buffer.
-  uint64_t current_pos() const override;
+  uint64_t current_pos() const override { return OS.size(); }
 
-protected:
-  // Like the regular constructor, but doesn't call init.
-  explicit raw_svector_ostream(SmallVectorImpl<char> &O, unsigned);
-  void init();
-
 public:
   /// Construct a new raw_svector_ostream.
   ///
-  /// \param O The vector to write to; this should generally have at least 128
-  /// bytes free to avoid any extraneous memory overhead.
-  explicit raw_svector_ostream(SmallVectorImpl<char> &O);
-  ~raw_svector_ostream() override;
+  explicit raw_svector_ostream(SmallVectorImpl<char> &O) : OS(O) {}
+  ~raw_svector_ostream() override {}
 
+  raw_ostream &write(unsigned char C) override {
+    OS.push_back(C);
+    return *this;
+  }
+  raw_ostream &write(const char *Ptr, size_t Size) override {
+    OS.append(Ptr, Ptr + Size);
+    return *this;
+  }
 
-  /// This is called when the SmallVector we're appending to is changed outside
-  /// of the raw_svector_ostream's control.  It is only safe to do this if the
-  /// raw_svector_ostream has previously been flushed.
-  void resync();
+  // FIXME: remove resync() and users.
+  void resync() {}
 
-  /// Flushes the stream contents to the target vector and return a StringRef
-  /// for the vector contents.
-  StringRef str();
+  /// Return a StringRef for the vector contents.
+  StringRef str() { return StringRef(OS.data(), OS.size()); }
 };
 
 /// A raw_ostream that discards all output.
@@ -539,9 +541,7 @@
   SmallVector<char, 0> Buffer;
 
 public:
-  buffer_ostream(raw_ostream &OS) : raw_svector_ostream(Buffer, 0), OS(OS) {
-    init();
-  }
+  buffer_ostream(raw_ostream &OS) : raw_svector_ostream(Buffer), OS(OS) {}
   ~buffer_ostream() { OS << str(); }
 };
 
Index: lib/Support/raw_ostream.cpp
===================================================================
--- lib/Support/raw_ostream.cpp	(revision 237261)
+++ lib/Support/raw_ostream.cpp	(working copy)
@@ -752,78 +752,6 @@
 }
 
 //===----------------------------------------------------------------------===//
-//  raw_svector_ostream
-//===----------------------------------------------------------------------===//
-
-// The raw_svector_ostream implementation uses the SmallVector itself as the
-// buffer for the raw_ostream. We guarantee that the raw_ostream buffer is
-// always pointing past the end of the vector, but within the vector
-// capacity. This allows raw_ostream to write directly into the correct place,
-// and we only need to set the vector size when the data is flushed.
-
-raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O, unsigned)
-    : OS(O) {}
-
-raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O) : OS(O) {
-  init();
-}
-
-void raw_svector_ostream::init() {
-  // Set up the initial external buffer. We make sure that the buffer has at
-  // least 128 bytes free; raw_ostream itself only requires 64, but we want to
-  // make sure that we don't grow the buffer unnecessarily on destruction (when
-  // the data is flushed). See the FIXME below.
-  OS.reserve(OS.size() + 128);
-  SetBuffer(OS.end(), OS.capacity() - OS.size());
-}
-
-raw_svector_ostream::~raw_svector_ostream() {
-  // FIXME: Prevent resizing during this flush().
-  flush();
-}
-
-void raw_svector_ostream::pwrite_impl(const char *Ptr, size_t Size,
-                                      uint64_t Offset) {
-  flush();
-  memcpy(OS.begin() + Offset, Ptr, Size);
-}
-
-/// resync - This is called when the SmallVector we're appending to is changed
-/// outside of the raw_svector_ostream's control.  It is only safe to do this
-/// if the raw_svector_ostream has previously been flushed.
-void raw_svector_ostream::resync() {
-  assert(GetNumBytesInBuffer() == 0 && "Didn't flush before mutating vector");
-
-  if (OS.capacity() - OS.size() < 64)
-    OS.reserve(OS.capacity() * 2);
-  SetBuffer(OS.end(), OS.capacity() - OS.size());
-}
-
-void raw_svector_ostream::write_impl(const char *Ptr, size_t Size) {
-  if (Ptr == OS.end()) {
-    // Grow the buffer to include the scratch area without copying.
-    size_t NewSize = OS.size() + Size;
-    assert(NewSize <= OS.capacity() && "Invalid write_impl() call!");
-    OS.set_size(NewSize);
-  } else {
-    assert(!GetNumBytesInBuffer());
-    OS.append(Ptr, Ptr + Size);
-  }
-
-  OS.reserve(OS.size() + 64);
-  SetBuffer(OS.end(), OS.capacity() - OS.size());
-}
-
-uint64_t raw_svector_ostream::current_pos() const {
-   return OS.size();
-}
-
-StringRef raw_svector_ostream::str() {
-  flush();
-  return StringRef(OS.begin(), OS.size());
-}
-
-//===----------------------------------------------------------------------===//
 //  raw_null_ostream
 //===----------------------------------------------------------------------===//
 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: raw_svector_ostream_speed.cpp
Type: text/x-c++src
Size: 768 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150522/ee08391a/attachment.cpp>


More information about the llvm-dev mailing list