<div dir="rtl"><div dir="ltr"><div style="font-size:12.8000001907349px">+<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a></div><div style="font-size:12.8000001907349px"><br></div><div dir="ltr" style="font-size:12.8000001907349px">The impact should be small as all the other streamers usually write directly to the memory buffer and only when out of buffer they call write().</div><div dir="ltr" style="font-size:12.8000001907349px"><br></div><div dir="ltr" style="font-size:12.8000001907349px">OTOH, raw_svector_ostream (without a buffer) goes though write for every character or block it writes. It can work without virtual write() by overriding the existing virtual write_impl() but this is a slower code path for raw_svector_ostream. </div><div dir="ltr" style="font-size:12.8000001907349px"><br></div><div dir="ltr" style="font-size:12.8000001907349px">Even though the attached raw_svector_ostream patch without virtual write() is still significantly faster then the current raw_svector_ostream since it avoids the double buffering. With this patch, on my system the example runs at 750ms compared with about 1000ms for the current raw_svector_ostream.</div><div dir="ltr" style="font-size:12.8000001907349px"><br></div><div dir="ltr" style="font-size:12.8000001907349px">Would you prefer to continue review on Phabricator?</div><div class="yj6qo ajU" style="font-size:12.8000001907349px"></div></div></div><div class="gmail_extra"><br><div class="gmail_quote"><div dir="ltr">2015-07-11 20:48 GMT+03:00 Rafael Espíndola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span>:</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This makes write virtual. What is the impact of that on the other streamers?<br>

<div class="HOEnZb"><div class="h5"><br>

<br>

<br>

On 22 May 2015 at 10:17, Yaron Keren <<a href="mailto:yaron.keren@gmail.com">yaron.keren@gmail.com</a>> wrote:<br>

> Here's a performance testcase for the raw_svector_ostream patch. On my<br>

> WIndows x64 machine it runs in 1010ms with the current code and in 440ms<br>

> with the patch applies. Is this OK to commit?<br>

><br>

><br>

> 2015-05-02 21:31 GMT+03:00 Yaron Keren <<a href="mailto:yaron.keren@gmail.com">yaron.keren@gmail.com</a>>:<br>

>><br>

>> Following a hint from Duncan in <a href="http://llvm.org/pr23395" rel="noreferrer" target="_blank">http://llvm.org/pr23395</a>, here is a revised<br>

>> patch. Rather then introduce the raw_char_ostream adapter, this version<br>

>> improves raw_svector_ostream.<br>

>><br>

>> raw_svector_ostream is now a very lightweight adapter, running without a<br>

>> buffer and fowarding almost anything to the SmallString. This solves both<br>

>> the performance issue and the information duplication issue. The SmallString<br>

>> is always up-to-date and can be used at any time. flush() does nothing and<br>

>> is not required. resync() was kept in this patch as a no-op and will be<br>

>> removed with its users in a follow-up patch. I'll also try to remove the<br>

>> superfluous flush calls() along the way.<br>

>><br>

>><br>

>><br>

>> 2015-05-02 7:41 GMT+03:00 Yaron Keren <<a href="mailto:yaron.keren@gmail.com">yaron.keren@gmail.com</a>>:<br>

>>><br>

>>> +update diff<br>

>>><br>

>>><br>

>>> 2015-05-02 7:38 GMT+03:00 Yaron Keren <<a href="mailto:yaron.keren@gmail.com">yaron.keren@gmail.com</a>>:<br>

>>>><br>

>>>> I outlined (is that the right word?) raw_char_ostream::grow,<br>

>>>> raw_char_ostream::write (both) into raw_ostream.cpp with less than 10%<br>

>>>> difference in performance.<br>

>>>><br>

>>>> Profiling reveals that the real culprit is the code line<br>

>>>><br>

>>>>  OS.reserve(OS.size() + 64);<br>

>>>><br>

>>>> called from raw_svector_ostream::write_impl called from<br>

>>>> raw_svector_ostream::~raw_svector_ostream.<br>

>>>> The issue is already documented:<br>

>>>><br>

>>>> raw_svector_ostream::~raw_svector_ostream() {<br>

>>>>   // FIXME: Prevent resizing during this flush().<br>

>>>>   flush();<br>

>>>> }<br>

>>>><br>

>>>> And a solution is provided in raw_svector_ostream::init():<br>

>>>><br>

>>>>   // Set up the initial external buffer. We make sure that the buffer<br>

>>>> has at<br>

>>>>   // least 128 bytes free; raw_ostream itself only requires 64, but we<br>

>>>> want to<br>

>>>>   // make sure that we don't grow the buffer unnecessarily on<br>

>>>> destruction (when<br>

>>>>   // the data is flushed). See the FIXME below.<br>

>>>>   OS.reserve(OS.size() + 128);<br>

>>>><br>

>>>> This solution may be worse than the problem. In total:<br>

>>>><br>

>>>> * If the SmallString was less than 128 bytes init() will always(!) heap<br>

>>>> allocate.<br>

>>>> * If it's more or equal to 128 bytes but upon destruction less than 64<br>

>>>> bytes are left unused it will heap allocate for no reason.<br>

>>>><br>

>>>> A careful programmer who did size the SmallString to match its use +<br>

>>>> small reserve will find either of these heap allocations surprising,<br>

>>>> negating the SmallString advantage and then paying memcpy cost.<br>

>>>><br>

>>>> I filed <a href="http://llvm.org/pr23395" rel="noreferrer" target="_blank">http://llvm.org/pr23395</a> about this. To temporarly avoid this<br>

>>>> issue, in addition to the outline methods change I commented the OS.reserve<br>

>>>> line in flush(). (This change of course requires further review.)<br>

>>>><br>

>>>> With reserve being out of the way, the gap now is smaller but still<br>

>>>> significant: raw_char_ostream is about 20% faster than raw_svector_ostream<br>

>>>> in Release=optimized configuration.<br>

>>>><br>

>>>><br>

>>>> 2015-05-02 4:39 GMT+03:00 Sean Silva <<a href="mailto:chisophugis@gmail.com">chisophugis@gmail.com</a>>:<br>

>>>>><br>

>>>>><br>

>>>>> Could you dig into why:<br>

>>>>> +  raw_ostream &write(unsigned char C) override {<br>

>>>>> +    grow(1);<br>

>>>>> +    *OutBufCur++ = C;<br>

>>>>> +    return *this;<br>

>>>>> +  }<br>

>>>>><br>

>>>>> Is 3 times as fast as raw_svector_ostream? I don't see a good reason<br>

>>>>> why that should be any faster than:<br>

>>>>><br>

>>>>>   raw_ostream &operator<<(char C) {<br>

>>>>>     if (OutBufCur >= OutBufEnd)<br>

>>>>>       return write(C);<br>

>>>>>     *OutBufCur++ = C;<br>

>>>>>     return *this;<br>

>>>>>   }<br>

>>>>><br>

>>>>><br>

>>>>> You might just be seeing the difference between having write inlined<br>

>>>>> vs. non-inlined, in which case your patch might be a complicated way to pull<br>

>>>>> the likely cases of some raw_ostream methods into the header.<br>

>>>>><br>

>>>>><br>

>>>>> -- Sean Silva<br>

>>>><br>

>>>><br>

>>>> On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <<a href="mailto:yaron.keren@gmail.com">yaron.keren@gmail.com</a>><br>

>>>> wrote:<br>

>>>>><br>

>>>>> Yes, we should do without virtual flush. The problem was<br>

>>>>> raw_char_ostream should not be flushed - it's always current - so I<br>

>>>>> overloaded flush to a no-op. A cleaner solution is attached, adding a<br>

>>>>> DirectBuffer mode to the raw_ostream.<br>

>>>>><br>

>>>>> Also added a simple performance comparison project between<br>

>>>>> raw_char_ostream and raw_svector_ostream. On Window 7 x64 machine,<br>

>>>>> raw_char_ostream was three times faster than raw_svector_ostream when the<br>

>>>>> provided buffer size is large enough and two times as fast with one dynamic<br>

>>>>> allocation resize.<br>

>>>>><br>

>>>>><br>

>>>>> 2015-04-30 22:48 GMT+03:00 Rafael Espíndola<br>

>>>>> <<a href="mailto:rafael.espindola@gmail.com">rafael.espindola@gmail.com</a>>:<br>

>>>>>><br>

>>>>>> I don't think we should make flush virtual. Why do you need to do it?<br>

>>>>>> Can't you set up the base class to write to use the tail of the memory<br>

>>>>>> region as the buffer?<br>

>>>>>><br>

>>>>>><br>

>>>>>><br>

>>>>>> On 24 April 2015 at 06:46, Yaron Keren <<a href="mailto:yaron.keren@gmail.com">yaron.keren@gmail.com</a>> wrote:<br>

>>>>>> > Hi,<br>

>>>>>> ><br>

>>>>>> > Is this what you're thinking about?<br>

>>>>>> > The code is not tested yet, I'd like to know if the overall<br>

>>>>>> > direction and<br>

>>>>>> > design are correct.<br>

>>>>>> ><br>

>>>>>> > Yaron<br>

>>>>>> ><br>

>>>>>> ><br>

>>>>>> > 2015-04-21 0:10 GMT+03:00 Yaron Keren <<a href="mailto:yaron.keren@gmail.com">yaron.keren@gmail.com</a>>:<br>

>>>>>> >><br>

>>>>>> >> Sean, thanks for reminding this, Alp did commit a class derived<br>

>>>>>> >> from<br>

>>>>>> >> raw_svector_ostream conatining an internal SmallString he called<br>

>>>>>> >> small_string_ostream. The commit was reverted after a day due to a<br>

>>>>>> >> disagreement about the commit approval and apparently abandoned.<br>

>>>>>> >><br>

>>>>>> >><br>

>>>>>> >><br>

>>>>>> >> <a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html" rel="noreferrer" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html</a><br>

>>>>>> >><br>

>>>>>> >><br>

>>>>>> >> <a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html" rel="noreferrer" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html</a><br>

>>>>>> >><br>

>>>>>> >><br>

>>>>>> >> <a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html" rel="noreferrer" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html</a><br>

>>>>>> >><br>

>>>>>> >> The class Alp comitted did solve the possible mismatch between the<br>

>>>>>> >> SmallString and the stream by making the SmallString private to the<br>

>>>>>> >> class.<br>

>>>>>> >> This however did not treat the root problem, the duplication of the<br>

>>>>>> >> information about the buffer between SmallString and the stream.<br>

>>>>>> >><br>

>>>>>> >> I can make performance measurements but even if the performance<br>

>>>>>> >> difference<br>

>>>>>> >> is neglible (and looking at all the code paths and conditionals of<br>

>>>>>> >> non-inlined functions at raw_ostream.cpp that's hard to believe),<br>

>>>>>> >> the<br>

>>>>>> >> current SmallString-raw_svector_ostream is simply wrong.<br>

>>>>>> >><br>

>>>>>> >> Personally, after the previous "Alp" discussion I found and fixed<br>

>>>>>> >> several<br>

>>>>>> >> such issues in my out-of-tree code only to make new similar new<br>

>>>>>> >> error<br>

>>>>>> >> earlier this year, which I caught only months after, when Rafael<br>

>>>>>> >> committed<br>

>>>>>> >> the pwrite which reminded me the issue. Due ot the information<br>

>>>>>> >> duplication<br>

>>>>>> >> Rafael commit also had a bug and Mehdi reports similar experience.<br>

>>>>>> >> Back then<br>

>>>>>> >> Alp reported similar problems he found in LLVM code which were<br>

>>>>>> >> hopefully<br>

>>>>>> >> fixed otherwise.<br>

>>>>>> >><br>

>>>>>> >> In addition to the information duplication, the design is quite<br>

>>>>>> >> confusing<br>

>>>>>> >> to use<br>

>>>>>> >><br>

>>>>>> >> - Should one use the stream.str() function or the SmallString<br>

>>>>>> >> itself?<br>

>>>>>> >> - flush or str?<br>

>>>>>> >> - How do you properly clear the combination for reuse (that was my<br>

>>>>>> >> mistake, I forgot to resync after OS.clear())?<br>

>>>>>> >><br>

>>>>>> >> It's safe to say this design pattern is very easy to get wrong one<br>

>>>>>> >> way or<br>

>>>>>> >> another, we got burned by it multiple times and it should be<br>

>>>>>> >> replaced.<br>

>>>>>> >><br>

>>>>>> >> Alp suggested a derived class containing its own SmallString.<br>

>>>>>> >> That's a<br>

>>>>>> >> simple and effective approach to avoid the bugs mentioned but does<br>

>>>>>> >> not fix<br>

>>>>>> >> the memory or runtime issues. Small as they may be, why have them<br>

>>>>>> >> at a<br>

>>>>>> >> fundemental data structure?<br>

>>>>>> >><br>

>>>>>> >> I was thinking about going further avoiding all duplication with a<br>

>>>>>> >> templated class derived with its own internal buffer storage.<br>

>>>>>> >><br>

>>>>>> >> Rafael suggested a more modular approach, a derived adpater class<br>

>>>>>> >> adapter<br>

>>>>>> >> to a *simple* buffer (or nullptr for fully-dynamic operation) which<br>

>>>>>> >> also<br>

>>>>>> >> won't duplicate any information the buffer is dumb and has no<br>

>>>>>> >> state.<br>

>>>>>> >><br>

>>>>>> >> That solution sounds simple, efficient and safe to use. The<br>

>>>>>> >> implementation<br>

>>>>>> >> would be actually simpler then raw_svector_ostream since all the<br>

>>>>>> >> coordination logic is not required anymore.<br>

>>>>>> >><br>

>>>>>> >><br>

>>>>>> >> 2015-04-20 22:17 GMT+03:00 Sean Silva <<a href="mailto:chisophugis@gmail.com">chisophugis@gmail.com</a>>:<br>

>>>>>> >>><br>

>>>>>> >>><br>

>>>>>> >>><br>

>>>>>> >>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren<br>

>>>>>> >>> <<a href="mailto:yaron.keren@gmail.com">yaron.keren@gmail.com</a>><br>

>>>>>> >>> wrote:<br>

>>>>>> >>>><br>

>>>>>> >>>> A very common code pattern in LLVM is<br>

>>>>>> >>>><br>

>>>>>> >>>>  SmallString<128> S;<br>

>>>>>> >>>>  raw_svector_ostream OS(S);<br>

>>>>>> >>>>  OS<< ...<br>

>>>>>> >>>>  Use OS.str()<br>

>>>>>> >>>><br>

>>>>>> >>>> While raw_svector_ostream is smart to share the text buffer<br>

>>>>>> >>>> itself, it's<br>

>>>>>> >>>> inefficient keeping two sets of pointers to the same buffer:<br>

>>>>>> >>>><br>

>>>>>> >>>>  In SmallString: void *BeginX, *EndX, *CapacityX<br>

>>>>>> >>>>  In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur<br>

>>>>>> >>>><br>

>>>>>> >>>> Moreover, at runtime the two sets of pointers need to be<br>

>>>>>> >>>> coordinated<br>

>>>>>> >>>> between the SmallString and raw_svector_ostream using<br>

>>>>>> >>>> raw_svector_ostream::init, raw_svector_ostream::pwrite,<br>

>>>>>> >>>> raw_svector_ostream::resync and raw_svector_ostream::write_impl.<br>

>>>>>> >>>> All these functions have non-inlined implementations in<br>

>>>>>> >>>> raw_ostream.cpp.<br>

>>>>>> >>>><br>

>>>>>> >>>> Finally, this may cause subtle bugs if S is modified without<br>

>>>>>> >>>> calling<br>

>>>>>> >>>> OS::resync(). This is too easy to do by mistake.<br>

>>>>>> >>>><br>

>>>>>> >>>> In this frequent case usage the client does not really care about<br>

>>>>>> >>>> S<br>

>>>>>> >>>> being a SmallString with its many useful string helper function.<br>

>>>>>> >>>> It's just<br>

>>>>>> >>>> boilerplate code for raw_svector_ostream. But it does cost three<br>

>>>>>> >>>> extra<br>

>>>>>> >>>> pointers, some runtime performance and possible bugs.<br>

>>>>>> >>><br>

>>>>>> >>><br>

>>>>>> >>> I agree the bugs are real (Alp proposed something a while back<br>

>>>>>> >>> regarding<br>

>>>>>> >>> this?), but you will need to provide measurements to justify the<br>

>>>>>> >>> cost in<br>

>>>>>> >>> runtime performance. One technique I have used in the past to<br>

>>>>>> >>> measure these<br>

>>>>>> >>> sorts of things I call "stuffing": take the operation that you<br>

>>>>>> >>> want to<br>

>>>>>> >>> measure, then essentially change the logic so that you pay the<br>

>>>>>> >>> cost 2 times,<br>

>>>>>> >>> 3 times, etc. You can then look at the trend in performance as N<br>

>>>>>> >>> varies and<br>

>>>>>> >>> extrapolate back to the case where N = 0 (i.e. you don't pay the<br>

>>>>>> >>> cost).<br>

>>>>>> >>><br>

>>>>>> >>> For example, in one situation where I used this method it was to<br>

>>>>>> >>> measure<br>

>>>>>> >>> the cost of stat'ing files (sys::fs::status) across a holistic<br>

>>>>>> >>> build, using<br>

>>>>>> >>> only "time" on the command line (it was on Windows and I didn't<br>

>>>>>> >>> have any<br>

>>>>>> >>> tools like DTrace available that can directly measure this). In<br>

>>>>>> >>> order to do<br>

>>>>>> >>> this, I changed sys::fs::status to call stat N times instead of 1,<br>

>>>>>> >>> and<br>

>>>>>> >>> measured with N=1 N=2 N=3 etc. The result was that the difference<br>

>>>>>> >>> between<br>

>>>>>> >>> the N and N+1 versions was about 1-2% across N=1..10 (or whatever<br>

>>>>>> >>> I<br>

>>>>>> >>> measured). In order to negate caching and other confounding<br>

>>>>>> >>> effects, it is<br>

>>>>>> >>> important to try different distributions of stats; e.g. the extra<br>

>>>>>> >>> stats are<br>

>>>>>> >>> on the same file as the "real" stat vs. the extra stats are on<br>

>>>>>> >>> nonexistent<br>

>>>>>> >>> files in the same directory as the "real" file vs. parent<br>

>>>>>> >>> directories of the<br>

>>>>>> >>> "real" file; if these match up fairly well (they did), then you<br>

>>>>>> >>> have some<br>

>>>>>> >>> indication that the "stuffing" is measuring what you want to<br>

>>>>>> >>> measure.<br>

>>>>>> >>><br>

>>>>>> >>> So e.g. if you think the cost of 3 extra pointers is significant,<br>

>>>>>> >>> then<br>

>>>>>> >>> "stuff" the struct with 3, 6, 9, ... extra pointers and measure<br>

>>>>>> >>> the<br>

>>>>>> >>> difference in performance (e.g. measure the time building a real<br>

>>>>>> >>> project).<br>

>>>>>> >>><br>

>>>>>> >>> -- Sean Silva<br>

>>>>>> >>><br>

>>>>>> >>>><br>

>>>>>> >>>><br>

>>>>>> >>>> To solve all three issues, would it make sense to have<br>

>>>>>> >>>> raw_ostream-derived container with a its own SmallString like<br>

>>>>>> >>>> templated-size<br>

>>>>>> >>>> built-in buffer?<br>

>>>>>> >>>><br>

>>>>>> >>><br>

>>>>>> ><br>

>>>>><br>

>>>>><br>

>>>><br>

><br>

</div></div></blockquote></div><br></div>