<div dir="rtl"><div dir="ltr"><span style="font-size:12.8000001907349px">I outlined (is that the right word?) raw_char_ostream::grow, raw_char_ostream::write (both) into raw_ostream.cpp with less than 10% difference in performance.</span></div><div dir="ltr"><span style="font-size:12.8000001907349px"><br></span></div><div dir="ltr"><span style="font-size:12.8000001907349px">Profiling reveals that the real culprit is the code line</span></div><div dir="ltr"><span style="font-size:12.8000001907349px"><br></span></div><div dir="ltr"><font face="monospace, monospace"><span style="font-size:12.8000001907349px"> OS.reserve(OS.size() + 64);</span><br></font></div><div dir="ltr"><span style="font-size:12.8000001907349px"><br></span></div><div dir="ltr"><span style="font-size:12.8000001907349px">called from raw_svector_ostream::write_impl called from </span><span style="font-size:12.8000001907349px">raw_svector_ostream::</span><span style="font-size:12.8000001907349px">~raw_svector_ostream.</span></div><div dir="ltr"><span style="font-size:12.8000001907349px">The issue is already documented:</span></div><div dir="ltr"><span style="font-size:12.8000001907349px"><br></span></div><div dir="ltr"><div dir="ltr" style><span style="font-size:12.8000001907349px"><font face="monospace, monospace">raw_svector_ostream::~raw_svector_ostream() {</font></span></div><div dir="ltr" style><span style="font-size:12.8000001907349px"><font face="monospace, monospace"> // FIXME: Prevent resizing during this flush().</font></span></div><div dir="ltr" style><span style="font-size:12.8000001907349px"><font face="monospace, monospace"> flush();</font></span></div><div dir="ltr" style><span style="font-size:12.8000001907349px"><font face="monospace, monospace">}</font></span></div><div dir="ltr" style="font-size:12.8000001907349px"><br></div><div style="font-size:12.8000001907349px">And a <span style="font-size:12.8000001907349px">solution is provided in raw_svector_ostream::init():</span></div><div style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px"><br></span></div><div style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px"><div style="font-size:12.8000001907349px"><font face="monospace, monospace"> // Set up the initial external buffer. We make sure that the buffer has at</font></div><div style="font-size:12.8000001907349px"><font face="monospace, monospace"> // least 128 bytes free; raw_ostream itself only requires 64, but we want to</font></div><div style="font-size:12.8000001907349px"><font face="monospace, monospace"> // make sure that we don't grow the buffer unnecessarily on destruction (when</font></div><div style="font-size:12.8000001907349px"><font face="monospace, monospace"> // the data is flushed). See the FIXME below.</font></div><div style="font-size:12.8000001907349px"><font face="monospace, monospace"> OS.reserve(OS.size() + 128);</font></div><div><br></div></span></div><div style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px">This solution may be worse than the problem. In total:</span></div><div style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px"><br></span></div><div style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px">* If the SmallString was less than 128 bytes </span><span style="font-size:12.8000001907349px">init() will always(!) heap allocate. </span></div><div style><span style="font-size:12.8000001907349px">* If it's more or equal to 128 bytes but upon destruction less than 64 bytes are left unused it will heap allocate for no reason. </span></div><div style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px"><br></span></div><div style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px">A careful programmer who did size the SmallString to match its use + small reserve will find either of these heap allocations surprising, negating the SmallString advantage and then paying memcpy cost.</span></div><div style="font-size:12.8000001907349px"><br></div><div style="font-size:12.8000001907349px">I filed <a href="http://llvm.org/pr23395">http://llvm.org/pr23395</a> about this. <span style="font-size:12.8000001907349px">To temporarly avoid this issue, i</span><span style="font-size:12.8000001907349px">n addition to the </span><span style="font-size:12.8000001907349px">outline methods change </span><span style="font-size:12.8000001907349px">I commented the </span><span style="font-size:12.8000001907349px">OS.reserve line in flush(). </span><span style="font-size:12.8000001907349px">(This change of course requires</span><span style="font-size:12.8000001907349px"> further review.)</span></div><div style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px"><br></span></div><div style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px">With reserve being out of the way, the gap now is smaller but still significant: </span><span style="font-size:12.8000001907349px">raw_char_ostream is about 20% faster than </span><span style="font-size:12.8000001907349px">raw_svector_ostream </span><span style="font-size:12.8000001907349px">in Release=optimized configuration.</span><br></div><div style="font-size:12.8000001907349px"><br></div></div></div><div class="gmail_extra"><div dir="ltr"><br><div class="gmail_quote">2015-05-02 4:39 GMT+03:00 Sean Silva <span dir="ltr"><<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 .8ex;border-left:1px #ccc solid;border-right:1px #ccc solid;padding-left:1ex;padding-right:1ex"><br>Could you dig into why:<div><div>+ raw_ostream &write(unsigned char C) override {</div><div>+ grow(1);</div><div>+ *OutBufCur++ = C;</div><div>+ return *this;</div><div>+ }</div></div><div><br></div><div>Is 3 times as fast as raw_svector_ostream? I don't see a good reason why that should be any faster than:</div><div><br></div><div><div> raw_ostream &operator<<(char C) {</div><div> if (OutBufCur >= OutBufEnd)</div><div> return write(C);</div><div> *OutBufCur++ = C;</div><div> return *this;</div><div> }</div></div><div><br></div><div><br></div><div>You might just be seeing the difference between having write inlined vs. non-inlined, in which case your patch might be a complicated way to pull the likely cases of some raw_ostream methods into the header.</div><span class="HOEnZb"><font color="#888888"><div><br></div><div><br></div><div>-- Sean Silva</div></font></span></blockquote></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Apr 30, 2015 at 8:02 PM, Yaron Keren <span dir="ltr"><<a href="mailto:yaron.keren@gmail.com" target="_blank">yaron.keren@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="rtl"><div dir="ltr">Yes, we should do without virtual flush. The problem was raw_char_ostream should not be flushed - it's always current - so I overloaded flush to a no-op. A cleaner solution is attached, adding a DirectBuffer mode to the raw_ostream. </div><div dir="ltr"><br></div><div dir="ltr">Also added a simple performance comparison project between raw_char_ostream and raw_svector_ostream. On Window 7 x64 machine, raw_char_ostream was three times faster than raw_svector_ostream when the provided buffer size is large enough and two times as fast with one dynamic allocation resize.</div><div dir="ltr"><br></div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote"><div dir="ltr">2015-04-30 22:48 GMT+03:00 Rafael Espíndola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span>:</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I don't think we should make flush virtual. Why do you need to do it?<br>
Can't you set up the base class to write to use the tail of the memory<br>
region as the buffer?<br>
<div><div><br>
<br>
<br>
On 24 April 2015 at 06:46, Yaron Keren <<a href="mailto:yaron.keren@gmail.com" target="_blank">yaron.keren@gmail.com</a>> wrote:<br>
> Hi,<br>
><br>
> Is this what you're thinking about?<br>
> The code is not tested yet, I'd like to know if the overall direction and<br>
> design are correct.<br>
><br>
> Yaron<br>
><br>
><br>
> 2015-04-21 0:10 GMT+03:00 Yaron Keren <<a href="mailto:yaron.keren@gmail.com" target="_blank">yaron.keren@gmail.com</a>>:<br>
>><br>
>> Sean, thanks for reminding this, Alp did commit a class derived from<br>
>> raw_svector_ostream conatining an internal SmallString he called<br>
>> small_string_ostream. The commit was reverted after a day due to a<br>
>> disagreement about the commit approval and apparently abandoned.<br>
>><br>
>><br>
>> <a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223393.html</a><br>
>><br>
>> <a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140623/223557.html</a><br>
>><br>
>> <a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140630/223986.html</a><br>
>><br>
>> The class Alp comitted did solve the possible mismatch between the<br>
>> SmallString and the stream by making the SmallString private to the class.<br>
>> This however did not treat the root problem, the duplication of the<br>
>> information about the buffer between SmallString and the stream.<br>
>><br>
>> I can make performance measurements but even if the performance difference<br>
>> is neglible (and looking at all the code paths and conditionals of<br>
>> non-inlined functions at raw_ostream.cpp that's hard to believe), the<br>
>> current SmallString-raw_svector_ostream is simply wrong.<br>
>><br>
>> Personally, after the previous "Alp" discussion I found and fixed several<br>
>> such issues in my out-of-tree code only to make new similar new error<br>
>> earlier this year, which I caught only months after, when Rafael committed<br>
>> the pwrite which reminded me the issue. Due ot the information duplication<br>
>> Rafael commit also had a bug and Mehdi reports similar experience. Back then<br>
>> Alp reported similar problems he found in LLVM code which were hopefully<br>
>> fixed otherwise.<br>
>><br>
>> In addition to the information duplication, the design is quite confusing<br>
>> to use<br>
>><br>
>> - Should one use the stream.str() function or the SmallString itself?<br>
>> - flush or str?<br>
>> - How do you properly clear the combination for reuse (that was my<br>
>> mistake, I forgot to resync after OS.clear())?<br>
>><br>
>> It's safe to say this design pattern is very easy to get wrong one way or<br>
>> another, we got burned by it multiple times and it should be replaced.<br>
>><br>
>> Alp suggested a derived class containing its own SmallString. That's a<br>
>> simple and effective approach to avoid the bugs mentioned but does not fix<br>
>> the memory or runtime issues. Small as they may be, why have them at a<br>
>> fundemental data structure?<br>
>><br>
>> I was thinking about going further avoiding all duplication with a<br>
>> templated class derived with its own internal buffer storage.<br>
>><br>
>> Rafael suggested a more modular approach, a derived adpater class adapter<br>
>> to a *simple* buffer (or nullptr for fully-dynamic operation) which also<br>
>> won't duplicate any information the buffer is dumb and has no state.<br>
>><br>
>> That solution sounds simple, efficient and safe to use. The implementation<br>
>> would be actually simpler then raw_svector_ostream since all the<br>
>> coordination logic is not required anymore.<br>
>><br>
>><br>
>> 2015-04-20 22:17 GMT+03:00 Sean Silva <<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>>:<br>
>>><br>
>>><br>
>>><br>
>>> On Sun, Apr 19, 2015 at 7:40 AM, Yaron Keren <<a href="mailto:yaron.keren@gmail.com" target="_blank">yaron.keren@gmail.com</a>><br>
>>> wrote:<br>
>>>><br>
>>>> A very common code pattern in LLVM is<br>
>>>><br>
>>>> SmallString<128> S;<br>
>>>> raw_svector_ostream OS(S);<br>
>>>> OS<< ...<br>
>>>> Use OS.str()<br>
>>>><br>
>>>> While raw_svector_ostream is smart to share the text buffer itself, it's<br>
>>>> inefficient keeping two sets of pointers to the same buffer:<br>
>>>><br>
>>>> In SmallString: void *BeginX, *EndX, *CapacityX<br>
>>>> In raw_ostream: char *OutBufStart, *OutBufEnd, *OutBufCur<br>
>>>><br>
>>>> Moreover, at runtime the two sets of pointers need to be coordinated<br>
>>>> between the SmallString and raw_svector_ostream using<br>
>>>> raw_svector_ostream::init, raw_svector_ostream::pwrite,<br>
>>>> raw_svector_ostream::resync and raw_svector_ostream::write_impl.<br>
>>>> All these functions have non-inlined implementations in raw_ostream.cpp.<br>
>>>><br>
>>>> Finally, this may cause subtle bugs if S is modified without calling<br>
>>>> OS::resync(). This is too easy to do by mistake.<br>
>>>><br>
>>>> In this frequent case usage the client does not really care about S<br>
>>>> being a SmallString with its many useful string helper function. It's just<br>
>>>> boilerplate code for raw_svector_ostream. But it does cost three extra<br>
>>>> pointers, some runtime performance and possible bugs.<br>
>>><br>
>>><br>
>>> I agree the bugs are real (Alp proposed something a while back regarding<br>
>>> this?), but you will need to provide measurements to justify the cost in<br>
>>> runtime performance. One technique I have used in the past to measure these<br>
>>> sorts of things I call "stuffing": take the operation that you want to<br>
>>> measure, then essentially change the logic so that you pay the cost 2 times,<br>
>>> 3 times, etc. You can then look at the trend in performance as N varies and<br>
>>> extrapolate back to the case where N = 0 (i.e. you don't pay the cost).<br>
>>><br>
>>> For example, in one situation where I used this method it was to measure<br>
>>> the cost of stat'ing files (sys::fs::status) across a holistic build, using<br>
>>> only "time" on the command line (it was on Windows and I didn't have any<br>
>>> tools like DTrace available that can directly measure this). In order to do<br>
>>> this, I changed sys::fs::status to call stat N times instead of 1, and<br>
>>> measured with N=1 N=2 N=3 etc. The result was that the difference between<br>
>>> the N and N+1 versions was about 1-2% across N=1..10 (or whatever I<br>
>>> measured). In order to negate caching and other confounding effects, it is<br>
>>> important to try different distributions of stats; e.g. the extra stats are<br>
>>> on the same file as the "real" stat vs. the extra stats are on nonexistent<br>
>>> files in the same directory as the "real" file vs. parent directories of the<br>
>>> "real" file; if these match up fairly well (they did), then you have some<br>
>>> indication that the "stuffing" is measuring what you want to measure.<br>
>>><br>
>>> So e.g. if you think the cost of 3 extra pointers is significant, then<br>
>>> "stuff" the struct with 3, 6, 9, ... extra pointers and measure the<br>
>>> difference in performance (e.g. measure the time building a real project).<br>
>>><br>
>>> -- Sean Silva<br>
>>><br>
>>>><br>
>>>><br>
>>>> To solve all three issues, would it make sense to have<br>
>>>> raw_ostream-derived container with a its own SmallString like templated-size<br>
>>>> built-in buffer?<br>
>>>><br>
>>><br>
><br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div>
</div></div></div></div>