[PATCH] Disable buffering for raw_null_ostream()
Mehdi Amini
mehdi.amini at apple.com
Thu Jul 2 09:55:02 PDT 2015
> On Jul 2, 2015, at 6:23 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>
> On 1 July 2015 at 01:40, Mehdi AMINI <mehdi.amini at apple.com> wrote:
>> Hi rafael,
>>
>> There is no need to buffer the nulls() output.
>
> There is performance. The idea of nulls() is that it is a very fast
> way to discard data. Making it unbuffered adds a virtual call to every
> write.
Do you expect memcpy to be faster than a virtual call to an empty function?
From what I can see with a quick measurement on OS X, compiled in Release mode with *no* assertions, that does not seem to be the case.
The artificial test case below should favor the buffered case, I think: the buffer should stay in cache the whole time, and the string is small enough that the buffer only has to be flushed every thousand writes or so.
  my_null_stream OS;
  std::string S = "Hello World\n";
  for (int i = 0; i < 1000000000; i++) {
    OS << S;
  }
Buffered:
real 0m6.494s
user 0m6.481s
sys 0m0.008s
Unbuffered:
real 0m5.332s
user 0m5.293s
sys 0m0.010s
Note that the implementation of `raw_ostream::write(const char *Ptr, size_t Size)` is highly optimized for the Buffered case with multiple tests (marked unlikely!) before reaching the Unbuffered case.
The gap increases if I add the following at the beginning of this function:
  if (BufferMode == Unbuffered) {
    write_impl(Ptr, Size);
    return *this;
  }
Buffered:
real 0m6.582s
user 0m6.541s
sys 0m0.010s
Unbuffered:
real 0m4.685s
user 0m4.646s
sys 0m0.010s
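To make the two code paths being compared concrete, here is a minimal sketch of a stream with the same shape: a non-virtual write() that either buffers with memcpy or, with the early exit above, forwards straight to a virtual write_impl(). This is a toy model, not the actual raw_ostream code; ToyStream, ToyNullStream, countImplCalls, the 4096-byte buffer and the ImplCalls counter are all made up for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Toy model only: this is NOT llvm::raw_ostream, just a class of the same shape.
class ToyStream {
public:
  enum Mode { Buffered, Unbuffered };

  explicit ToyStream(Mode M) : BufferMode(M) {}
  virtual ~ToyStream() = default;

  ToyStream &write(const char *Ptr, std::size_t Size) {
    // Early exit for the unbuffered case: one virtual call, nothing else.
    if (BufferMode == Unbuffered) {
      write_impl(Ptr, Size);
      return *this;
    }
    // Buffered path: flush first if the data does not fit.
    if (Size > sizeof(Buffer) - Used)
      flush();
    // Data larger than the whole buffer bypasses it.
    if (Size > sizeof(Buffer)) {
      write_impl(Ptr, Size);
      return *this;
    }
    std::memcpy(Buffer + Used, Ptr, Size);
    Used += Size;
    return *this;
  }

  void flush() {
    if (Used != 0) {
      write_impl(Buffer, Used);
      Used = 0;
    }
  }

private:
  virtual void write_impl(const char *Ptr, std::size_t Size) = 0;

  Mode BufferMode;
  char Buffer[4096]; // assumed size, chosen only for this sketch
  std::size_t Used = 0;
};

// Discards everything; the counter exists only to make the calls observable.
class ToyNullStream : public ToyStream {
public:
  using ToyStream::ToyStream;
  std::size_t ImplCalls = 0;

private:
  void write_impl(const char *, std::size_t) override { ++ImplCalls; }
};

// Writes S to a fresh stream N times, flushes, and returns the number of
// virtual write_impl calls that were made.
std::size_t countImplCalls(ToyStream::Mode M, const std::string &S,
                           std::size_t N) {
  ToyNullStream OS(M);
  for (std::size_t I = 0; I < N; ++I)
    OS.write(S.data(), S.size());
  OS.flush();
  return OS.ImplCalls;
}
```

Calling countImplCalls(ToyStream::Unbuffered, "Hello World\n", 1000) makes one virtual call per write, while the Buffered mode trades almost all of those calls for a bounds check and a memcpy on every write. Which side is cheaper is exactly what the timings above are measuring; the sketch only shows where the work goes and says nothing about speed.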
Now there is another implementation “trick”: the << operator has a “fast path” for StringRef. If I replace in the test case:
OS << S;
with
OS << StringRef(S);
Then we obtain:
Buffered:
real 0m4.282s
user 0m4.244s
sys 0m0.011s
Unbuffered:
real 0m5.334s
user 0m5.293s
sys 0m0.012s
Again, if I use the same trick as before and add code handling the unbuffered case to the fast path:
  raw_ostream &operator<<(StringRef Str) {
    // Inline fast path, particularly for strings with a known length.
    size_t Size = Str.size();
    if (BufferMode == Unbuffered) {
      write_impl(Str.data(), Size);
      return *this;
    }
    ….
The timing becomes:
Buffered:
real 0m4.646s
user 0m4.634s
sys 0m0.008s
Unbuffered:
real 0m3.085s
user 0m3.048s
sys 0m0.009s
Keeping in mind that the test should be in favor of the buffered case, it is not clear to me that a more balanced test wouldn’t be worse for the buffered case.
So the bulk of the cost seems to be the complex logic in raw_ostream designed to favor the buffered case, more than the virtual function call.
Note also that this should apply to our other discussion about the formatted_stream.
>
>> Moreover it kept a shared buffer, and made using nulls() not possible
>> in a multi-threaded environment.
>
> Sorry, I really don't see it. By reading the code it looks like we
> will call SetBuffered, which will call SetBufferSize(Size), which
> will allocate a new buffer.
Sure, the problem is not with the class raw_null_ostream itself, it is really with the function “nulls()” which returns a static object.
By having it unbuffered, it becomes immutable and can be shared across threads.
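As a rough illustration of the sharing argument, here is a sketch of a function-local static sink, analogous in spirit to nulls(), that is safe to use from several threads because a write never modifies it. StatelessNullSink, sharedNullSink and hammerSharedSink are hypothetical names for this example, not LLVM APIs, and the real raw_null_ostream carries more state than this stand-in.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an unbuffered null stream: it has no data
// members, so a write cannot mutate the object.
class StatelessNullSink {
public:
  void write(const char *, std::size_t) const {}
};

// Hypothetical analogue of nulls(): one object shared by every caller.
// Initialization of a function-local static is thread-safe since C++11.
const StatelessNullSink &sharedNullSink() {
  static const StatelessNullSink Sink{};
  return Sink;
}

// Writes to the shared sink from NumThreads threads at once and returns the
// total number of writes issued.
std::size_t hammerSharedSink(unsigned NumThreads, std::size_t WritesPerThread) {
  const std::string S = "Hello World\n";
  std::vector<std::size_t> Issued(NumThreads, 0);
  std::vector<std::thread> Workers;
  for (unsigned T = 0; T < NumThreads; ++T)
    Workers.emplace_back([&, T] {
      for (std::size_t I = 0; I < WritesPerThread; ++I) {
        sharedNullSink().write(S.data(), S.size());
        ++Issued[T]; // each thread touches only its own slot
      }
    });
  for (auto &W : Workers)
    W.join();
  std::size_t Total = 0;
  for (std::size_t N : Issued)
    Total += N;
  return Total;
}
```

If the shared object instead owned a buffer and a cursor, every concurrent write would update that cursor without synchronization, which is the problem with a buffered static stream.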
--
Mehdi