[PATCH] Disable buffering for raw_null_ostream()

Mehdi Amini mehdi.amini at apple.com
Thu Jul 2 09:55:02 PDT 2015


> On Jul 2, 2015, at 6:23 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> 
> On 1 July 2015 at 01:40, Mehdi AMINI <mehdi.amini at apple.com> wrote:
>> Hi rafael,
>> 
>> There is no need to buffer the nulls() output.
> 
> There is performance. The idea of nulls() is that it is a very fast
> way to discard data. Making it unbuffered adds a virtual call to every
> write.

Do you expect memcpy to be faster than a virtual call to an empty function?
From what I can see with a quick measurement on OS X, compiled in Release mode with *no* assertions, it does not seem to be the case.

I think the artificial test case below should favor the buffered case: the buffer should stay in cache the whole time, and the string is small enough that the buffer only has to be flushed every thousand writes or so.

  my_null_stream OS;
  std::string S = "Hello World\n";
  for (int i = 0; i < 1000000000; i++) {
    OS << S;
  }


Buffered:

real	0m6.494s
user	0m6.481s
sys	0m0.008s

Unbuffered:

real	0m5.332s
user	0m5.293s
sys	0m0.010s
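
To make the two paths being compared concrete, here is a minimal sketch of what each write does in the two modes. It is not LLVM's code: MockStream, MockNullStream and countImplCalls are hypothetical names, and the counters in write_impl exist only so the behaviour can be observed (a real null stream would have an empty write_impl).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a raw_ostream-like class (assumption, not LLVM).
// A buffer size of 0 means "unbuffered".
class MockStream {
public:
  explicit MockStream(std::size_t BufSize) : Buf(BufSize) {}
  virtual ~MockStream() = default;

  MockStream &write(const char *Ptr, std::size_t Size) {
    if (Buf.empty()) {
      // Unbuffered: every write is one virtual call.
      write_impl(Ptr, Size);
      return *this;
    }
    // Buffered: memcpy into the buffer, virtual call only when it fills up.
    while (Size > 0) {
      assert(Used < Buf.size());
      std::size_t Room = Buf.size() - Used;
      std::size_t N = Size < Room ? Size : Room;
      std::memcpy(&Buf[Used], Ptr, N);
      Used += N;
      Ptr += N;
      Size -= N;
      if (Used == Buf.size())
        flush();
    }
    return *this;
  }

  void flush() {
    if (Used > 0) {
      write_impl(Buf.data(), Used);
      Used = 0;
    }
  }

protected:
  virtual void write_impl(const char *Ptr, std::size_t Size) = 0;

private:
  std::vector<char> Buf;
  std::size_t Used = 0;
};

// Discards everything; the counters are for illustration only.
class MockNullStream : public MockStream {
public:
  using MockStream::MockStream;
  std::size_t Calls = 0;
  std::size_t Bytes = 0;

private:
  void write_impl(const char *, std::size_t Size) override {
    ++Calls;
    Bytes += Size;
  }
};

// Writes the test string `Writes` times and reports how many times
// write_impl was reached.
std::size_t countImplCalls(std::size_t BufSize, int Writes) {
  MockNullStream OS(BufSize);
  const std::string S = "Hello World\n";
  for (int i = 0; i < Writes; ++i)
    OS.write(S.data(), S.size());
  OS.flush();
  return OS.Calls;
}
```

With a 4096-byte buffer, 1000 writes of the 12-byte string reach write_impl three times; unbuffered, they reach it 1000 times. The sketch only shows where the work goes (a memcpy per write versus a virtual call per write), it says nothing about which one is faster.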

Note that the implementation of `raw_ostream::write(const char *Ptr, size_t Size)` is highly optimized for the Buffered case, with multiple tests (marked unlikely!) before reaching the Unbuffered case.
The gap increases if I add the following at the beginning of this function:

      if (BufferMode == Unbuffered) {
        write_impl(Ptr, Size);
        return *this;
      }

Buffered:

real	0m6.582s
user	0m6.541s
sys	0m0.010s

Unbuffered:

real	0m4.685s
user	0m4.646s
sys	0m0.010s


Now there is another implementation “trick”: the << operator has a “fast-path” for StringRef. If I replace in the test case:

    OS << S;

with

    OS << StringRef(S);

Then we obtain:

Buffered:

real	0m4.282s
user	0m4.244s
sys	0m0.011s

Unbuffered:

real	0m5.334s
user	0m5.293s
sys	0m0.012s


Again, if I use the same trick as before and add code to the fast-path to handle the unbuffered case:

  raw_ostream &operator<<(StringRef Str) {
    // Inline fast path, particularly for strings with a known length.
    size_t Size = Str.size();
    if (BufferMode == Unbuffered) {
      write_impl(Str.data(), Size);
      return *this;
    }
   ….

The timing becomes:

Buffered:

real	0m4.646s
user	0m4.634s
sys	0m0.008s

Unbuffered:

real	0m3.085s
user	0m3.048s
sys	0m0.009s

Keeping in mind that the test should be in favor of the buffered case, it is not clear to me that a more balanced test wouldn’t be worse for the buffered case.

So the bulk of the cost seems to come from the complex logic in raw_ostream designed to favor the buffered case, more than from the virtual function call.
Note also that this should apply to our other discussion about the formatted_stream.




> 
>> Moreover it kept a shared buffer, and made using nulls() not possible
>> in a multi-threaded environment.
> 
> Sorry, I really don't see it. By reading the code it looks like we
> will call SetBuffered,  which will call  SetBufferSize(Size), which
> will allocate a new buffer.

Sure, the problem is not with the class raw_null_ostream itself; it is really with the function “nulls()”, which returns a static object.
Making it unbuffered makes it immutable, so it can be shared across threads.
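
As a rough illustration of the threading point, here is a sketch under stated assumptions, not the actual nulls() implementation: StatelessNullSink, sharedNulls and hammerSharedSink are hypothetical names. The shared static object has no buffer and no other mutable state, so concurrent writes have nothing to race on.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sink with no data members: writing mutates nothing,
// which is why write() can be const.
class StatelessNullSink {
public:
  const StatelessNullSink &write(const char *, std::size_t) const {
    return *this;
  }
};

// Hypothetical analogue of a function returning a shared static object.
// Initialization of a function-local static is thread-safe since C++11.
const StatelessNullSink &sharedNulls() {
  static const StatelessNullSink Sink{};
  return Sink;
}

// Several threads write to the same shared sink and each counts its own
// writes in a separate slot; returns the total number of writes performed.
int hammerSharedSink(int NumThreads, int WritesPerThread) {
  assert(NumThreads > 0);
  std::vector<int> Done(NumThreads, 0);
  std::vector<std::thread> Threads;
  for (int T = 0; T < NumThreads; ++T)
    Threads.emplace_back([&Done, T, WritesPerThread] {
      for (int i = 0; i < WritesPerThread; ++i) {
        sharedNulls().write("Hello World\n", 12);
        ++Done[T]; // each thread touches only its own element
      }
    });
  for (std::thread &Th : Threads)
    Th.join();
  int Total = 0;
  for (int D : Done)
    Total += D;
  return Total;
}
```

If the shared object instead carried a buffer and a write position, every one of those write calls would update the same memory from several threads without synchronization.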

— 
Mehdi




