Introduce StringBuilder utility around raw_string_ostream

David Blaikie dblaikie at gmail.com
Thu Jun 26 16:55:24 PDT 2014


On Thu, Jun 26, 2014 at 4:50 PM, David Blaikie <dblaikie at gmail.com> wrote:
> On Thu, Jun 26, 2014 at 4:05 PM, Alp Toker <alp at nuanti.com> wrote:
>>
>> On 27/06/2014 01:42, David Blaikie wrote:
>>>
>>> On Thu, Jun 26, 2014 at 3:37 PM, Alp Toker <alp at nuanti.com> wrote:
>>>>
>>>> On 27/06/2014 00:48, David Blaikie wrote:
>>>>>
>>>>> On Wed, Jun 25, 2014 at 9:53 PM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On Wed, Jun 25, 2014 at 5:52 PM, Alp Toker <alp at nuanti.com> wrote:
>>>>>>>
>>>>>>> On 26/06/2014 03:16, Chandler Carruth wrote:
>>>>>>>
>>>>>>>> On Thu, Jun 26, 2014 at 2:09 AM, Alp Toker <alp at nuanti.com
>>>>>>>> <mailto:alp at nuanti.com>> wrote:
>>>>>>>>
>>>>>>>>       I've made that change, plus a couple more updated uses, and
>>>>>>>> landed
>>>>>>>>       this in r211749.
>>>>>>
>>>>>> FWIW, perhaps I wasn't clear - but my "I'd like to loop back around
>>>>>> with Chandler to nail down the design choice" was "I'm not signing off
>>>>>> on this patch without Chandler", so the commit may've been a bit
>>>>>> premature.
>>>>>
>>>>> Given this confusion, I think appropriate to back this patch out until
>>>>> it's actually been signed off on.
>>>>>
>>>>> Sorry for my vague language/lack of clarity over incremental review
>>>>> compared to sign-off.
>>>>
>>>>
>>>> I don't agree there was any problem with review here. The patch got its
>>>> OKs
>>>> and looks greats,
>>>
>>> You asked me if it was OK to land. I gave some feedback and said I'd
>>> like to wait. You interpreted that as sign-off. I did not intend it as
>>> such. We seem to have failed to communicate and I'm trying to clarify
>>> what I intended to communicate.
>>>
>>> I'm sorry for the miscommunication, but I really didn't mean to give
>>> you the impression that this was ready to commit. Please revert it.
>>
>>
>> There's obviously no way to revert this without losing the bug fixes.
>
> Right - they can all be included together in whatever form this is
> recommitted. No worries about that. (happens all the time here - given
> we prefer fine grained commits (ideally so that sometimes they can be
> reverted separately, or at least examined separately when looking
> through commit history, etc) so sometimes a set of related patches
> have to be reverted together. Not a problem)
>
>> I've backed them out together temporarily in r211814.
>
> Thanks!
>
>> Could you state any concerns you have clearly?
>
> Sure - sorry, just didn't want to confuse code review protocol with
> the specific merits or caveats of this patch. Not even a question of
> whether I had concerns or not, just that I hadn't actually signed off
> on the patch. A simple misunderstanding. I'll try to be more clear in
> the future.
>
>> A critical feature of the patch is guaranteed safety. Where you see a
>> "string_ostream" type you get an assurance that the string building
>> operations are valid.
>
> The difference here would just be "Where you see string_ostream
> default constructed, you get an assurance that the string building
> operations are valid", right? (it's just "if you use this constructor
> that takes an externally-provided buffer, buyer beware" - tihs is true
> of many types (eg: std::unique_ptr's raw ctor, compared to something
> like make_unique)).
>
>> Those assurances apply even when the builder is passed by reference. You can
>> see the clarity this adds for example in LLVMCreateMessage() and a few other
>> places on ToT.
>
> The cleanup in LLVMCreateMessage and callers seems almost entirely
> orthogonal to your new type. The same API change could be made even
> with the old code:
>
>   string s;
>   raw_string_ostream stream(s);
>   ...
>   return LLVMCreateMessage(stream);
>
> LLVMCreateMessage is still safe, clear, and correct using either API
> because str(), even on raw_string_ostream as it is today, is still
> well defined to flush and then give access to the string.
>
> Or, with a single type, it'd just be:
>
>   raw_string_ostream stream;
>
> instead of
>
>   string_ostream stream;
>
> Or is there some other clarity benefit you were referring to?
>
> (with the same LLVMCreateMessage change you made (though I personally
> would have LLVMCreateMessage take a const std::string& - it seems more
> appropriate for a variety of callers, even if the current callers all
> happen to have raw_string_ostreams at the moment - indeed, perhaps
> some of them should just be using string concatenation anyway))
>
>> Importantly, there is no locality of construction so *the type becomes the
>> documentation*.
>
> Generally it's only important to know this distinction in the scope
> where the type is declared (because it's that scope where the external
> storage is provided) - once the stream is passed to some other API,
> the only access to the string is likely going to be through
> raw_string_ostream's "str()" member, which does the right thing.
>
>> If we were to lump storage into raw_string_ostream, this strong safety
>> indicated by the type system would disappear along with all the heap
>> savings.
>
> Heap savings can be reacquired where /really/ necessary by using
> external storage, if necessary. By building in storage, that would
> become the exception rather than the rule, and something we could more
> closely scrutinize during code review.

Realized of course that since raw_string_ostream::str returns a
std::string& that external storage is incompatible with that. If we
want to support that, one option would be to turn raw_string_ostream
into a class template with a default argument of std::string.

In that way, raw_string_ostream's str() could return a reference to
whatever its underlying storage is. It would mean rewriting every
reference to raw_string_ostream to raw_string_ostream<>.

Again, not sure how important it is to have that option, honestly
(given that performance-wise, we've been fine using std::string as the
only backing storage for raw_string_ostream so far - unless I see
relevant perf numbers, I'd be hesitant to use this as a reason to
proliferate more types, etc)

>
>> (which simply aren't negligible compared to some of the effort going into
>> bit packing elsewhere in the codebase).
>
> Apples and oranges, I suspect (& wouldn't mind some evidence to the
> contrary before we proliferate more choices in data structure type on
> the codebase - we have a lot, many for good reason, but there is a
> cognitive/code review/etc load we pay by adding more). The bit packing
> sort of stuff happens in data structures that there are often
> thousands upon thousands of in a single run, or at a single point in
> time so they have a significant impact on peak memory usage. I doubt
> we have more than a handful of raw_string_ostreams in existence
> simultaneously.
>
>> If we mix semantics in the base, we'd basically just end up a raw string
>> ostream that sometimes isn't raw,  even making the name of the type a
>> little dubious.
>
> (On naming: as Chandler mentioned, these are all raw streams - even
> the new ones you're proposing, the "raw"ness is intended to
> distinguish them from the C++ Standard stream libraries that have
> streambufs and all that other stuff, not about whether they have
> internal storage or not)
>
> - David



More information about the llvm-commits mailing list