Introduce StringBuilder utility around raw_string_ostream

Alp Toker alp at nuanti.com
Thu Jun 26 18:20:05 PDT 2014


On 27/06/2014 02:50, David Blaikie wrote:
> On Thu, Jun 26, 2014 at 4:05 PM, Alp Toker <alp at nuanti.com> wrote:
>> On 27/06/2014 01:42, David Blaikie wrote:
>>> On Thu, Jun 26, 2014 at 3:37 PM, Alp Toker <alp at nuanti.com> wrote:
>>>> On 27/06/2014 00:48, David Blaikie wrote:
>>>>> On Wed, Jun 25, 2014 at 9:53 PM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>>> On Wed, Jun 25, 2014 at 5:52 PM, Alp Toker <alp at nuanti.com> wrote:
>>>>>>> On 26/06/2014 03:16, Chandler Carruth wrote:
>>>>>>>
>>>>>>>> On Thu, Jun 26, 2014 at 2:09 AM, Alp Toker <alp at nuanti.com
>>>>>>>> <mailto:alp at nuanti.com>> wrote:
>>>>>>>>
>>>>>>>>        I've made that change, plus a couple more updated uses, and
>>>>>>>> landed
>>>>>>>>        this in r211749.
>>>>>> FWIW, perhaps I wasn't clear - but my "I'd like to loop back around
>>>>>> with Chandler to nail down the design choice" was "I'm not signing off
>>>>>> on this patch without Chandler", so the commit may've been a bit
>>>>>> premature.
>>>>> Given this confusion, I think appropriate to back this patch out until
>>>>> it's actually been signed off on.
>>>>>
>>>>> Sorry for my vague language/lack of clarity over incremental review
>>>>> compared to sign-off.
>>>>
>>>> I don't agree there was any problem with review here. The patch got its
>>>> OKs
>>>> and looks greats,
>>> You asked me if it was OK to land. I gave some feedback and said I'd
>>> like to wait. You interpreted that as sign-off. I did not intend it as
>>> such. We seem to have failed to communicate and I'm trying to clarify
>>> what I intended to communicate.
>>>
>>> I'm sorry for the miscommunication, but I really didn't mean to give
>>> you the impression that this was ready to commit. Please revert it.
>>
>> There's obviously no way to revert this without losing the bug fixes.
> Right - they can all be included together in whatever form this is
> recommitted. No worries about that. (happens all the time here - given
> we prefer fine grained commits (ideally so that sometimes they can be
> reverted separately, or at least examined separately when looking
> through commit history, etc) so sometimes a set of related patches
> have to be reverted together. Not a problem)
>
>> I've backed them out together temporarily in r211814.
> Thanks!
>
>> Could you state any concerns you have clearly?
> Sure - sorry, just didn't want to confuse code review protocol with
> the specific merits or caveats of this patch. Not even a question of
> whether I had concerns or not, just that I hadn't actually signed off
> on the patch. A simple misunderstanding. I'll try to be more clear in
> the future.
>
>> A critical feature of the patch is guaranteed safety. Where you see a
>> "string_ostream" type you get an assurance that the string building
>> operations are valid.
> The difference here would just be "Where you see string_ostream
> default constructed, you get an assurance that the string building
> operations are valid", right? (it's just "if you use this constructor
> that takes an externally-provided buffer, buyer beware" - tihs is true
> of many types (eg: std::unique_ptr's raw ctor, compared to something
> like make_unique)).

What you're suggesting would be equivalent to having std::unique_ptr 
default-construct the pointee type which I'm pretty sure it doesn't do?

By having a separate type, we get to document the semantics clearly, and 
search by type. We also get to hide the flush() function directly via 
the type system.

In your scheme that would end up tucked away in the documentation:

   // If you constructed with X, then behaviour is Y, otherwise 
behaviour is Z.

And even after documenting, I'm sure people would still feel compelled 
to flush() before str() occasionally. With the proposed patch the 
compiler issues a helpful diagnostic instead.

(Note that std::stringstream *does* take a dual approach, and suffers 
from consistency as a result -- 66 open question on ' stringstream' on 
StackOverflow at time of writing is high for a single utility. This 
isn't really part of the standard C++ library we want to emulate in LLVM 
-- we're much more conscientious about our strings in general, and 
shouldn't drop the ball here.)

>> Those assurances apply even when the builder is passed by reference. You can
>> see the clarity this adds for example in LLVMCreateMessage() and a few other
>> places on ToT.
> The cleanup in LLVMCreateMessage and callers seems almost entirely
> orthogonal to your new type. The same API change could be made even
> with the old code:
>
>    string s;
>    raw_string_ostream stream(s);
>    ...
>    return LLVMCreateMessage(stream);

Actually I don't think the form you suggest above particularly safe. I'm 
glad you raised this as I chose the example intentionally.

It's unconventional for a function to modify local variables in parent 
stack frames that weren't explicitly named in the argument list 
whichever way you bake it.

No other code really does this in the LLVM source tree, and although you 
and I immediately grok what's going on as C++ frontend developers, it's 
actually subtle and easy to miss. The programming errors on ToT show 
just how easy it is to lose the plot with that pattern.

So my change here wasn't orthogonal -- in fact it's completely aligned 
with and dependent upon the new safety guarantees introduced in the 
patch. We absolutely need representation in the type system so that we 
can enforce correctness.

> LLVMCreateMessage is still safe, clear, and correct using either API
> because str(), even on raw_string_ostream as it is today, is still
> well defined to flush and then give access to the string.
>
> Or, with a single type, it'd just be:
>
>    raw_string_ostream stream;
>
> instead of
>
>    string_ostream stream;
>
> Or is there some other clarity benefit you were referring to?
>
> (with the same LLVMCreateMessage change you made (though I personally
> would have LLVMCreateMessage take a const std::string& - it seems more
> appropriate for a variety of callers, even if the current callers all
> happen to have raw_string_ostreams at the moment - indeed, perhaps
> some of them should just be using string concatenation anyway))
>
>> Importantly, there is no locality of construction so *the type becomes the
>> documentation*.
> Generally it's only important to know this distinction in the scope
> where the type is declared (because it's that scope where the external
> storage is provided) - once the stream is passed to some other API,
> the only access to the string is likely going to be through
> raw_string_ostream's "str()" member, which does the right thing.

See above.

>
>> If we were to lump storage into raw_string_ostream, this strong safety
>> indicated by the type system would disappear along with all the heap
>> savings.
> Heap savings can be reacquired where /really/ necessary by using
> external storage, if necessary. By building in storage, that would
> become the exception rather than the rule, and something we could more
> closely scrutinize during code review.
>
>> (which simply aren't negligible compared to some of the effort going into
>> bit packing elsewhere in the codebase).
> Apples and oranges, I suspect (& wouldn't mind some evidence to the
> contrary before we proliferate more choices in data structure type on
> the codebase - we have a lot, many for good reason, but there is a
> cognitive/code review/etc load we pay by adding more). The bit packing
> sort of stuff happens in data structures that there are often
> thousands upon thousands of in a single run, or at a single point in
> time so they have a significant impact on peak memory usage. I doubt
> we have more than a handful of raw_string_ostreams in existence
> simultaneously.

I'd like numbers too, but from visual inspection we can already see that 
this introduces some worthwhile savings.

string_ostream basically just encourages and simplifies our best 
practice of using SmallString<> on the stack, and that's never been in 
doubt.

In the patch you can see we've fixed truncation issues in code that was 
previously storing in an external SmallString<>, not just std::string. 
Chandler's proposed raw_string_ostream wouldn't be a suitable 
replacement for those uses, missing about 50% of the potential fixes 
where we want to use this, particularly in clang and LLVM's IR facility. 
In my patch, we cover the lot with a single type.

I saw you discussing how to avoid heap-allocation of a *singleton* 
recently on this list so I know you care at least a little about this 
matter :-)

>> If we mix semantics in the base, we'd basically just end up a raw string
>> ostream that sometimes isn't raw,  even making the name of the type a
>> little dubious.
> (On naming: as Chandler mentioned, these are all raw streams - even
> the new ones you're proposing, the "raw"ness is intended to
> distinguish them from the C++ Standard stream libraries that have
> streambufs and all that other stuff, not about whether they have
> internal storage or not)

Bike shed. I'm happy with what's in the patch, but still open to better 
names.

Does this go some way to covering your remaining concerns?

Alp.


>
> - David

-- 
http://www.nuanti.com
the browser experts




More information about the llvm-commits mailing list