Introduce StringBuilder utility around raw_string_ostream
Yaron Keren
yaron.keren at gmail.com
Fri Jun 27 00:06:09 PDT 2014
The LLVM ADTs has several common "themes" or preferences
1) Efficient by default, efforts to avoid or minimize heap allocation -
even at the cost of making debugging harder: pointers are not pointers but
packed with bits or shared with other pointers, thus not the correct type,
StringRef are not null terminated, etc. (We have visualizers to help.)
2) Easy to use by default, compatible with the STL containers, simply work.
3) Open to "advanced" customization of 1) and 2) via template arguments if
required.
Wether these features should be introduced in additional raw_string_ostream
functionality or a new string_ostream may be the result of what's
technically possible such that it
1) Uses a non-heap container (SmallString) by default.
2) Is able to use other containers (std::string?) for clients who provide
it.
3) Easy to use = standard string/ostream semantics where possible.
4) Hard to misuse = hard/impossible to get into invalid/unsynchronized
state, does not require repeating code patterns.
Yaron
2014-06-27 4:20 GMT+03:00 Alp Toker <alp at nuanti.com>:
>
> On 27/06/2014 02:50, David Blaikie wrote:
>
>> On Thu, Jun 26, 2014 at 4:05 PM, Alp Toker <alp at nuanti.com> wrote:
>>
>>> On 27/06/2014 01:42, David Blaikie wrote:
>>>
>>>> On Thu, Jun 26, 2014 at 3:37 PM, Alp Toker <alp at nuanti.com> wrote:
>>>>
>>>>> On 27/06/2014 00:48, David Blaikie wrote:
>>>>>
>>>>>> On Wed, Jun 25, 2014 at 9:53 PM, David Blaikie <dblaikie at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Wed, Jun 25, 2014 at 5:52 PM, Alp Toker <alp at nuanti.com> wrote:
>>>>>>>
>>>>>>>> On 26/06/2014 03:16, Chandler Carruth wrote:
>>>>>>>>
>>>>>>>> On Thu, Jun 26, 2014 at 2:09 AM, Alp Toker <alp at nuanti.com
>>>>>>>>> <mailto:alp at nuanti.com>> wrote:
>>>>>>>>>
>>>>>>>>> I've made that change, plus a couple more updated uses, and
>>>>>>>>> landed
>>>>>>>>> this in r211749.
>>>>>>>>>
>>>>>>>> FWIW, perhaps I wasn't clear - but my "I'd like to loop back around
>>>>>>> with Chandler to nail down the design choice" was "I'm not signing
>>>>>>> off
>>>>>>> on this patch without Chandler", so the commit may've been a bit
>>>>>>> premature.
>>>>>>>
>>>>>> Given this confusion, I think appropriate to back this patch out until
>>>>>> it's actually been signed off on.
>>>>>>
>>>>>> Sorry for my vague language/lack of clarity over incremental review
>>>>>> compared to sign-off.
>>>>>>
>>>>>
>>>>> I don't agree there was any problem with review here. The patch got its
>>>>> OKs
>>>>> and looks greats,
>>>>>
>>>> You asked me if it was OK to land. I gave some feedback and said I'd
>>>> like to wait. You interpreted that as sign-off. I did not intend it as
>>>> such. We seem to have failed to communicate and I'm trying to clarify
>>>> what I intended to communicate.
>>>>
>>>> I'm sorry for the miscommunication, but I really didn't mean to give
>>>> you the impression that this was ready to commit. Please revert it.
>>>>
>>>
>>> There's obviously no way to revert this without losing the bug fixes.
>>>
>> Right - they can all be included together in whatever form this is
>> recommitted. No worries about that. (happens all the time here - given
>> we prefer fine grained commits (ideally so that sometimes they can be
>> reverted separately, or at least examined separately when looking
>> through commit history, etc) so sometimes a set of related patches
>> have to be reverted together. Not a problem)
>>
>> I've backed them out together temporarily in r211814.
>>>
>> Thanks!
>>
>> Could you state any concerns you have clearly?
>>>
>> Sure - sorry, just didn't want to confuse code review protocol with
>> the specific merits or caveats of this patch. Not even a question of
>> whether I had concerns or not, just that I hadn't actually signed off
>> on the patch. A simple misunderstanding. I'll try to be more clear in
>> the future.
>>
>> A critical feature of the patch is guaranteed safety. Where you see a
>>> "string_ostream" type you get an assurance that the string building
>>> operations are valid.
>>>
>> The difference here would just be "Where you see string_ostream
>> default constructed, you get an assurance that the string building
>> operations are valid", right? (it's just "if you use this constructor
>> that takes an externally-provided buffer, buyer beware" - tihs is true
>> of many types (eg: std::unique_ptr's raw ctor, compared to something
>> like make_unique)).
>>
>
> What you're suggesting would be equivalent to having std::unique_ptr
> default-construct the pointee type which I'm pretty sure it doesn't do?
>
> By having a separate type, we get to document the semantics clearly, and
> search by type. We also get to hide the flush() function directly via the
> type system.
>
> In your scheme that would end up tucked away in the documentation:
>
> // If you constructed with X, then behaviour is Y, otherwise behaviour
> is Z.
>
> And even after documenting, I'm sure people would still feel compelled to
> flush() before str() occasionally. With the proposed patch the compiler
> issues a helpful diagnostic instead.
>
> (Note that std::stringstream *does* take a dual approach, and suffers from
> consistency as a result -- 66 open question on ' stringstream' on
> StackOverflow at time of writing is high for a single utility. This isn't
> really part of the standard C++ library we want to emulate in LLVM -- we're
> much more conscientious about our strings in general, and shouldn't drop
> the ball here.)
>
>
> Those assurances apply even when the builder is passed by reference. You
>>> can
>>> see the clarity this adds for example in LLVMCreateMessage() and a few
>>> other
>>> places on ToT.
>>>
>> The cleanup in LLVMCreateMessage and callers seems almost entirely
>> orthogonal to your new type. The same API change could be made even
>> with the old code:
>>
>> string s;
>> raw_string_ostream stream(s);
>> ...
>> return LLVMCreateMessage(stream);
>>
>
> Actually I don't think the form you suggest above particularly safe. I'm
> glad you raised this as I chose the example intentionally.
>
> It's unconventional for a function to modify local variables in parent
> stack frames that weren't explicitly named in the argument list whichever
> way you bake it.
>
> No other code really does this in the LLVM source tree, and although you
> and I immediately grok what's going on as C++ frontend developers, it's
> actually subtle and easy to miss. The programming errors on ToT show just
> how easy it is to lose the plot with that pattern.
>
> So my change here wasn't orthogonal -- in fact it's completely aligned
> with and dependent upon the new safety guarantees introduced in the patch.
> We absolutely need representation in the type system so that we can enforce
> correctness.
>
>
> LLVMCreateMessage is still safe, clear, and correct using either API
>> because str(), even on raw_string_ostream as it is today, is still
>> well defined to flush and then give access to the string.
>>
>> Or, with a single type, it'd just be:
>>
>> raw_string_ostream stream;
>>
>> instead of
>>
>> string_ostream stream;
>>
>> Or is there some other clarity benefit you were referring to?
>>
>> (with the same LLVMCreateMessage change you made (though I personally
>> would have LLVMCreateMessage take a const std::string& - it seems more
>> appropriate for a variety of callers, even if the current callers all
>> happen to have raw_string_ostreams at the moment - indeed, perhaps
>> some of them should just be using string concatenation anyway))
>>
>> Importantly, there is no locality of construction so *the type becomes
>>> the
>>> documentation*.
>>>
>> Generally it's only important to know this distinction in the scope
>> where the type is declared (because it's that scope where the external
>> storage is provided) - once the stream is passed to some other API,
>> the only access to the string is likely going to be through
>> raw_string_ostream's "str()" member, which does the right thing.
>>
>
> See above.
>
>
>
>> If we were to lump storage into raw_string_ostream, this strong safety
>>> indicated by the type system would disappear along with all the heap
>>> savings.
>>>
>> Heap savings can be reacquired where /really/ necessary by using
>> external storage, if necessary. By building in storage, that would
>> become the exception rather than the rule, and something we could more
>> closely scrutinize during code review.
>>
>> (which simply aren't negligible compared to some of the effort going into
>>> bit packing elsewhere in the codebase).
>>>
>> Apples and oranges, I suspect (& wouldn't mind some evidence to the
>> contrary before we proliferate more choices in data structure type on
>> the codebase - we have a lot, many for good reason, but there is a
>> cognitive/code review/etc load we pay by adding more). The bit packing
>> sort of stuff happens in data structures that there are often
>> thousands upon thousands of in a single run, or at a single point in
>> time so they have a significant impact on peak memory usage. I doubt
>> we have more than a handful of raw_string_ostreams in existence
>> simultaneously.
>>
>
> I'd like numbers too, but from visual inspection we can already see that
> this introduces some worthwhile savings.
>
> string_ostream basically just encourages and simplifies our best practice
> of using SmallString<> on the stack, and that's never been in doubt.
>
> In the patch you can see we've fixed truncation issues in code that was
> previously storing in an external SmallString<>, not just std::string.
> Chandler's proposed raw_string_ostream wouldn't be a suitable replacement
> for those uses, missing about 50% of the potential fixes where we want to
> use this, particularly in clang and LLVM's IR facility. In my patch, we
> cover the lot with a single type.
>
> I saw you discussing how to avoid heap-allocation of a *singleton*
> recently on this list so I know you care at least a little about this
> matter :-)
>
>
> If we mix semantics in the base, we'd basically just end up a raw string
>>> ostream that sometimes isn't raw, even making the name of the type a
>>> little dubious.
>>>
>> (On naming: as Chandler mentioned, these are all raw streams - even
>> the new ones you're proposing, the "raw"ness is intended to
>> distinguish them from the C++ Standard stream libraries that have
>> streambufs and all that other stuff, not about whether they have
>> internal storage or not)
>>
>
> Bike shed. I'm happy with what's in the patch, but still open to better
> names.
>
> Does this go some way to covering your remaining concerns?
>
> Alp.
>
>
>
>> - David
>>
>
> --
> http://www.nuanti.com
> the browser experts
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140627/e387cdd0/attachment.html>
More information about the llvm-commits
mailing list