Introduce StringBuilder utility around raw_string_ostream

Alp Toker alp at nuanti.com
Wed Jul 2 08:16:42 PDT 2014


On 01/07/2014 00:24, Alp Toker wrote:
>
> On 27/06/2014 10:06, Yaron Keren wrote:
>> The LLVM ADTs has several common "themes" or  preferences
>>
>> 1) Efficient by default, efforts to avoid or minimize heap allocation 
>> - even at the cost of making debugging harder: pointers are not 
>> pointers but packed with bits or shared with other pointers, thus not 
>> the correct type, StringRef are not null terminated, etc. (We  have 
>> visualizers to help.)
>> 2) Easy to use by default, compatible with the STL containers, simply 
>> work.
>> 3) Open to "advanced" customization of 1) and 2) via template 
>> arguments if required.
>>
>> Wether these features should be introduced in additional 
>> raw_string_ostream functionality or a new string_ostream may be the 
>> result of what's technically possible such that it
>>
>> 1) Uses a non-heap container (SmallString) by default.
>> 2) Is able to use other containers (std::string?) for clients who 
>> provide it.
>> 3) Easy to use = standard string/ostream semantics where possible.
>> 4) Hard to misuse = hard/impossible to get into 
>> invalid/unsynchronized state, does not require repeating code patterns.
>>
>
> Hi Yaron,
>
> Thanks for articulating that. It's a great observation that the best 
> approach for string handling will be a synthesis of ADT's common themes.
>
> As far as I can tell it's technically not possible to cover a1,2,3 or 
> b1,3,4 by extending raw_string_ostream, while I'm pretty sure we've 
> got it all covered with the proposed string_ostream, especially the 
> fact that we're now converging raw_svector_ostream and 
> raw_string_ostream uses around a single type.
>
> David, ping?
>
> Alp.
>
>
>> Yaron
>>
>>
>>
>> 2014-06-27 4:20 GMT+03:00 Alp Toker <alp at nuanti.com 
>> <mailto:alp at nuanti.com>>:
>>
>>
>>     On 27/06/2014 02:50, David Blaikie wrote:
>>
>>         On Thu, Jun 26, 2014 at 4:05 PM, Alp Toker <alp at nuanti.com
>>         <mailto:alp at nuanti.com>> wrote:
>>
>>             On 27/06/2014 01:42, David Blaikie wrote:
>>
>>                 On Thu, Jun 26, 2014 at 3:37 PM, Alp Toker
>>                 <alp at nuanti.com <mailto:alp at nuanti.com>> wrote:
>>
>>                     On 27/06/2014 00:48, David Blaikie wrote:
>>
>>                         On Wed, Jun 25, 2014 at 9:53 PM, David Blaikie
>>                         <dblaikie at gmail.com <mailto:dblaikie at gmail.com>>
>>                         wrote:
>>
>>                             On Wed, Jun 25, 2014 at 5:52 PM, Alp Toker
>>                             <alp at nuanti.com <mailto:alp at nuanti.com>>
>>                             wrote:
>>
>>                                 On 26/06/2014 03:16, Chandler Carruth
>>                                 wrote:
>>
>>                                     On Thu, Jun 26, 2014 at 2:09 AM,
>>                                     Alp Toker <alp at nuanti.com
>> <mailto:alp at nuanti.com>
>>                                     <mailto:alp at nuanti.com
>> <mailto:alp at nuanti.com>>> wrote:
>>
>>                                            I've made that change, plus
>>                                     a couple more updated uses, and
>>                                     landed
>>                                            this in r211749.
>>
>>                             FWIW, perhaps I wasn't clear - but my "I'd
>>                             like to loop back around
>>                             with Chandler to nail down the design
>>                             choice" was "I'm not signing off
>>                             on this patch without Chandler", so the
>>                             commit may've been a bit
>>                             premature.
>>
>>                         Given this confusion, I think appropriate to
>>                         back this patch out until
>>                         it's actually been signed off on.
>>
>>                         Sorry for my vague language/lack of clarity
>>                         over incremental review
>>                         compared to sign-off.
>>
>>
>>                     I don't agree there was any problem with review
>>                     here. The patch got its
>>                     OKs
>>                     and looks greats,
>>
>>                 You asked me if it was OK to land. I gave some
>>                 feedback and said I'd
>>                 like to wait. You interpreted that as sign-off. I did
>>                 not intend it as
>>                 such. We seem to have failed to communicate and I'm
>>                 trying to clarify
>>                 what I intended to communicate.
>>
>>                 I'm sorry for the miscommunication, but I really
>>                 didn't mean to give
>>                 you the impression that this was ready to commit.
>>                 Please revert it.
>>
>>
>>             There's obviously no way to revert this without losing the
>>             bug fixes.
>>
>>         Right - they can all be included together in whatever form 
>> this is
>>         recommitted. No worries about that. (happens all the time here
>>         - given
>>         we prefer fine grained commits (ideally so that sometimes they
>>         can be
>>         reverted separately, or at least examined separately when 
>> looking
>>         through commit history, etc) so sometimes a set of related 
>> patches
>>         have to be reverted together. Not a problem)
>>
>>             I've backed them out together temporarily in r211814.
>>
>>         Thanks!
>>
>>             Could you state any concerns you have clearly?
>>
>>         Sure - sorry, just didn't want to confuse code review protocol
>>         with
>>         the specific merits or caveats of this patch. Not even a
>>         question of
>>         whether I had concerns or not, just that I hadn't actually
>>         signed off
>>         on the patch. A simple misunderstanding. I'll try to be more
>>         clear in
>>         the future.
>>
>>             A critical feature of the patch is guaranteed safety.
>>             Where you see a
>>             "string_ostream" type you get an assurance that the string
>>             building
>>             operations are valid.
>>
>>         The difference here would just be "Where you see string_ostream
>>         default constructed, you get an assurance that the string 
>> building
>>         operations are valid", right? (it's just "if you use this
>>         constructor
>>         that takes an externally-provided buffer, buyer beware" - tihs
>>         is true
>>         of many types (eg: std::unique_ptr's raw ctor, compared to
>>         something
>>         like make_unique)).
>>
>>
>>     What you're suggesting would be equivalent to having
>>     std::unique_ptr default-construct the pointee type which I'm
>>     pretty sure it doesn't do?
>>
>>     By having a separate type, we get to document the semantics
>>     clearly, and search by type. We also get to hide the flush()
>>     function directly via the type system.
>>
>>     In your scheme that would end up tucked away in the documentation:
>>
>>       // If you constructed with X, then behaviour is Y, otherwise
>>     behaviour is Z.
>>
>>     And even after documenting, I'm sure people would still feel
>>     compelled to flush() before str() occasionally. With the proposed
>>     patch the compiler issues a helpful diagnostic instead.
>>
>>     (Note that std::stringstream *does* take a dual approach, and
>>     suffers from consistency as a result -- 66 open question on '
>>     stringstream' on StackOverflow at time of writing is high for a
>>     single utility. This isn't really part of the standard C++ library
>>     we want to emulate in LLVM -- we're much more conscientious about
>>     our strings in general, and shouldn't drop the ball here.)
>>
>>
>>             Those assurances apply even when the builder is passed by
>>             reference. You can
>>             see the clarity this adds for example in
>>             LLVMCreateMessage() and a few other
>>             places on ToT.
>>
>>         The cleanup in LLVMCreateMessage and callers seems almost 
>> entirely
>>         orthogonal to your new type. The same API change could be made
>>         even
>>         with the old code:
>>
>>            string s;
>>            raw_string_ostream stream(s);
>>            ...
>>            return LLVMCreateMessage(stream);
>>
>>
>>     Actually I don't think the form you suggest above particularly
>>     safe. I'm glad you raised this as I chose the example intentionally.
>>
>>     It's unconventional for a function to modify local variables in
>>     parent stack frames that weren't explicitly named in the argument
>>     list whichever way you bake it.
>>
>>     No other code really does this in the LLVM source tree, and
>>     although you and I immediately grok what's going on as C++
>>     frontend developers, it's actually subtle and easy to miss. The
>>     programming errors on ToT show just how easy it is to lose the
>>     plot with that pattern.
>>
>>     So my change here wasn't orthogonal -- in fact it's completely
>>     aligned with and dependent upon the new safety guarantees
>>     introduced in the patch. We absolutely need representation in the
>>     type system so that we can enforce correctness.
>>
>>
>>         LLVMCreateMessage is still safe, clear, and correct using
>>         either API
>>         because str(), even on raw_string_ostream as it is today, is 
>> still
>>         well defined to flush and then give access to the string.
>>
>>         Or, with a single type, it'd just be:
>>
>>            raw_string_ostream stream;
>>
>>         instead of
>>
>>            string_ostream stream;
>>
>>         Or is there some other clarity benefit you were referring to?
>>
>>         (with the same LLVMCreateMessage change you made (though I
>>         personally
>>         would have LLVMCreateMessage take a const std::string& - it
>>         seems more
>>         appropriate for a variety of callers, even if the current
>>         callers all
>>         happen to have raw_string_ostreams at the moment - indeed, 
>> perhaps
>>         some of them should just be using string concatenation anyway))
>>
>>             Importantly, there is no locality of construction so *the
>>             type becomes the
>>             documentation*.
>>
>>         Generally it's only important to know this distinction in the
>>         scope
>>         where the type is declared (because it's that scope where the
>>         external
>>         storage is provided) - once the stream is passed to some other
>>         API,
>>         the only access to the string is likely going to be through
>>         raw_string_ostream's "str()" member, which does the right thing.
>>
>>
>>     See above.
>>
>>
>>
>>             If we were to lump storage into raw_string_ostream, this
>>             strong safety
>>             indicated by the type system would disappear along with
>>             all the heap
>>             savings.
>>
>>         Heap savings can be reacquired where /really/ necessary by using
>>         external storage, if necessary. By building in storage, that 
>> would
>>         become the exception rather than the rule, and something we
>>         could more
>>         closely scrutinize during code review.
>>
>>             (which simply aren't negligible compared to some of the
>>             effort going into
>>             bit packing elsewhere in the codebase).
>>
>>         Apples and oranges, I suspect (& wouldn't mind some evidence
>>         to the
>>         contrary before we proliferate more choices in data structure
>>         type on
>>         the codebase - we have a lot, many for good reason, but there 
>> is a
>>         cognitive/code review/etc load we pay by adding more). The bit
>>         packing
>>         sort of stuff happens in data structures that there are often
>>         thousands upon thousands of in a single run, or at a single
>>         point in
>>         time so they have a significant impact on peak memory usage. I
>>         doubt
>>         we have more than a handful of raw_string_ostreams in existence
>>         simultaneously.
>>
>>
>>     I'd like numbers too, but from visual inspection we can already
>>     see that this introduces some worthwhile savings.
>>
>>     string_ostream basically just encourages and simplifies our best
>>     practice of using SmallString<> on the stack, and that's never
>>     been in doubt.
>>
>>     In the patch you can see we've fixed truncation issues in code
>>     that was previously storing in an external SmallString<>, not just
>>     std::string. Chandler's proposed raw_string_ostream wouldn't be a
>>     suitable replacement for those uses, missing about 50% of the
>>     potential fixes where we want to use this, particularly in clang
>>     and LLVM's IR facility. In my patch, we cover the lot with a
>>     single type.
>>
>>     I saw you discussing how to avoid heap-allocation of a *singleton*
>>     recently on this list so I know you care at least a little about
>>     this matter :-)
>>
>>
>>             If we mix semantics in the base, we'd basically just end
>>             up a raw string
>>             ostream that sometimes isn't raw,  even making the name of
>>             the type a
>>             little dubious.
>>
>>         (On naming: as Chandler mentioned, these are all raw streams -
>>         even
>>         the new ones you're proposing, the "raw"ness is intended to
>>         distinguish them from the C++ Standard stream libraries that 
>> have
>>         streambufs and all that other stuff, not about whether they have
>>         internal storage or not)
>>
>>
>>     Bike shed. I'm happy with what's in the patch, but still open to
>>     better names.
>>
>>     Does this go some way to covering your remaining concerns?

Ping


>>
>>     Alp.
>>
>>
>>
>>         - David
>>
>>
>>     --     http://www.nuanti.com
>>     the browser experts
>>
>>     _______________________________________________
>>     llvm-commits mailing list
>>     llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>>     http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>

-- 
http://www.nuanti.com
the browser experts




More information about the llvm-commits mailing list