Introduce StringBuilder utility around raw_string_ostream

Alp Toker alp at nuanti.com
Mon Jun 30 14:24:56 PDT 2014


On 27/06/2014 10:06, Yaron Keren wrote:
> The LLVM ADTs has several common "themes" or  preferences
>
> 1) Efficient by default, efforts to avoid or minimize heap allocation 
> - even at the cost of making debugging harder: pointers are not 
> pointers but packed with bits or shared with other pointers, thus not 
> the correct type, StringRef are not null terminated, etc. (We  have 
> visualizers to help.)
> 2) Easy to use by default, compatible with the STL containers, simply 
> work.
> 3) Open to "advanced" customization of 1) and 2) via template 
> arguments if required.
>
> Wether these features should be introduced in additional 
> raw_string_ostream functionality or a new string_ostream may be the 
> result of what's technically possible such that it
>
> 1) Uses a non-heap container (SmallString) by default.
> 2) Is able to use other containers (std::string?) for clients who 
> provide it.
> 3) Easy to use = standard string/ostream semantics where possible.
> 4) Hard to misuse = hard/impossible to get into invalid/unsynchronized 
> state, does not require repeating code patterns.
>

Hi Yaron,

Thanks for articulating that. It's a great observation that the best 
approach for string handling will be a synthesis of ADT's common themes.

As far as I can tell it's technically not possible to cover a1,2,3 or 
b1,3,4 by extending raw_string_ostream, while I'm pretty sure we've got 
it all covered with the proposed string_ostream, especially the fact 
that we're now converging raw_svector_ostream and raw_string_ostream 
uses around a single type.

David, ping?

Alp.


> Yaron
>
>
>
> 2014-06-27 4:20 GMT+03:00 Alp Toker <alp at nuanti.com 
> <mailto:alp at nuanti.com>>:
>
>
>     On 27/06/2014 02:50, David Blaikie wrote:
>
>         On Thu, Jun 26, 2014 at 4:05 PM, Alp Toker <alp at nuanti.com
>         <mailto:alp at nuanti.com>> wrote:
>
>             On 27/06/2014 01:42, David Blaikie wrote:
>
>                 On Thu, Jun 26, 2014 at 3:37 PM, Alp Toker
>                 <alp at nuanti.com <mailto:alp at nuanti.com>> wrote:
>
>                     On 27/06/2014 00:48, David Blaikie wrote:
>
>                         On Wed, Jun 25, 2014 at 9:53 PM, David Blaikie
>                         <dblaikie at gmail.com <mailto:dblaikie at gmail.com>>
>                         wrote:
>
>                             On Wed, Jun 25, 2014 at 5:52 PM, Alp Toker
>                             <alp at nuanti.com <mailto:alp at nuanti.com>>
>                             wrote:
>
>                                 On 26/06/2014 03:16, Chandler Carruth
>                                 wrote:
>
>                                     On Thu, Jun 26, 2014 at 2:09 AM,
>                                     Alp Toker <alp at nuanti.com
>                                     <mailto:alp at nuanti.com>
>                                     <mailto:alp at nuanti.com
>                                     <mailto:alp at nuanti.com>>> wrote:
>
>                                            I've made that change, plus
>                                     a couple more updated uses, and
>                                     landed
>                                            this in r211749.
>
>                             FWIW, perhaps I wasn't clear - but my "I'd
>                             like to loop back around
>                             with Chandler to nail down the design
>                             choice" was "I'm not signing off
>                             on this patch without Chandler", so the
>                             commit may've been a bit
>                             premature.
>
>                         Given this confusion, I think appropriate to
>                         back this patch out until
>                         it's actually been signed off on.
>
>                         Sorry for my vague language/lack of clarity
>                         over incremental review
>                         compared to sign-off.
>
>
>                     I don't agree there was any problem with review
>                     here. The patch got its
>                     OKs
>                     and looks greats,
>
>                 You asked me if it was OK to land. I gave some
>                 feedback and said I'd
>                 like to wait. You interpreted that as sign-off. I did
>                 not intend it as
>                 such. We seem to have failed to communicate and I'm
>                 trying to clarify
>                 what I intended to communicate.
>
>                 I'm sorry for the miscommunication, but I really
>                 didn't mean to give
>                 you the impression that this was ready to commit.
>                 Please revert it.
>
>
>             There's obviously no way to revert this without losing the
>             bug fixes.
>
>         Right - they can all be included together in whatever form this is
>         recommitted. No worries about that. (happens all the time here
>         - given
>         we prefer fine grained commits (ideally so that sometimes they
>         can be
>         reverted separately, or at least examined separately when looking
>         through commit history, etc) so sometimes a set of related patches
>         have to be reverted together. Not a problem)
>
>             I've backed them out together temporarily in r211814.
>
>         Thanks!
>
>             Could you state any concerns you have clearly?
>
>         Sure - sorry, just didn't want to confuse code review protocol
>         with
>         the specific merits or caveats of this patch. Not even a
>         question of
>         whether I had concerns or not, just that I hadn't actually
>         signed off
>         on the patch. A simple misunderstanding. I'll try to be more
>         clear in
>         the future.
>
>             A critical feature of the patch is guaranteed safety.
>             Where you see a
>             "string_ostream" type you get an assurance that the string
>             building
>             operations are valid.
>
>         The difference here would just be "Where you see string_ostream
>         default constructed, you get an assurance that the string building
>         operations are valid", right? (it's just "if you use this
>         constructor
>         that takes an externally-provided buffer, buyer beware" - tihs
>         is true
>         of many types (eg: std::unique_ptr's raw ctor, compared to
>         something
>         like make_unique)).
>
>
>     What you're suggesting would be equivalent to having
>     std::unique_ptr default-construct the pointee type which I'm
>     pretty sure it doesn't do?
>
>     By having a separate type, we get to document the semantics
>     clearly, and search by type. We also get to hide the flush()
>     function directly via the type system.
>
>     In your scheme that would end up tucked away in the documentation:
>
>       // If you constructed with X, then behaviour is Y, otherwise
>     behaviour is Z.
>
>     And even after documenting, I'm sure people would still feel
>     compelled to flush() before str() occasionally. With the proposed
>     patch the compiler issues a helpful diagnostic instead.
>
>     (Note that std::stringstream *does* take a dual approach, and
>     suffers from consistency as a result -- 66 open question on '
>     stringstream' on StackOverflow at time of writing is high for a
>     single utility. This isn't really part of the standard C++ library
>     we want to emulate in LLVM -- we're much more conscientious about
>     our strings in general, and shouldn't drop the ball here.)
>
>
>             Those assurances apply even when the builder is passed by
>             reference. You can
>             see the clarity this adds for example in
>             LLVMCreateMessage() and a few other
>             places on ToT.
>
>         The cleanup in LLVMCreateMessage and callers seems almost entirely
>         orthogonal to your new type. The same API change could be made
>         even
>         with the old code:
>
>            string s;
>            raw_string_ostream stream(s);
>            ...
>            return LLVMCreateMessage(stream);
>
>
>     Actually I don't think the form you suggest above particularly
>     safe. I'm glad you raised this as I chose the example intentionally.
>
>     It's unconventional for a function to modify local variables in
>     parent stack frames that weren't explicitly named in the argument
>     list whichever way you bake it.
>
>     No other code really does this in the LLVM source tree, and
>     although you and I immediately grok what's going on as C++
>     frontend developers, it's actually subtle and easy to miss. The
>     programming errors on ToT show just how easy it is to lose the
>     plot with that pattern.
>
>     So my change here wasn't orthogonal -- in fact it's completely
>     aligned with and dependent upon the new safety guarantees
>     introduced in the patch. We absolutely need representation in the
>     type system so that we can enforce correctness.
>
>
>         LLVMCreateMessage is still safe, clear, and correct using
>         either API
>         because str(), even on raw_string_ostream as it is today, is still
>         well defined to flush and then give access to the string.
>
>         Or, with a single type, it'd just be:
>
>            raw_string_ostream stream;
>
>         instead of
>
>            string_ostream stream;
>
>         Or is there some other clarity benefit you were referring to?
>
>         (with the same LLVMCreateMessage change you made (though I
>         personally
>         would have LLVMCreateMessage take a const std::string& - it
>         seems more
>         appropriate for a variety of callers, even if the current
>         callers all
>         happen to have raw_string_ostreams at the moment - indeed, perhaps
>         some of them should just be using string concatenation anyway))
>
>             Importantly, there is no locality of construction so *the
>             type becomes the
>             documentation*.
>
>         Generally it's only important to know this distinction in the
>         scope
>         where the type is declared (because it's that scope where the
>         external
>         storage is provided) - once the stream is passed to some other
>         API,
>         the only access to the string is likely going to be through
>         raw_string_ostream's "str()" member, which does the right thing.
>
>
>     See above.
>
>
>
>             If we were to lump storage into raw_string_ostream, this
>             strong safety
>             indicated by the type system would disappear along with
>             all the heap
>             savings.
>
>         Heap savings can be reacquired where /really/ necessary by using
>         external storage, if necessary. By building in storage, that would
>         become the exception rather than the rule, and something we
>         could more
>         closely scrutinize during code review.
>
>             (which simply aren't negligible compared to some of the
>             effort going into
>             bit packing elsewhere in the codebase).
>
>         Apples and oranges, I suspect (& wouldn't mind some evidence
>         to the
>         contrary before we proliferate more choices in data structure
>         type on
>         the codebase - we have a lot, many for good reason, but there is a
>         cognitive/code review/etc load we pay by adding more). The bit
>         packing
>         sort of stuff happens in data structures that there are often
>         thousands upon thousands of in a single run, or at a single
>         point in
>         time so they have a significant impact on peak memory usage. I
>         doubt
>         we have more than a handful of raw_string_ostreams in existence
>         simultaneously.
>
>
>     I'd like numbers too, but from visual inspection we can already
>     see that this introduces some worthwhile savings.
>
>     string_ostream basically just encourages and simplifies our best
>     practice of using SmallString<> on the stack, and that's never
>     been in doubt.
>
>     In the patch you can see we've fixed truncation issues in code
>     that was previously storing in an external SmallString<>, not just
>     std::string. Chandler's proposed raw_string_ostream wouldn't be a
>     suitable replacement for those uses, missing about 50% of the
>     potential fixes where we want to use this, particularly in clang
>     and LLVM's IR facility. In my patch, we cover the lot with a
>     single type.
>
>     I saw you discussing how to avoid heap-allocation of a *singleton*
>     recently on this list so I know you care at least a little about
>     this matter :-)
>
>
>             If we mix semantics in the base, we'd basically just end
>             up a raw string
>             ostream that sometimes isn't raw,  even making the name of
>             the type a
>             little dubious.
>
>         (On naming: as Chandler mentioned, these are all raw streams -
>         even
>         the new ones you're proposing, the "raw"ness is intended to
>         distinguish them from the C++ Standard stream libraries that have
>         streambufs and all that other stuff, not about whether they have
>         internal storage or not)
>
>
>     Bike shed. I'm happy with what's in the patch, but still open to
>     better names.
>
>     Does this go some way to covering your remaining concerns?
>
>     Alp.
>
>
>
>         - David
>
>
>     -- 
>     http://www.nuanti.com
>     the browser experts
>
>     _______________________________________________
>     llvm-commits mailing list
>     llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>     http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>

-- 
http://www.nuanti.com
the browser experts




More information about the llvm-commits mailing list