Introduce StringBuilder utility around raw_string_ostream
Alp Toker
alp at nuanti.com
Mon Jun 30 14:24:56 PDT 2014
On 27/06/2014 10:06, Yaron Keren wrote:
> The LLVM ADTs has several common "themes" or preferences
>
> 1) Efficient by default, efforts to avoid or minimize heap allocation
> - even at the cost of making debugging harder: pointers are not
> pointers but packed with bits or shared with other pointers, thus not
> the correct type, StringRef are not null terminated, etc. (We have
> visualizers to help.)
> 2) Easy to use by default, compatible with the STL containers, simply
> work.
> 3) Open to "advanced" customization of 1) and 2) via template
> arguments if required.
>
> Wether these features should be introduced in additional
> raw_string_ostream functionality or a new string_ostream may be the
> result of what's technically possible such that it
>
> 1) Uses a non-heap container (SmallString) by default.
> 2) Is able to use other containers (std::string?) for clients who
> provide it.
> 3) Easy to use = standard string/ostream semantics where possible.
> 4) Hard to misuse = hard/impossible to get into invalid/unsynchronized
> state, does not require repeating code patterns.
>
Hi Yaron,
Thanks for articulating that. It's a great observation that the best
approach for string handling will be a synthesis of ADT's common themes.
As far as I can tell it's technically not possible to cover a1,2,3 or
b1,3,4 by extending raw_string_ostream, while I'm pretty sure we've got
it all covered with the proposed string_ostream, especially the fact
that we're now converging raw_svector_ostream and raw_string_ostream
uses around a single type.
David, ping?
Alp.
> Yaron
>
>
>
> 2014-06-27 4:20 GMT+03:00 Alp Toker <alp at nuanti.com
> <mailto:alp at nuanti.com>>:
>
>
> On 27/06/2014 02:50, David Blaikie wrote:
>
> On Thu, Jun 26, 2014 at 4:05 PM, Alp Toker <alp at nuanti.com
> <mailto:alp at nuanti.com>> wrote:
>
> On 27/06/2014 01:42, David Blaikie wrote:
>
> On Thu, Jun 26, 2014 at 3:37 PM, Alp Toker
> <alp at nuanti.com <mailto:alp at nuanti.com>> wrote:
>
> On 27/06/2014 00:48, David Blaikie wrote:
>
> On Wed, Jun 25, 2014 at 9:53 PM, David Blaikie
> <dblaikie at gmail.com <mailto:dblaikie at gmail.com>>
> wrote:
>
> On Wed, Jun 25, 2014 at 5:52 PM, Alp Toker
> <alp at nuanti.com <mailto:alp at nuanti.com>>
> wrote:
>
> On 26/06/2014 03:16, Chandler Carruth
> wrote:
>
> On Thu, Jun 26, 2014 at 2:09 AM,
> Alp Toker <alp at nuanti.com
> <mailto:alp at nuanti.com>
> <mailto:alp at nuanti.com
> <mailto:alp at nuanti.com>>> wrote:
>
> I've made that change, plus
> a couple more updated uses, and
> landed
> this in r211749.
>
> FWIW, perhaps I wasn't clear - but my "I'd
> like to loop back around
> with Chandler to nail down the design
> choice" was "I'm not signing off
> on this patch without Chandler", so the
> commit may've been a bit
> premature.
>
> Given this confusion, I think appropriate to
> back this patch out until
> it's actually been signed off on.
>
> Sorry for my vague language/lack of clarity
> over incremental review
> compared to sign-off.
>
>
> I don't agree there was any problem with review
> here. The patch got its
> OKs
> and looks greats,
>
> You asked me if it was OK to land. I gave some
> feedback and said I'd
> like to wait. You interpreted that as sign-off. I did
> not intend it as
> such. We seem to have failed to communicate and I'm
> trying to clarify
> what I intended to communicate.
>
> I'm sorry for the miscommunication, but I really
> didn't mean to give
> you the impression that this was ready to commit.
> Please revert it.
>
>
> There's obviously no way to revert this without losing the
> bug fixes.
>
> Right - they can all be included together in whatever form this is
> recommitted. No worries about that. (happens all the time here
> - given
> we prefer fine grained commits (ideally so that sometimes they
> can be
> reverted separately, or at least examined separately when looking
> through commit history, etc) so sometimes a set of related patches
> have to be reverted together. Not a problem)
>
> I've backed them out together temporarily in r211814.
>
> Thanks!
>
> Could you state any concerns you have clearly?
>
> Sure - sorry, just didn't want to confuse code review protocol
> with
> the specific merits or caveats of this patch. Not even a
> question of
> whether I had concerns or not, just that I hadn't actually
> signed off
> on the patch. A simple misunderstanding. I'll try to be more
> clear in
> the future.
>
> A critical feature of the patch is guaranteed safety.
> Where you see a
> "string_ostream" type you get an assurance that the string
> building
> operations are valid.
>
> The difference here would just be "Where you see string_ostream
> default constructed, you get an assurance that the string building
> operations are valid", right? (it's just "if you use this
> constructor
> that takes an externally-provided buffer, buyer beware" - tihs
> is true
> of many types (eg: std::unique_ptr's raw ctor, compared to
> something
> like make_unique)).
>
>
> What you're suggesting would be equivalent to having
> std::unique_ptr default-construct the pointee type which I'm
> pretty sure it doesn't do?
>
> By having a separate type, we get to document the semantics
> clearly, and search by type. We also get to hide the flush()
> function directly via the type system.
>
> In your scheme that would end up tucked away in the documentation:
>
> // If you constructed with X, then behaviour is Y, otherwise
> behaviour is Z.
>
> And even after documenting, I'm sure people would still feel
> compelled to flush() before str() occasionally. With the proposed
> patch the compiler issues a helpful diagnostic instead.
>
> (Note that std::stringstream *does* take a dual approach, and
> suffers from consistency as a result -- 66 open question on '
> stringstream' on StackOverflow at time of writing is high for a
> single utility. This isn't really part of the standard C++ library
> we want to emulate in LLVM -- we're much more conscientious about
> our strings in general, and shouldn't drop the ball here.)
>
>
> Those assurances apply even when the builder is passed by
> reference. You can
> see the clarity this adds for example in
> LLVMCreateMessage() and a few other
> places on ToT.
>
> The cleanup in LLVMCreateMessage and callers seems almost entirely
> orthogonal to your new type. The same API change could be made
> even
> with the old code:
>
> string s;
> raw_string_ostream stream(s);
> ...
> return LLVMCreateMessage(stream);
>
>
> Actually I don't think the form you suggest above particularly
> safe. I'm glad you raised this as I chose the example intentionally.
>
> It's unconventional for a function to modify local variables in
> parent stack frames that weren't explicitly named in the argument
> list whichever way you bake it.
>
> No other code really does this in the LLVM source tree, and
> although you and I immediately grok what's going on as C++
> frontend developers, it's actually subtle and easy to miss. The
> programming errors on ToT show just how easy it is to lose the
> plot with that pattern.
>
> So my change here wasn't orthogonal -- in fact it's completely
> aligned with and dependent upon the new safety guarantees
> introduced in the patch. We absolutely need representation in the
> type system so that we can enforce correctness.
>
>
> LLVMCreateMessage is still safe, clear, and correct using
> either API
> because str(), even on raw_string_ostream as it is today, is still
> well defined to flush and then give access to the string.
>
> Or, with a single type, it'd just be:
>
> raw_string_ostream stream;
>
> instead of
>
> string_ostream stream;
>
> Or is there some other clarity benefit you were referring to?
>
> (with the same LLVMCreateMessage change you made (though I
> personally
> would have LLVMCreateMessage take a const std::string& - it
> seems more
> appropriate for a variety of callers, even if the current
> callers all
> happen to have raw_string_ostreams at the moment - indeed, perhaps
> some of them should just be using string concatenation anyway))
>
> Importantly, there is no locality of construction so *the
> type becomes the
> documentation*.
>
> Generally it's only important to know this distinction in the
> scope
> where the type is declared (because it's that scope where the
> external
> storage is provided) - once the stream is passed to some other
> API,
> the only access to the string is likely going to be through
> raw_string_ostream's "str()" member, which does the right thing.
>
>
> See above.
>
>
>
> If we were to lump storage into raw_string_ostream, this
> strong safety
> indicated by the type system would disappear along with
> all the heap
> savings.
>
> Heap savings can be reacquired where /really/ necessary by using
> external storage, if necessary. By building in storage, that would
> become the exception rather than the rule, and something we
> could more
> closely scrutinize during code review.
>
> (which simply aren't negligible compared to some of the
> effort going into
> bit packing elsewhere in the codebase).
>
> Apples and oranges, I suspect (& wouldn't mind some evidence
> to the
> contrary before we proliferate more choices in data structure
> type on
> the codebase - we have a lot, many for good reason, but there is a
> cognitive/code review/etc load we pay by adding more). The bit
> packing
> sort of stuff happens in data structures that there are often
> thousands upon thousands of in a single run, or at a single
> point in
> time so they have a significant impact on peak memory usage. I
> doubt
> we have more than a handful of raw_string_ostreams in existence
> simultaneously.
>
>
> I'd like numbers too, but from visual inspection we can already
> see that this introduces some worthwhile savings.
>
> string_ostream basically just encourages and simplifies our best
> practice of using SmallString<> on the stack, and that's never
> been in doubt.
>
> In the patch you can see we've fixed truncation issues in code
> that was previously storing in an external SmallString<>, not just
> std::string. Chandler's proposed raw_string_ostream wouldn't be a
> suitable replacement for those uses, missing about 50% of the
> potential fixes where we want to use this, particularly in clang
> and LLVM's IR facility. In my patch, we cover the lot with a
> single type.
>
> I saw you discussing how to avoid heap-allocation of a *singleton*
> recently on this list so I know you care at least a little about
> this matter :-)
>
>
> If we mix semantics in the base, we'd basically just end
> up a raw string
> ostream that sometimes isn't raw, even making the name of
> the type a
> little dubious.
>
> (On naming: as Chandler mentioned, these are all raw streams -
> even
> the new ones you're proposing, the "raw"ness is intended to
> distinguish them from the C++ Standard stream libraries that have
> streambufs and all that other stuff, not about whether they have
> internal storage or not)
>
>
> Bike shed. I'm happy with what's in the patch, but still open to
> better names.
>
> Does this go some way to covering your remaining concerns?
>
> Alp.
>
>
>
> - David
>
>
> --
> http://www.nuanti.com
> the browser experts
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
--
http://www.nuanti.com
the browser experts
More information about the llvm-commits
mailing list