Introduce StringBuilder utility around raw_string_ostream
Alp Toker
alp at nuanti.com
Wed Jul 2 08:16:42 PDT 2014
On 01/07/2014 00:24, Alp Toker wrote:
>
> On 27/06/2014 10:06, Yaron Keren wrote:
>> The LLVM ADTs has several common "themes" or preferences
>>
>> 1) Efficient by default, efforts to avoid or minimize heap allocation
>> - even at the cost of making debugging harder: pointers are not
>> pointers but packed with bits or shared with other pointers, thus not
>> the correct type, StringRef are not null terminated, etc. (We have
>> visualizers to help.)
>> 2) Easy to use by default, compatible with the STL containers, simply
>> work.
>> 3) Open to "advanced" customization of 1) and 2) via template
>> arguments if required.
>>
>> Wether these features should be introduced in additional
>> raw_string_ostream functionality or a new string_ostream may be the
>> result of what's technically possible such that it
>>
>> 1) Uses a non-heap container (SmallString) by default.
>> 2) Is able to use other containers (std::string?) for clients who
>> provide it.
>> 3) Easy to use = standard string/ostream semantics where possible.
>> 4) Hard to misuse = hard/impossible to get into
>> invalid/unsynchronized state, does not require repeating code patterns.
>>
>
> Hi Yaron,
>
> Thanks for articulating that. It's a great observation that the best
> approach for string handling will be a synthesis of ADT's common themes.
>
> As far as I can tell it's technically not possible to cover a1,2,3 or
> b1,3,4 by extending raw_string_ostream, while I'm pretty sure we've
> got it all covered with the proposed string_ostream, especially the
> fact that we're now converging raw_svector_ostream and
> raw_string_ostream uses around a single type.
>
> David, ping?
>
> Alp.
>
>
>> Yaron
>>
>>
>>
>> 2014-06-27 4:20 GMT+03:00 Alp Toker <alp at nuanti.com
>> <mailto:alp at nuanti.com>>:
>>
>>
>> On 27/06/2014 02:50, David Blaikie wrote:
>>
>> On Thu, Jun 26, 2014 at 4:05 PM, Alp Toker <alp at nuanti.com
>> <mailto:alp at nuanti.com>> wrote:
>>
>> On 27/06/2014 01:42, David Blaikie wrote:
>>
>> On Thu, Jun 26, 2014 at 3:37 PM, Alp Toker
>> <alp at nuanti.com <mailto:alp at nuanti.com>> wrote:
>>
>> On 27/06/2014 00:48, David Blaikie wrote:
>>
>> On Wed, Jun 25, 2014 at 9:53 PM, David Blaikie
>> <dblaikie at gmail.com <mailto:dblaikie at gmail.com>>
>> wrote:
>>
>> On Wed, Jun 25, 2014 at 5:52 PM, Alp Toker
>> <alp at nuanti.com <mailto:alp at nuanti.com>>
>> wrote:
>>
>> On 26/06/2014 03:16, Chandler Carruth
>> wrote:
>>
>> On Thu, Jun 26, 2014 at 2:09 AM,
>> Alp Toker <alp at nuanti.com
>> <mailto:alp at nuanti.com>
>> <mailto:alp at nuanti.com
>> <mailto:alp at nuanti.com>>> wrote:
>>
>> I've made that change, plus
>> a couple more updated uses, and
>> landed
>> this in r211749.
>>
>> FWIW, perhaps I wasn't clear - but my "I'd
>> like to loop back around
>> with Chandler to nail down the design
>> choice" was "I'm not signing off
>> on this patch without Chandler", so the
>> commit may've been a bit
>> premature.
>>
>> Given this confusion, I think appropriate to
>> back this patch out until
>> it's actually been signed off on.
>>
>> Sorry for my vague language/lack of clarity
>> over incremental review
>> compared to sign-off.
>>
>>
>> I don't agree there was any problem with review
>> here. The patch got its
>> OKs
>> and looks greats,
>>
>> You asked me if it was OK to land. I gave some
>> feedback and said I'd
>> like to wait. You interpreted that as sign-off. I did
>> not intend it as
>> such. We seem to have failed to communicate and I'm
>> trying to clarify
>> what I intended to communicate.
>>
>> I'm sorry for the miscommunication, but I really
>> didn't mean to give
>> you the impression that this was ready to commit.
>> Please revert it.
>>
>>
>> There's obviously no way to revert this without losing the
>> bug fixes.
>>
>> Right - they can all be included together in whatever form
>> this is
>> recommitted. No worries about that. (happens all the time here
>> - given
>> we prefer fine grained commits (ideally so that sometimes they
>> can be
>> reverted separately, or at least examined separately when
>> looking
>> through commit history, etc) so sometimes a set of related
>> patches
>> have to be reverted together. Not a problem)
>>
>> I've backed them out together temporarily in r211814.
>>
>> Thanks!
>>
>> Could you state any concerns you have clearly?
>>
>> Sure - sorry, just didn't want to confuse code review protocol
>> with
>> the specific merits or caveats of this patch. Not even a
>> question of
>> whether I had concerns or not, just that I hadn't actually
>> signed off
>> on the patch. A simple misunderstanding. I'll try to be more
>> clear in
>> the future.
>>
>> A critical feature of the patch is guaranteed safety.
>> Where you see a
>> "string_ostream" type you get an assurance that the string
>> building
>> operations are valid.
>>
>> The difference here would just be "Where you see string_ostream
>> default constructed, you get an assurance that the string
>> building
>> operations are valid", right? (it's just "if you use this
>> constructor
>> that takes an externally-provided buffer, buyer beware" - tihs
>> is true
>> of many types (eg: std::unique_ptr's raw ctor, compared to
>> something
>> like make_unique)).
>>
>>
>> What you're suggesting would be equivalent to having
>> std::unique_ptr default-construct the pointee type which I'm
>> pretty sure it doesn't do?
>>
>> By having a separate type, we get to document the semantics
>> clearly, and search by type. We also get to hide the flush()
>> function directly via the type system.
>>
>> In your scheme that would end up tucked away in the documentation:
>>
>> // If you constructed with X, then behaviour is Y, otherwise
>> behaviour is Z.
>>
>> And even after documenting, I'm sure people would still feel
>> compelled to flush() before str() occasionally. With the proposed
>> patch the compiler issues a helpful diagnostic instead.
>>
>> (Note that std::stringstream *does* take a dual approach, and
>> suffers from consistency as a result -- 66 open question on '
>> stringstream' on StackOverflow at time of writing is high for a
>> single utility. This isn't really part of the standard C++ library
>> we want to emulate in LLVM -- we're much more conscientious about
>> our strings in general, and shouldn't drop the ball here.)
>>
>>
>> Those assurances apply even when the builder is passed by
>> reference. You can
>> see the clarity this adds for example in
>> LLVMCreateMessage() and a few other
>> places on ToT.
>>
>> The cleanup in LLVMCreateMessage and callers seems almost
>> entirely
>> orthogonal to your new type. The same API change could be made
>> even
>> with the old code:
>>
>> string s;
>> raw_string_ostream stream(s);
>> ...
>> return LLVMCreateMessage(stream);
>>
>>
>> Actually I don't think the form you suggest above particularly
>> safe. I'm glad you raised this as I chose the example intentionally.
>>
>> It's unconventional for a function to modify local variables in
>> parent stack frames that weren't explicitly named in the argument
>> list whichever way you bake it.
>>
>> No other code really does this in the LLVM source tree, and
>> although you and I immediately grok what's going on as C++
>> frontend developers, it's actually subtle and easy to miss. The
>> programming errors on ToT show just how easy it is to lose the
>> plot with that pattern.
>>
>> So my change here wasn't orthogonal -- in fact it's completely
>> aligned with and dependent upon the new safety guarantees
>> introduced in the patch. We absolutely need representation in the
>> type system so that we can enforce correctness.
>>
>>
>> LLVMCreateMessage is still safe, clear, and correct using
>> either API
>> because str(), even on raw_string_ostream as it is today, is
>> still
>> well defined to flush and then give access to the string.
>>
>> Or, with a single type, it'd just be:
>>
>> raw_string_ostream stream;
>>
>> instead of
>>
>> string_ostream stream;
>>
>> Or is there some other clarity benefit you were referring to?
>>
>> (with the same LLVMCreateMessage change you made (though I
>> personally
>> would have LLVMCreateMessage take a const std::string& - it
>> seems more
>> appropriate for a variety of callers, even if the current
>> callers all
>> happen to have raw_string_ostreams at the moment - indeed,
>> perhaps
>> some of them should just be using string concatenation anyway))
>>
>> Importantly, there is no locality of construction so *the
>> type becomes the
>> documentation*.
>>
>> Generally it's only important to know this distinction in the
>> scope
>> where the type is declared (because it's that scope where the
>> external
>> storage is provided) - once the stream is passed to some other
>> API,
>> the only access to the string is likely going to be through
>> raw_string_ostream's "str()" member, which does the right thing.
>>
>>
>> See above.
>>
>>
>>
>> If we were to lump storage into raw_string_ostream, this
>> strong safety
>> indicated by the type system would disappear along with
>> all the heap
>> savings.
>>
>> Heap savings can be reacquired where /really/ necessary by using
>> external storage, if necessary. By building in storage, that
>> would
>> become the exception rather than the rule, and something we
>> could more
>> closely scrutinize during code review.
>>
>> (which simply aren't negligible compared to some of the
>> effort going into
>> bit packing elsewhere in the codebase).
>>
>> Apples and oranges, I suspect (& wouldn't mind some evidence
>> to the
>> contrary before we proliferate more choices in data structure
>> type on
>> the codebase - we have a lot, many for good reason, but there
>> is a
>> cognitive/code review/etc load we pay by adding more). The bit
>> packing
>> sort of stuff happens in data structures that there are often
>> thousands upon thousands of in a single run, or at a single
>> point in
>> time so they have a significant impact on peak memory usage. I
>> doubt
>> we have more than a handful of raw_string_ostreams in existence
>> simultaneously.
>>
>>
>> I'd like numbers too, but from visual inspection we can already
>> see that this introduces some worthwhile savings.
>>
>> string_ostream basically just encourages and simplifies our best
>> practice of using SmallString<> on the stack, and that's never
>> been in doubt.
>>
>> In the patch you can see we've fixed truncation issues in code
>> that was previously storing in an external SmallString<>, not just
>> std::string. Chandler's proposed raw_string_ostream wouldn't be a
>> suitable replacement for those uses, missing about 50% of the
>> potential fixes where we want to use this, particularly in clang
>> and LLVM's IR facility. In my patch, we cover the lot with a
>> single type.
>>
>> I saw you discussing how to avoid heap-allocation of a *singleton*
>> recently on this list so I know you care at least a little about
>> this matter :-)
>>
>>
>> If we mix semantics in the base, we'd basically just end
>> up a raw string
>> ostream that sometimes isn't raw, even making the name of
>> the type a
>> little dubious.
>>
>> (On naming: as Chandler mentioned, these are all raw streams -
>> even
>> the new ones you're proposing, the "raw"ness is intended to
>> distinguish them from the C++ Standard stream libraries that
>> have
>> streambufs and all that other stuff, not about whether they have
>> internal storage or not)
>>
>>
>> Bike shed. I'm happy with what's in the patch, but still open to
>> better names.
>>
>> Does this go some way to covering your remaining concerns?
Ping
>>
>> Alp.
>>
>>
>>
>> - David
>>
>>
>> -- http://www.nuanti.com
>> the browser experts
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>
--
http://www.nuanti.com
the browser experts
More information about the llvm-commits
mailing list