[llvm-commits] Twine/StringRef enhancements & usage

David Blaikie dblaikie at gmail.com
Wed Jan 25 12:05:03 PST 2012


Now with real patches.

On Wed, Jan 25, 2012 at 11:57 AM, David Blaikie <dblaikie at gmail.com> wrote:
> Bumping just to get this update off my todo list.
>
> Applying this approach (passing Twine by default, using "TwineString"
> whenever a StringRef is required internally (to simplify the
> stringification of the twine) & avoiding it when we already have a
> std::string that needs to be populated, etc) has some issues:
>
> 1) inherently this would increase stack usage across LLVM - having to
> allocate the SmallString inside TwineString for every passed string.
> This might be a non-trivial tradeoff if we applied this universally
> (could be a counterargument for the "we should have a uniform way to
> pass strings rather than having callees have to make an informed
> decision about whether their callers are likely to pass concatenated
> expressions" - which is a pity. It'd be nice if we had a reliable,
> consistent approach)
>
> 2) currently (see the small clang patch for an example) since Twine is
> implicitly constructed from a StringRef reference and SmallString has
> an implicit conversion to StringRef - you can't just pass a
> SmallString where a Twine is required (it would require two
> user-defined conversions). We could flesh out Twine a bit to account
> for this (& the various other things StringRef can implicitly convert
> from)
>
> 3) I've added an extra constructor template to StringRef for arrays -
> so that we don't have to use strlen on all those string literals that
> get converted to StringRef. It has an assert (to catch the case where
> you pass a big buffer array in, not a simple string literal) which is
> no more costly than the original constructor such code would've called
> in the worst case of an asserts build anyway.
>
> Anyway - just some ideas.
>
> On Fri, Sep 2, 2011 at 12:38 AM, David Blaikie <dblaikie at gmail.com> wrote:
>> [+Chris, since he seemed to take an interest in the design discussion
>> surrounding this originally]
>>
>> Bump/ping/etc.
>>
>> This has been around for a few weeks now. Is this the sort of stuff
>> that I'd be better off getting getting commit access for & just having
>> it post-commit reviewed?
>>
>> - David
>>
>> On Wed, Aug 24, 2011 at 9:40 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>
>>> On Wed, Aug 24, 2011 at 3:38 PM, Jordy Rose <jediknil at belkadan.com> wrote:
>>>>
>>>> A couple comments, though this certainly isn't an area of the code I'm
>>>> that familiar with.
>>>
>>> Thanks for looking
>>>
>>>>
>>>> - Instead of appendTo(string&) and assignTo(string&), why not just add
>>>> operator+=(string&, const Twine&) and operator=(string&, const Twine&) ?
>>>> Seems more C++ to me.
>>>
>>> You're right about appendTo, op+=(std::string&, const Twine&) could just be
>>> a friend function of Twine. I'll make that change.
>>> Unfortunately op=(std::string&, const Twine&) can't be done because op= must
>>> be a non-static member function (of std::string). This is why I hadn't done
>>> appendTo as op+= too - I'd assumed it had the same restriction, though that
>>> doesn't appear to be the case.
>>>
>>>>
>>>> - Re: toNullTerminatedStringRef: A StringRef created from a
>>>> null-terminated C string drops the null terminator, so you can't just "test
>>>> the last character" to see if it's a null. In fact, having the last
>>>> character of a StringRef be null is probably a bug. (Of course, you can't
>>>> test /past/ the last character either, because one byte past valid memory is
>>>> guaranteed to be a valid address but not guaranteed to be dereferenceable.)
>>>
>>> Agreed - pity, though. [I wonder if we could squeeze in a bit (the high bit
>>> of the length?) somewhere to store "is this null terminated" - it seems a
>>> pity to lose that so often/so easily when going into the StringRef domain]
>>>
>>>>
>>>> - You've got several copies of SafeBool.h in the file. I'm guessing this
>>>> is the result of reverting and then reapplying patches. (I do this all the
>>>> time too.)
>>>
>>> Hrm, thanks for that. I'll make a fresh diff. I've done that & manually
>>> inspected the diff file & I only see SafeBool.h listed once now.
>>>
>>>>
>>>> - TwineString definitely seems evil, but I haven't really thought about it
>>>> hard enough to give a good reason why.
>>>
>>> Oh, it is rather evil, just a moderately quick & dirty, but not utterly
>>> broken, solution. The most concrete reason I can come up with is that it
>>> muddies StringRef's semantics, mostly - TwineString is a StringRef, but it
>>> doesn't at all have the semantics of a StringRef, in fact it has the
>>> semantics of a string (mostly... some of the time... if it's not just
>>> actually a StringRef)
>>> To quote Chris from a previous email where this was discussed:
>>>
>>> "While it is kinda gross, a subclass of StringRef is probably the lowest
>>> friction path to do this." -
>>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-July/041804.html
>>> Not to say that I'm not open to other ideas...
>>>
>>>>
>>>> I don't get why you're using a SmallVectorImpl instead of a SmallVector or
>>>> SmallString, though.
>>>
>>> I think I just used SmallVectorImpl because it's the type that toStringRef
>>> required - but SmallString would make more sense if it's got no additional
>>> overhead/quirks (or SmallVector, presumably it really doesn't have extra
>>> overhead).
>>> Settled on SmallString.
>>> - David
>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringref_twine.diff
Type: application/octet-stream
Size: 34501 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120125/d47f51ea/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringref_twine_clang.diff
Type: application/octet-stream
Size: 660 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120125/d47f51ea/attachment-0001.obj>


More information about the llvm-commits mailing list