[llvm-commits] Twine/StringRef enhancements & usage

David Blaikie dblaikie at gmail.com
Wed Jan 25 11:57:38 PST 2012


Bumping just to get this update off my todo list.

Applying this approach (passing Twine by default, using "TwineString"
whenever a StringRef is required internally (to simplify the
stringification of the twine) & avoiding it when we already have a
std::string that needs to be populated, etc) has some issues:

1) inherently this would increase stack usage across LLVM - having to
allocate the SmallString inside TwineString for every passed string.
This might be a non-trivial tradeoff if we applied this universally
(could be a counterargument for the "we should have a uniform way to
pass strings rather than having callees have to make an informed
decision about whether their callers are likely to pass concatenated
expressions" - which is a pity. It'd be nice if we had a reliable,
consistent approach)

2) currently (see the small clang patch for an example) since Twine is
implicitly constructed from a StringRef reference and SmallString has
an implicit conversion to StringRef - you can't just pass a
SmallString where a Twine is required (it would require two
user-defined conversions). We could flesh out Twine a bit to account
for this (& the various other things StringRef can implicitly convert
from)

3) I've added an extra constructor template to StringRef for arrays -
so that we don't have to use strlen on all those string literals that
get converted to StringRef. It has an assert (to catch the case where
you pass a big buffer array in, not a simple string literal) which is
no more costly than the original constructor such code would've called
in the worst case of an asserts build anyway.

Anyway - just some ideas.

On Fri, Sep 2, 2011 at 12:38 AM, David Blaikie <dblaikie at gmail.com> wrote:
> [+Chris, since he seemed to take an interest in the design discussion
> surrounding this originally]
>
> Bump/ping/etc.
>
> This has been around for a few weeks now. Is this the sort of stuff
> that I'd be better off getting getting commit access for & just having
> it post-commit reviewed?
>
> - David
>
> On Wed, Aug 24, 2011 at 9:40 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>
>> On Wed, Aug 24, 2011 at 3:38 PM, Jordy Rose <jediknil at belkadan.com> wrote:
>>>
>>> A couple comments, though this certainly isn't an area of the code I'm
>>> that familiar with.
>>
>> Thanks for looking
>>
>>>
>>> - Instead of appendTo(string&) and assignTo(string&), why not just add
>>> operator+=(string&, const Twine&) and operator=(string&, const Twine&) ?
>>> Seems more C++ to me.
>>
>> You're right about appendTo, op+=(std::string&, const Twine&) could just be
>> a friend function of Twine. I'll make that change.
>> Unfortunately op=(std::string&, const Twine&) can't be done because op= must
>> be a non-static member function (of std::string). This is why I hadn't done
>> appendTo as op+= too - I'd assumed it had the same restriction, though that
>> doesn't appear to be the case.
>>
>>>
>>> - Re: toNullTerminatedStringRef: A StringRef created from a
>>> null-terminated C string drops the null terminator, so you can't just "test
>>> the last character" to see if it's a null. In fact, having the last
>>> character of a StringRef be null is probably a bug. (Of course, you can't
>>> test /past/ the last character either, because one byte past valid memory is
>>> guaranteed to be a valid address but not guaranteed to be dereferenceable.)
>>
>> Agreed - pity, though. [I wonder if we could squeeze in a bit (the high bit
>> of the length?) somewhere to store "is this null terminated" - it seems a
>> pity to lose that so often/so easily when going into the StringRef domain]
>>
>>>
>>> - You've got several copies of SafeBool.h in the file. I'm guessing this
>>> is the result of reverting and then reapplying patches. (I do this all the
>>> time too.)
>>
>> Hrm, thanks for that. I'll make a fresh diff. I've done that & manually
>> inspected the diff file & I only see SafeBool.h listed once now.
>>
>>>
>>> - TwineString definitely seems evil, but I haven't really thought about it
>>> hard enough to give a good reason why.
>>
>> Oh, it is rather evil, just a moderately quick & dirty, but not utterly
>> broken, solution. The most concrete reason I can come up with is that it
>> muddies StringRef's semantics, mostly - TwineString is a StringRef, but it
>> doesn't at all have the semantics of a StringRef, in fact it has the
>> semantics of a string (mostly... some of the time... if it's not just
>> actually a StringRef)
>> To quote Chris from a previous email where this was discussed:
>>
>> "While it is kinda gross, a subclass of StringRef is probably the lowest
>> friction path to do this." -
>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-July/041804.html
>> Not to say that I'm not open to other ideas...
>>
>>>
>>> I don't get why you're using a SmallVectorImpl instead of a SmallVector or
>>> SmallString, though.
>>
>> I think I just used SmallVectorImpl because it's the type that toStringRef
>> required - but SmallString would make more sense if it's got no additional
>> overhead/quirks (or SmallVector, presumably it really doesn't have extra
>> overhead).
>> Settled on SmallString.
>> - David
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringref_twine.diff
Type: application/octet-stream
Size: 389 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120125/46427ce8/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stringref_twine_clang.diff
Type: application/octet-stream
Size: 667 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120125/46427ce8/attachment-0001.obj>


More information about the llvm-commits mailing list