[LLVMdev] Correct use of StringRef and Twine

David Blaikie dblaikie at gmail.com
Sun Jul 24 18:31:12 PDT 2011


> Right, but that requires VS #ifdefs.  You can also use enum bitfields, but VS has different promotion rules from them than other compilers.

Ah, I'd thought that feature (specifying the backing type with "enum
name : integral_type") was standard, but I see it's a C++0x thing. My
mistake.

>> 1) the easy solution: create a StringRef subclass (or new type with a
>> similarly expressive API) that wraps a buffer that it may or may not
>> use. Give it a ctor that takes a Twine. The use case then goes from
>> this:
>>
>>    SmallVectorImpl<char> v;
>>    StringRef s = theTwine.GetStringRef(v);
>>
>> to this:
>>
>>    TwineString s(theTwine);
>>
>> (& the ctor looks like: TwineString(const Twine& t) { *this =
>> t.GetStringRef(this->buffer); })
>> So any Twine sink now has to pay one line of code to get a usable
>> string out of a Twine. Cheap to implement, easy to fix existing use
>> cases (privatize GetStringRef(SmallVectorImpl<char>) and make
>> TwineString a friend - fix everything that fails to compile).
>>
>> This keeps things simple & seems to be "good enough" to me, but we
>> could perhaps do better (at the very least, again, if we did do
>> better, we could go back & remove TwineString & again fix all the
>> places that fail to compile with whatever new hotness we invent)
>
> This definitely seems like a step forward.  While it is kinda gross, a subclass of StringRef is probably the lowest friction path to do this.

Come to think of it - is there any use for StringRef other than as
function arguments? If we get Twine to be 'better' for function
arguments than StringRef, then StringRef goes away - or put another
way: StringRef could be converted into TwineString (with or without a
rename) rather than deriving/duplicating the functionality into two
types.

It would be a matter of moving all of StringRef's ctors over to Twine,
though. At which point I'd be kind of inclined to start looking at a
templated solution to Twine since it'll have to start supporting
several more types (well, raw character arrays at least). I can do
that progressively - remove a StringRef ctor, add a Twine ctor, fix
all the places that took StringRefs that break because there's no
conversion anymore.

I'm going to have a play with this & see how it looks.

>> Hmm - new thought: what would happen if my argument type was simply
>> TwineString? If I knew my function needed access to the values (like
>> the Host.cpp example) but didn't have any buffer of its own that it
>> needed to use, that seems OK.
>
> I'm dubious of this.  This is exposing an implementation detail to clients that they shouldn't have to know, and if passed by value, this could be expensive.

Yeah, I'm on the fence about this. If we're OK with an extra line
(declaring the TwineString) for every string (Twine) argument, that
seems livable.

>> 2) This one isn't quite as fleshed out, but see if the general outline
>> makes sense: Introduce an extra type that's Twine-esque, but is only
>> the top level (so it's implicitly constructible from a const Twine&),
>> rather than all the branches of a Twine. If a function wants to accept
>> a Twine that it needs to manipulate, it can mark its argument as
>> TwineString2 (for want of a better name, just using 2 to distinguish
>> from the (1) example). TwineString2 has a buffer in it, but it starts
>> uninitialized. Any attempt to retrieve/inspect values causes the
>> string to be initialized. [downside: all string operations on a
>> TwineString2 incur an "if not initialized" check]
>>
>> Twines could be constructed from TwineString2s if the value needed to
>> be passed on to another function. (& still take advantage of any
>> initialized string buffer it might have, rather than having to
>> remanifest from its child nodes repeatedly)
>>
>> One fancy enhancement to this is that TwineString2 could partially
>> manifest - eg: if you want ts[2] it could walk its children (dropping
>> its references to children & updating the buffer with the contents at
>> each step) until it found the 2nd character - then stop, with a
>> partially expanded tree/buffer. Though there's probably nothing
>> stopping us implementing such smarts in the (1) solution eventually
>> either.
>
> I think you're flying way over my head here :)

No worries - I'm not sure it's really too compelling after (1),
really. Thin of (1) with the dangerous case (used as the argument
type), but with lazy evaluation. It has all the operations of
StringRef, like TwineString(1), but doesn't evaluate its
subexpressions (LHS/RHS in Twine) until an attempt is made to access
the string value. The last paragraph was an "ultra lazy" possibility -
where it could manifest part of the string (this would be harder to do
with template metaprogramming, I think - since it'd be more strictly
compile time). If you need the first character - well just manifest
the Twine up to the first non-empty branch, but there's no need to go
& create the whole string in a single buffer just yet.

I think this laziness would just be expensive though - involving lots
of checks to see if the string was manifest any time an access/string
operation was used.

- David




More information about the llvm-dev mailing list