[LLVMdev] Correct use of StringRef and Twine

Sun Jul 24 00:09:35 PDT 2011

> Yes, exactly.  I'm just saying that I think the additional clarity of:
>  "foo" + Twine('x')
>
> is worth the inconvenience.

Ok, attached a modified version of my patch with an Twine(char),
Twine(unsigned char), and Twine(signed char). All three are explicit &
have included test cases.

Speaking of which - is there any LLVM dev policy or preference around
C-style casts versus C++-style casts? Twine is littered with the
former but I generally use the latter for safety/clarity of intend,
etc. (I'm not going to try to fix everything in Twine just now if
we're considering alternative implementations like template
metaprogramming anyway)

>> As a side note on Twine's design: Is there a particular reason it uses
>> void*s rather than unions?
>
> I'm not sure what you're proposing specifically.

I've attached an version of my fix that includes the union use I was
referring to (sorry for being vague in the previous email). Though if
we're considering a more explicitly structured solution with
templates/subtypes/etc we could skip the whole mess of enums &
unions/casts.

>> and chars rather than enums?
>
> char vs enum is because of visual studio compatibility and because enums often are stored as 32-bit values instead of 8-bit values.

Curious - seems VS >2k3 (
http://msdn.microsoft.com/en-us/library/2dzy4k6e(v=VS.80).aspx ) has
supported specifying enum backing type (so you can specify it as char:
"enum foo : char { ... }") if that was the feature you were wanting to
use but avoiding.

> Yes, from a general API design perspective, I hate having to force a choice between "convenience to implementor of an API to just take a StringRef" vs "convenience to client of API for it to take a Twine".  It really stinks.

Ok - glad we're on the same page there, wasn't quite sure whether we
were talking past one another in previous emails.

I have a couple of ideas about how to lower the developer cost of
using Twine ubiquitously. I'll try to describe them in some detail.

> I was chatting with Howard Hinnant about this and he suggested replacing Twine with a template metaprogramming expression-template system.  I haven't thought through all the details, but perhaps this would allow us to get the best of both worlds?

I think I'd prefer a template metaprogramming solution to Twine just
from a purist perspective compared to all the
casting/reinterpreting/enums (even with the union) - that sort of
stuff makes me a bit uncomfortable, but I realize it's well contained
enough that it's not too risky (& I know some of the other tricks LLVM
uses like low bit twiddling in pointers, etc, so this seems relatively
sane in the grand scheme of things). But I'm not sure it'll
fundamentally change how we might be able to solve this caller
convenience issue.

So here are my ideas:

1) the easy solution: create a StringRef subclass (or new type with a
similarly expressive API) that wraps a buffer that it may or may not
use. Give it a ctor that takes a Twine. The use case then goes from
this:

    SmallVectorImpl<char> v;
    StringRef s = theTwine.GetStringRef(v);

to this:

    TwineString s(theTwine);

(& the ctor looks like: TwineString(const Twine& t) { *this =
t.GetStringRef(this->buffer); })
So any Twine sink now has to pay one line of code to get a usable
string out of a Twine. Cheap to implement, easy to fix existing use
cases (privatize GetStringRef(SmallVectorImpl<char>) and make
TwineString a friend - fix everything that fails to compile).

This keeps things simple & seems to be "good enough" to me, but we
could perhaps do better (at the very least, again, if we did do
better, we could go back & remove TwineString & again fix all the
places that fail to compile with whatever new hotness we invent)

Hmm - new thought: what would happen if my argument type was simply
TwineString? If I knew my function needed access to the values (like
the Host.cpp example) but didn't have any buffer of its own that it
needed to use, that seems OK. This new thought perhaps renders
suggestion (2) uninteresting. Functions that know they always need the
string contents take a TwineString. Functions that might be able to
skip realizing the string entirely - or have their own buffer (eg: a
member std::string) to populate - take a Twine & explicitly construct
a TwineString only in the path that needs the value. That seems
optimal to me.

1.1) We probably want a better way to get a Twine into a std::string -
while the change to Triple's ctor works because returning a
std::string from a function into a ctor can do fancy in-place
construction - in any other case like "SetFoo(const Twine& t) {
this->s = t; }" we'd want something better. Perhaps simply
t.AssignTo(s);

2) This one isn't quite as fleshed out, but see if the general outline
makes sense: Introduce an extra type that's Twine-esque, but is only
the top level (so it's implicitly constructible from a const Twine&),
rather than all the branches of a Twine. If a function wants to accept
a Twine that it needs to manipulate, it can mark its argument as
TwineString2 (for want of a better name, just using 2 to distinguish
from the (1) example). TwineString2 has a buffer in it, but it starts
uninitialized. Any attempt to retrieve/inspect values causes the
string to be initialized. [downside: all string operations on a
TwineString2 incur an "if not initialized" check]

Twines could be constructed from TwineString2s if the value needed to
be passed on to another function. (& still take advantage of any
initialized string buffer it might have, rather than having to
remanifest from its child nodes repeatedly)

One fancy enhancement to this is that TwineString2 could partially
manifest - eg: if you want ts[2] it could walk its children (dropping
its references to children & updating the buffer with the contents at
each step) until it found the 2nd character - then stop, with a
partially expanded tree/buffer. Though there's probably nothing
stopping us implementing such smarts in the (1) solution eventually
either.

Thanks,
- David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: twine_explicit_char.patch
Type: application/octet-stream
Size: 4365 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110724/852ca59f/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: twine_explicit_char_union.patch
Type: application/octet-stream
Size: 15243 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110724/852ca59f/attachment-0001.obj>