[LLVMdev] Correct use of StringRef and Twine

Sun Jul 24 13:58:35 PDT 2011

On Jul 24, 2011, at 12:09 AM, David Blaikie wrote:
>> Yes, exactly.  I'm just saying that I think the additional clarity of:
>>  "foo" + Twine('x')
>> 
>> is worth the inconvenience.
> 
> Ok, attached a modified version of my patch with an Twine(char),
> Twine(unsigned char), and Twine(signed char). All three are explicit &
> have included test cases.

Ok, great.

> Speaking of which - is there any LLVM dev policy or preference around
> C-style casts versus C++-style casts? Twine is littered with the

Nope, use the one that is the most clear in a given case.  Except in extreme cases, the C++ style casts seem overly verbose and don't add a lot of value.

>>> As a side note on Twine's design: Is there a particular reason it uses
>>> void*s rather than unions?
>> 
>> I'm not sure what you're proposing specifically.
> 
> I've attached an version of my fix that includes the union use I was
> referring to (sorry for being vague in the previous email). Though if
> we're considering a more explicitly structured solution with
> templates/subtypes/etc we could skip the whole mess of enums &
> unions/casts.

aha, that *is* much nicer!  Applied with a few tweaks (remove some extraneous 'explicit' keywords and wrapped to 80 cols).  Thanks!

>>> and chars rather than enums?
>> 
>> char vs enum is because of visual studio compatibility and because enums often are stored as 32-bit values instead of 8-bit values.
> 
> Curious - seems VS >2k3 (
> http://msdn.microsoft.com/en-us/library/2dzy4k6e(v=VS.80).aspx ) has
> supported specifying enum backing type (so you can specify it as char:
> "enum foo : char { ... }") if that was the feature you were wanting to
> use but avoiding.

Right, but that requires VS #ifdefs.  You can also use enum bitfields, but VS has different promotion rules from them than other compilers.

>> Yes, from a general API design perspective, I hate having to force a choice between "convenience to implementor of an API to just take a StringRef" vs "convenience to client of API for it to take a Twine".  It really stinks.
> 
> Ok - glad we're on the same page there, wasn't quite sure whether we
> were talking past one another in previous emails.
> 
> I have a couple of ideas about how to lower the developer cost of
> using Twine ubiquitously. I'll try to describe them in some detail.

Ok!

>> I was chatting with Howard Hinnant about this and he suggested replacing Twine with a template metaprogramming expression-template system.  I haven't thought through all the details, but perhaps this would allow us to get the best of both worlds?
> 
> I think I'd prefer a template metaprogramming solution to Twine just
> from a purist perspective compared to all the
> casting/reinterpreting/enums (even with the union) - that sort of
> stuff makes me a bit uncomfortable, but I realize it's well contained
> enough that it's not too risky (& I know some of the other tricks LLVM
> uses like low bit twiddling in pointers, etc, so this seems relatively
> sane in the grand scheme of things).

Right.  As long as the damage is well contained and isolated in the class, I don't care much about stuff like this.  One of the great features of C++ is that it lets you do whatever insane thing you want and draw a nice interface around it :)

> 1) the easy solution: create a StringRef subclass (or new type with a
> similarly expressive API) that wraps a buffer that it may or may not
> use. Give it a ctor that takes a Twine. The use case then goes from
> this:
> 
>    SmallVectorImpl<char> v;
>    StringRef s = theTwine.GetStringRef(v);
> 
> to this:
> 
>    TwineString s(theTwine);
> 
> (& the ctor looks like: TwineString(const Twine& t) { *this =
> t.GetStringRef(this->buffer); })
> So any Twine sink now has to pay one line of code to get a usable
> string out of a Twine. Cheap to implement, easy to fix existing use
> cases (privatize GetStringRef(SmallVectorImpl<char>) and make
> TwineString a friend - fix everything that fails to compile).
> 
> This keeps things simple & seems to be "good enough" to me, but we
> could perhaps do better (at the very least, again, if we did do
> better, we could go back & remove TwineString & again fix all the
> places that fail to compile with whatever new hotness we invent)

This definitely seems like a step forward.  While it is kinda gross, a subclass of StringRef is probably the lowest friction path to do this.

> Hmm - new thought: what would happen if my argument type was simply
> TwineString? If I knew my function needed access to the values (like
> the Host.cpp example) but didn't have any buffer of its own that it
> needed to use, that seems OK. This new thought perhaps renders
> suggestion (2) uninteresting. Functions that know they always need the
> string contents take a TwineString. Functions that might be able to
> skip realizing the string entirely - or have their own buffer (eg: a
> member std::string) to populate - take a Twine & explicitly construct
> a TwineString only in the path that needs the value. That seems
> optimal to me.

I'm dubious of this.  This is exposing an implementation detail to clients that they shouldn't have to know, and if passed by value, this could be expensive.

> 1.1) We probably want a better way to get a Twine into a std::string -
> while the change to Triple's ctor works because returning a
> std::string from a function into a ctor can do fancy in-place
> construction - in any other case like "SetFoo(const Twine& t) {
> this->s = t; }" we'd want something better. Perhaps simply
> t.AssignTo(s);

Adding a Twine::assignTo method would make sense to me!

> 2) This one isn't quite as fleshed out, but see if the general outline
> makes sense: Introduce an extra type that's Twine-esque, but is only
> the top level (so it's implicitly constructible from a const Twine&),
> rather than all the branches of a Twine. If a function wants to accept
> a Twine that it needs to manipulate, it can mark its argument as
> TwineString2 (for want of a better name, just using 2 to distinguish
> from the (1) example). TwineString2 has a buffer in it, but it starts
> uninitialized. Any attempt to retrieve/inspect values causes the
> string to be initialized. [downside: all string operations on a
> TwineString2 incur an "if not initialized" check]
> 
> Twines could be constructed from TwineString2s if the value needed to
> be passed on to another function. (& still take advantage of any
> initialized string buffer it might have, rather than having to
> remanifest from its child nodes repeatedly)
> 
> One fancy enhancement to this is that TwineString2 could partially
> manifest - eg: if you want ts[2] it could walk its children (dropping
> its references to children & updating the buffer with the contents at
> each step) until it found the 2nd character - then stop, with a
> partially expanded tree/buffer. Though there's probably nothing
> stopping us implementing such smarts in the (1) solution eventually
> either.

I think you're flying way over my head here :)

-Chris