[LLVMdev] Correct use of StringRef and Twine

Chris Lattner clattner at apple.com
Fri Jul 22 17:29:08 PDT 2011


On Jul 22, 2011, at 2:59 PM, David Blaikie wrote:

>> The dangerous part of this is that characters are integers, so "foo" + 'x' is very likely to cause serious problems.
> 
> std::string already provides such overloads though, doesn't it? So the
> code isn't any safer from accidental "foo" + 'x' expressions that
> don't include Twine/StringRef/std::string than it was before. But if
> the argument is that std::string's interface was poorly
> designed/unsafe & we can do better/safer, I'm OK with making the ctor
> explicit as you've suggested.

Yes, exactly.  I'm just saying that I think the additional clarity of:
  "foo" + Twine('x') 

is worth the inconvenience.
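
To make the failure mode concrete, here is a rough sketch (it assumes the
explicit char constructor we're discussing here actually lands):

    #include "llvm/ADT/Twine.h"
    #include <string>
    using namespace llvm;

    void example() {
      // With plain C strings, 'x' promotes to int, so this is pointer
      // arithmetic: "foo" advanced by 120 bytes, which is garbage (and
      // undefined behavior), not "foox".
      //   const char *Bad = "foo" + 'x';

      // With an explicit Twine character constructor the intent is
      // unambiguous and the result really is the concatenation "foox".
      std::string Good = ("foo" + Twine('x')).str();
    }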

>>  You should also probably add ctors for signed/unsigned char as well (which reuse the existing CharKind enum).
> 
> Hmm - would it be safe to cast those signed/unsigned chars to straight
> char? (is it guaranteed that the signed & unsigned values with the
> same representation map to the same glyph?)

Yes.  I consider 'signed vs unsigned char vs char' to be a blight on the C type system.  Just casting to char internally would be fine.
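
For what it's worth, here is a toy, self-contained illustration of that
point (this is not the real Twine, just the shape of the change): both
signed char and unsigned char can be normalized with a cast and stored
under the single character kind.

    #include <cassert>

    struct TinyTwine {
      enum Kind { EmptyKind, CharKind };
      char Value;
      Kind K;
      explicit TinyTwine(char C) : Value(C), K(CharKind) {}
      explicit TinyTwine(signed char C)
          : Value(static_cast<char>(C)), K(CharKind) {}
      explicit TinyTwine(unsigned char C)
          : Value(static_cast<char>(C)), K(CharKind) {}
    };

    int main() {
      signed char S = 'a';
      unsigned char U = 'a';
      // Same byte, same glyph: the cast loses nothing for character data.
      assert(TinyTwine(S).Value == TinyTwine(U).Value);
      return 0;
    }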

> As a side note on Twine's design: Is there a particular reason it uses
> void*s rather than unions? 

I'm not sure what you're proposing specifically.

> and chars rather than enums?

char vs enum is because of Visual Studio compatibility and because enums are often stored as 32-bit values instead of 8-bit values.
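
A quick way to see the size difference (the exact numbers are
implementation-defined, but this is what you typically get; the kind
names below just echo Twine's and are only for illustration):

    #include <cstdio>

    enum NodeKind { EmptyKind, TwineKind, CStringKind };

    int main() {
      // On most ABIs an unscoped enum occupies an int (4 bytes) while a
      // char is always 1 byte, so storing the kinds as chars keeps each
      // Twine node small.
      std::printf("sizeof(NodeKind) = %u, sizeof(char) = %u\n",
                  (unsigned)sizeof(NodeKind), (unsigned)sizeof(char));
      return 0;
    }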

> (sorry if I'm asking lots of "why is this like this" questions all
> over the code base - I just don't want to assume that it's intentional
> and replicate a pattern elsewhere that I don't understand only to find
> it's unintentional "not fixed yet" sort of stuff. I suppose at the
> very least it'll be a chance to add in some explanatory comments if I
> do find things that are by design but weren't clear to me)

No problem at all, happy to help answer the questions.  Forcing a reexamination of past decisions is not a bad thing at all in this case :)

>> Right.  Something like this could work:
>> 
>> foo(const Twine &T) {
>> ...
>>  TwineTmpBuffer Tmp;
>>  StringSwitch(T.toString(Tmp)).....
>> 
>> Which doesn't seem too horrible, just needs a typedef of SmallVector to TwineTmpBuffer.
> 
> In a few choice places, maybe, but as the default way to pass string
> parameters I think that'd be a hard sell as a general practice.


Yes, from a general API design perspective, I hate having to force a choice between "convenience to the implementor of an API to just take a StringRef" and "convenience to the client of the API for it to take a Twine".  It really stinks.
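
Filling in that sketch a bit: TwineTmpBuffer below is only the proposed
typedef, it isn't in the tree, and I've used Twine::toStringRef for the
conversion, which hands back a StringRef and only copies into the buffer
when the Twine isn't already a single contiguous string.

    #include "llvm/ADT/SmallString.h"
    #include "llvm/ADT/StringSwitch.h"
    #include "llvm/ADT/Twine.h"
    using namespace llvm;

    // Proposed convenience typedef; any SmallVector of char would do.
    typedef SmallString<128> TwineTmpBuffer;

    static int classify(const Twine &T) {
      TwineTmpBuffer Tmp;
      // Renders the Twine into Tmp only when necessary and returns a
      // StringRef either way, so the switch never allocates a std::string.
      return StringSwitch<int>(T.toStringRef(Tmp))
                 .Case("foo", 0)
                 .Case("bar", 1)
                 .Default(-1);
    }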

I was chatting with Howard Hinnant about this and he suggested replacing Twine with a template metaprogramming expression-template system.  I haven't thought through all the details, but perhaps this would allow us to get the best of both worlds?

> Hmm, this is one bit I'm not sure about. As I tried to explain above,
> it seems problematic to have to choose your argument type on the basis
> of how you think callers might use your API. From the perspective of a
> caller, a Twine argument is at least as expressive as a StringRef
> (since all StringRefs can be Twined implicitly), but it takes that
> extra step to write the implementation.
> 
> Perhaps I'm aiming for some kind of purist/perfectionist argument that
> isn't necessary or practical, but I hope I've been clear in explaining
> my uncertainty/issue here.


Yes, I'm deeply unhappy about this aspect of our string APIs.

-Chris



