[LLVMdev] Correct use of StringRef and Twine

Tue Jul 19 10:07:03 PDT 2011

On Jul 18, 2011, at 12:38 PM, David Blaikie wrote:

> I'm attempting to do some amount of mass migration from const std::string& (& const char*) to StringRef.

Great!

> In doing so I stumbled across the fact that while StringRef has no op+ to speak of (well, it has += and I added a manual StringRef+std::string and std::string+StringRef for now - though I'm thinking perhaps they can be skipped in favor of Twine operations), Twine does provide a variety of handy op+ implementations that get picked up mostly for free.

Yeah, I don't think we want to support + on StringRef without going through Twine.

> A few questions from this:
> 
> * When should a StringRef parameter type be preferred over a const Twine& parameter type?

Generally, here are the guidelines for return values:
1. Use std::string when the value is computed, or there are other lifetime issues.
2. Use StringRef otherwise.

And for arguments, generally always use Twine as the default, it allows construction of complex things, and is still efficient when passed the equiv of a StringRef (with the toStringRef method).  The only annoying thing about it is that the API to do this requires a temporary SmallVector to scribble in, which makes it more difficult to use.

It seems that there should be a good way to fix this API problem.

> * Would it be OK to implement an implicit conversion from Twine to std::string rather than using the explicit str() function? (when switching to StringRef many expressions became Twine concatenation when they were std::string concatenation - this isn't a drop-in replacement due to the absence of std::string conversion from Twine (which should be a perf win, I'd think - delaying concatenation into a single operation instead of (((a + b) + c) + d)), so I've had to wrap various concatenation expressions in (...).str())

I'd prefer not.  I'd rather convert the things that want an std::string to take a Twine.

> * What would happen if Twine gave back a pointer/reference to some internal storage? In the common/correct use case (taking a Twine as a const ref, or using a Twine entirely in a temporary concatenation expression) wouldn't the Twine's internal storage live long enough to allow the caller to use that buffer within the life of the statement (as in, say, o << (aStringRef + "foo"))? 

This is really dangerous.  I'd much rather extend raw_ostream to take twines.

> This last one is probably the bigger question & relates to the other two. It seems like Twine is nice for cases where deferred concatenation is required & might be able to be avoided altogether - if the majority case is that the concatenation ends up being thrown away, you win. But as you approach the 100% usage (not throwing away ever) are you losing perf in the simple non-concatenative case (all those callers who just pass in a straight string without any concatenation - the Twine then returns that string by value when str() is called, right?)?

The major win is actually when you have clients that don't *want* an std::string.  std::string is really slow in most operations (it does atomic operations and COW with common implementations, not to mention heap allocations, though libc++ is much much better about both of these).  Twine is optimized to avoid creating the temporary std::string at all.

> At the moment, if the lifetime works out & Twine can return a reference to a member (this would mean unrealized Twines would have a bunch of empty/spare std::strings lying around, but I expect those could be optimized away fairly reliably), that Twine would be strictly superior to StringRef & could be preferred in all cases, potentially even replacing/simplifying StringRef substantially.

The toStringRef(x) method is what you want.  It is really fast and does no copy if the twine *just* contains a C string, std::string or StringRef, and in the concat cases it does no memory allocation in the common case where the SmallVector is big enough for the result.

-Chris