[LLVMdev] Correct use of StringRef and Twine

Fri Jul 22 13:42:54 PDT 2011

On Jul 21, 2011, at 12:00 AM, David Blaikie wrote:
>> And for arguments, generally always use Twine as the default, it allows construction of complex things, and is still efficient when passed the equiv of a StringRef (with the toStringRef method).  The only annoying thing about it is that the API to do this requires a temporary SmallVector to scribble in, which makes it more difficult to use.
>> 
>> It seems that there should be a good way to fix this API problem.
> 
> This is the problem I'm still trying to figure out. It seems that
> while Twine is efficient to pass, it's not easy to use efficiently & I
> don't think it'd be appropriate (correct me if I'm wrong) to go adding
> in temporary SmallVectors all over the place to try to consume Twine
> parameters in any code that needs to handle string contents after
> migrating the argument types to Twines.

Are you concerned about the stack size of the smallvector's, or the amount of code that needs to be written?

> One place I experimented with fixing tonight (after trying broader
> goals - like changing all StringRef args in clang only, say) was to
> add a Twine(char) ctor to enable llvm::Triple's ctor to take Twines
> rather than StringRefs, and then do Twine op+ to build the Data
> member.

Nice.

> The problem I see with this is that the current implementation of
> Triple's ctor is still more efficient than the simple Twine version:
> 
> : Data(x)
> {
> Data += '-';
> Data += x;
> Data += '-';
> Data += 'z';
> }
> 
> (essentially), as opposed to:
> 
> : Data((x + '-' + y + '-' + z).str())
> 
> Which requires an extra string copy of the final value in all cases...
> actually, now that I think about it, since it's returning into a ctor,
> this might be nicely optimized - so perhaps this is the "right way" to
> write this code. [diff attached]

The former is relying on how append works on std::string.  Beyond the number of bytes copied, the former could cause (depending on the implementation of std::string) a reallocation for every append.

Well written, the Twine version could just do a resize/reserve once, then fill in the bytes.  I don't know if Twine does that yet though.

> So then here's another example (can't find the exact piece of code I
> had been working on, but taking
> tools/clang/lib/Basic/Targets.cpp::getCPUDefineSuffix as an example
> anyway - just something using a StringSwitch, really). If this
> function were to take a const Twine& and pass it along to
> StringSwitch, ultimately StringSwitch (being the sink for the Twine -
> the code needing to read the individual character elements) would be
> responsible for allocating a temporary SmallVectorImpl, passing that
> in, getting a StringRef & then using that for its work.

Right.  Something like this could work:

foo(const Twine &T) {
...  
  TwineTmpBuffer Tmp;
  StringSwitch(T.toString(Tmp))..... 

Which doesn't seem too horrible, just needs a typedef of smallvector to TwineTmpBuffer.

> Another example of a string sink is, say, tools/llvmc/src/Hooks.cpp. I
> found this while looking for uses of "+= '-'" to use the char support
> for Twine I'd just added. But I can't upgrade the Arg argument from
> const std::string& to const Twine& easily since it needs to actually
> manipulate the string - find, access elements, and substr it.

We should make ".str()" an efficient way to get a mutable std::string out.

> Should this work just be done case by case? If so, I don't think I'll
> end up with much Twine usage, probably just a lot of StringRef usage &
> lots of str() calls.

Yes, I think it should be done carefully, one API at a time.  I think that would make sense.  Sticking with StringRef where it is possible is simple and good.

-Chris