[LLVMdev] Correct use of StringRef and Twine
Chris Lattner
clattner at apple.com
Fri Jul 22 13:42:54 PDT 2011
On Jul 21, 2011, at 12:00 AM, David Blaikie wrote:
>> And for arguments, generally always use Twine as the default, it allows construction of complex things, and is still efficient when passed the equiv of a StringRef (with the toStringRef method). The only annoying thing about it is that the API to do this requires a temporary SmallVector to scribble in, which makes it more difficult to use.
>>
>> It seems that there should be a good way to fix this API problem.
>
> This is the problem I'm still trying to figure out. It seems that
> while Twine is efficient to pass, it's not easy to use efficiently & I
> don't think it'd be appropriate (correct me if I'm wrong) to go adding
> in temporary SmallVectors all over the place to try to consume Twine
> parameters in any code that needs to handle string contents after
> migrating the argument types to Twines.
Are you concerned about the stack size of the smallvector's, or the amount of code that needs to be written?
> One place I experimented with fixing tonight (after trying broader
> goals - like changing all StringRef args in clang only, say) was to
> add a Twine(char) ctor to enable llvm::Triple's ctor to take Twines
> rather than StringRefs, and then do Twine op+ to build the Data
> member.
Nice.
> The problem I see with this is that the current implementation of
> Triple's ctor is still more efficient than the simple Twine version:
>
> : Data(x)
> {
> Data += '-';
> Data += x;
> Data += '-';
> Data += 'z';
> }
>
> (essentially), as opposed to:
>
> : Data((x + '-' + y + '-' + z).str())
>
> Which requires an extra string copy of the final value in all cases...
> actually, now that I think about it, since it's returning into a ctor,
> this might be nicely optimized - so perhaps this is the "right way" to
> write this code. [diff attached]
The former is relying on how append works on std::string. Beyond the number of bytes copied, the former could cause (depending on the implementation of std::string) a reallocation for every append.
Well written, the Twine version could just do a resize/reserve once, then fill in the bytes. I don't know if Twine does that yet though.
> So then here's another example (can't find the exact piece of code I
> had been working on, but taking
> tools/clang/lib/Basic/Targets.cpp::getCPUDefineSuffix as an example
> anyway - just something using a StringSwitch, really). If this
> function were to take a const Twine& and pass it along to
> StringSwitch, ultimately StringSwitch (being the sink for the Twine -
> the code needing to read the individual character elements) would be
> responsible for allocating a temporary SmallVectorImpl, passing that
> in, getting a StringRef & then using that for its work.
Right. Something like this could work:
foo(const Twine &T) {
...
TwineTmpBuffer Tmp;
StringSwitch(T.toString(Tmp)).....
Which doesn't seem too horrible, just needs a typedef of smallvector to TwineTmpBuffer.
> Another example of a string sink is, say, tools/llvmc/src/Hooks.cpp. I
> found this while looking for uses of "+= '-'" to use the char support
> for Twine I'd just added. But I can't upgrade the Arg argument from
> const std::string& to const Twine& easily since it needs to actually
> manipulate the string - find, access elements, and substr it.
We should make ".str()" an efficient way to get a mutable std::string out.
> Should this work just be done case by case? If so, I don't think I'll
> end up with much Twine usage, probably just a lot of StringRef usage &
> lots of str() calls.
Yes, I think it should be done carefully, one API at a time. I think that would make sense. Sticking with StringRef where it is possible is simple and good.
-Chris
More information about the llvm-dev
mailing list