[PATCH] A new ADT: StringRefNulTerminated

Chandler Carruth chandlerc at google.com
Sat Feb 23 16:11:02 PST 2013


Stepping back (alot), I'm really unconvinced about the need for this class.

On Wed, Feb 20, 2013 at 1:40 PM, Dmitri Gribenko <gribozavr at gmail.com>wrote:

> Hello,
>
> I want to propose a new ADT: StringRefNulTerminated.
>
> The ``StringRefNulTerminated`` class
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> ``StringRefNulTerminated`` is a subclass of ``StringRef`` that represents a
> reference to a NUL-terminated string.  It can be safely converted to 'const
> char *' without copying the underlying data.  Because it is a subclass of
> ``StringRef``, slicing implicitly occurs whenever a
> ``StringRefNulTerminated``
> object is passed to an API that requires a ``StringRef``.  Since
> ``StringRefNulTerminated`` does not contain any extra data, slicing
> preserves
> all the information (except for the fact that the string was
> NUL-terminated).
>
> The motivation behind this class is to preserve the information about the
> NUL
> terminator in the string.  With ``StringRef``, this information is lost.
>  Also,
> one can not check if a ``StringRef`` contains a NUL-terminated string,
> thus it
> is required to copy the data to convert from a ``StringRef`` to
> ``const char *``.
>

Everything you describe here is true of a 'const char*'. Why not just pass
that through the API layers where it is important to have a C-string
representation?

pro: It's a lot simpler and more clear about the fact that the code is
dealing with a C-string.
pro: It doesn't add a whole new class to ADT
pro: It doesn't add complexity of understanding how this class and
StringRef interact -- we already undrestand how to build a StringRef out of
a C-string.

con: It requires users who need a StringRef-like API to build a StringRef
around it.

This doesn't seem too bad... maybe there are other cons?


>
> Example:
>
>   // Function returning a NUL-terminated string.
>   StringRefNulTerminated getAString();
>
>   StringRefNulTerminated S1 = getAString();
>   StringRef S2 = getAString(); // Slicing to a StringRef.
>   const char *S3 = getAString().c_str(); // Convert back to a 'const char
> *'
>                                          // without copying data.
>   const char *S4 = S1.c_str(); // Same.
>
> Discussion
> ^^^^^^^^^^
>
> The first user of the "convert to const char *" functionality is
> libclang that exposes StringRefs via the C API.  It happens that quite
> a few C++ APIs in Clang return NUL-terminated strings as StringRefs.
> But after a NUL-terminated string is converted to a StringRef, there
> is no safe way to recover that bit of information back (currently,
> libclang does an out-of-bounds read to check if the StringRef is
> NUL-terminated, but it is a bad thing and should be removed).  It is a
> waste of memory and CPU cycles to copy StringRefs that are actually
> NUL-terminated.
>
> The only reasonable way to solve this is to carry this bit of
> information with the string itself.  I considered two possible
> implementations:
>
> (1) an additional bit in StringRef;
> (2) a separate class.
>
> About (1): we could turn StringRef::Length into a bitfield and free up
> a bit to store information about a NUL terminator.  David Blaikie
> suggested (on IRC) that it would slow down StringRef::size(), so I did
> not consider it further.
>
> (2) is the proposed implementation.
>
> About the name: I am not really attached to this name, but I could
> only think of two good names:  CStringRef and StringRefNulTerminated.
> "CStringRef" is a bit ambigous (is it a typedef for "const char*"?!),
> and has only a single character difference from "StringRef", so I
> decided to err on the side of verbosity.
>
> Please review!
>
> Dmitri
>
> --
> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130223/230c360c/attachment.html>


More information about the llvm-commits mailing list