[PATCH] A new ADT: StringRefNulTerminated

Dmitri Gribenko gribozavr at gmail.com
Wed Feb 20 13:40:08 PST 2013


Hello,

I want to propose a new ADT: StringRefNulTerminated.

The ``StringRefNulTerminated`` class
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``StringRefNulTerminated`` is a subclass of ``StringRef`` that represents a
reference to a NUL-terminated string.  It can be safely converted to
``const char *`` without copying the underlying data.  Because it is a subclass of
``StringRef``, slicing implicitly occurs whenever a ``StringRefNulTerminated``
object is passed to an API that requires a ``StringRef``.  Since
``StringRefNulTerminated`` does not contain any extra data, slicing preserves
all the information (except for the fact that the string was NUL-terminated).

The motivation behind this class is to preserve the information about the NUL
terminator in the string.  With ``StringRef``, this information is lost.  Also,
one cannot check whether a ``StringRef`` refers to a NUL-terminated string, so
converting a ``StringRef`` to ``const char *`` requires copying the data.

Example:

  // Function returning a NUL-terminated string.
  StringRefNulTerminated getAString();

  StringRefNulTerminated S1 = getAString();
  StringRef S2 = getAString(); // Slicing to a StringRef.
  const char *S3 = getAString().c_str(); // Convert back to a 'const char *'
                                         // without copying data.
  const char *S4 = S1.c_str(); // Same.
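
For readers without the patch at hand, here is a minimal sketch of the shape
such a class could take.  It is not the attached patch: the ``StringRef``
below is a simplified stand-in for the real class (just a pointer and a
length), and the constructor and ``takesStringRef`` helper are assumptions
made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified stand-in for llvm::StringRef: a pointer plus a length.
class StringRef {
  const char *Data;
  std::size_t Length;

public:
  StringRef(const char *D, std::size_t L) : Data(D), Length(L) {}
  const char *data() const { return Data; }
  std::size_t size() const { return Length; }
};

// Hypothetical sketch of the proposed class.  It adds no data members, so
// slicing to StringRef loses only the static "NUL-terminated" guarantee.
class StringRefNulTerminated : public StringRef {
public:
  // Built from a C string, so data()[size()] == '\0' holds by construction.
  StringRefNulTerminated(const char *Str)
      : StringRef(Str, std::strlen(Str)) {}

  // No copy needed: the referenced bytes are already NUL-terminated.
  const char *c_str() const { return data(); }
};

// Hypothetical API that only needs a StringRef; accepts the subclass by
// slicing.
std::size_t takesStringRef(StringRef S) { return S.size(); }
```

In this sketch, ``takesStringRef(StringRefNulTerminated("abc"))`` compiles
unchanged, while ``c_str()`` is only reachable when the static type still
carries the NUL-termination guarantee.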

Discussion
^^^^^^^^^^

The first user of the "convert to const char *" functionality is
libclang, which exposes StringRefs via the C API.  Quite a few C++ APIs
in Clang return NUL-terminated strings as StringRefs.  But once a
NUL-terminated string is converted to a StringRef, there is no safe way
to recover that bit of information.  Currently, libclang does an
out-of-bounds read to check whether the StringRef is NUL-terminated;
that read is unsafe and should be removed.  The alternative, copying
StringRefs that are actually NUL-terminated, wastes memory and CPU
cycles.
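
To make the cost concrete, here is a rough sketch of what a C API boundary
could do with and without the guarantee.  Everything in it is hypothetical:
``ExportedString``, ``exportString`` and ``disposeString`` are illustrative
names, not libclang's actual types or functions, and the two string classes
are simplified stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Simplified stand-ins for the two string reference types.
struct StringRef {
  const char *Data;
  std::size_t Length;
  StringRef(const char *D, std::size_t L) : Data(D), Length(L) {}
};

struct StringRefNulTerminated : StringRef {
  explicit StringRefNulTerminated(const char *S)
      : StringRef(S, std::strlen(S)) {}
};

// Hypothetical C-facing result: a C string plus an ownership flag.
struct ExportedString {
  const char *Data;
  bool Owned; // true if Data was malloc'ed and must be freed
};

// Termination unknown: the only safe option is to copy and append a NUL.
ExportedString exportString(StringRef S) {
  char *Buf = static_cast<char *>(std::malloc(S.Length + 1));
  if (!Buf)
    return {nullptr, false};
  std::memcpy(Buf, S.Data, S.Length);
  Buf[S.Length] = '\0';
  return {Buf, true};
}

// Termination known statically: borrow the pointer, no allocation.
ExportedString exportString(StringRefNulTerminated S) {
  return {S.Data, false};
}

void disposeString(ExportedString E) {
  if (E.Owned)
    std::free(const_cast<char *>(E.Data));
}
```

Overload resolution picks the borrowing version whenever the argument's
static type is still ``StringRefNulTerminated``, and falls back to the
copying version once the value has been sliced to a plain ``StringRef``.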

The only reasonable way to solve this is to carry this bit of
information with the string itself.  I considered two possible
implementations:

(1) an additional bit in StringRef;
(2) a separate class.

About (1): we could turn StringRef::Length into a bitfield and free up
a bit to store information about a NUL terminator.  David Blaikie
suggested (on IRC) that it would slow down StringRef::size(), so I did
not consider it further.
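
For reference, option (1) could look roughly like the sketch below.  The
class name and layout are assumptions for illustration only, not code from
LLVM or from the patch.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical variant of StringRef that steals one bit from the length.
class StringRefWithFlag {
  const char *Data;
  // All but one bit of a size_t hold the length ...
  std::size_t Length : sizeof(std::size_t) * CHAR_BIT - 1;
  // ... and the remaining bit records whether a NUL follows the data.
  std::size_t IsNulTerminated : 1;

public:
  StringRefWithFlag(const char *D, std::size_t L, bool Nul)
      : Data(D), Length(L), IsNulTerminated(Nul) {}

  const char *data() const { return Data; }

  // Reading a bitfield means the flag bit has to be masked or shifted out
  // on every call, unlike a plain member load.
  std::size_t size() const { return Length; }

  bool isNulTerminated() const { return IsNulTerminated != 0; }
};
```

The extra work in ``size()`` is the slowdown concern mentioned above; whether
it is measurable in practice was not evaluated here.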

(2) is the proposed implementation.

About the name: I am not really attached to this name, but I could
only think of two good names:  CStringRef and StringRefNulTerminated.
"CStringRef" is a bit ambiguous (is it a typedef for "const char*"?!),
and has only a single character difference from "StringRef", so I
decided to err on the side of verbosity.

Please review!

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: string-ref-nul-terminated-v1.patch
Type: application/octet-stream
Size: 3590 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130220/d540ed2e/attachment.obj>

