[PATCH] A new ADT: StringRefNulTerminated

Dmitri Gribenko gribozavr at gmail.com
Sat Feb 23 16:26:46 PST 2013


On Sun, Feb 24, 2013 at 2:11 AM, Chandler Carruth <chandlerc at google.com> wrote:
> Stepping back (alot), I'm really unconvinced about the need for this class.

That's a valid concern.  I tried to avoid doing this, because I
understand that a new ADT imposes a new concept onto all maintainers.

> On Wed, Feb 20, 2013 at 1:40 PM, Dmitri Gribenko <gribozavr at gmail.com>
> wrote:
>>
>> Hello,
>>
>> I want to propose a new ADT: StringRefNulTerminated.
>>
>> The ``StringRefNulTerminated`` class
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> ``StringRefNulTerminated`` is a subclass of ``StringRef`` that represents
>> a
>> reference to a NUL-terminated string.  It can be safely converted to
>> 'const
>> char *' without copying the underlying data.  Because it is a subclass of
>> ``StringRef``, slicing implicitly occurs whenever a
>> ``StringRefNulTerminated``
>> object is passed to an API that requires a ``StringRef``.  Since
>> ``StringRefNulTerminated`` does not contain any extra data, slicing
>> preserves
>> all the information (except for the fact that the string was
>> NUL-terminated).
>>
>> The motivation behind this class is to preserve the information about the
>> NUL
>> terminator in the string.  With ``StringRef``, this information is lost.
>> Also,
>> one can not check if a ``StringRef`` contains a NUL-terminated string,
>> thus it
>> is required to copy the data to convert from a ``StringRef`` to
>> ``const char *``.
>
>
> Everything you describe here is true of a 'const char*'. Why not just pass
> that through the API layers where it is important to have a C-string
> representation?

I tried doing that (in order to fix libclang issue -- one-past end
read from StringRef -- without a performance impact), but I found some
cases where dropping the length from the string might create a
performance issue.  See below.

> pro: It's a lot simpler and more clear about the fact that the code is
> dealing with a C-string.
> pro: It doesn't add a whole new class to ADT
> pro: It doesn't add complexity of understanding how this class and StringRef
> interact -- we already undrestand how to build a StringRef out of a
> C-string.
>
> con: It requires users who need a StringRef-like API to build a StringRef
> around it.
>
> This doesn't seem too bad... maybe there are other cons?

You have identified the downside correctly.  I want to change some
existing C++ APIs in Clang that return a StrnigRef that is always
NUL-terminated.  Among them, there are cases where we always return a
string literal, and the compiler can optimize out the strlen if we
construct a StringRef immediately from the string literal.  If we
would build a StringRef outside the function body, there would be a
strlen call.  (See BuiltinType::getName for an example of this.)  In
other case (StaticDiagCategoryRec::getName in
lib/Basic/DiagnosticIDs.cpp) we explicitly pass the length of the
string (I guess, to be 100% sure that strlen is not called).

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/



More information about the llvm-commits mailing list