[PATCH] A new ADT: StringRefNulTerminated

Dmitri Gribenko gribozavr at gmail.com
Mon Mar 4 12:43:40 PST 2013


On Sun, Feb 24, 2013 at 2:26 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
> On Sun, Feb 24, 2013 at 2:11 AM, Chandler Carruth <chandlerc at google.com> wrote:
>> Stepping back (alot), I'm really unconvinced about the need for this class.
>
> That's a valid concern.  I tried to avoid doing this, because I
> understand that a new ADT imposes a new concept onto all maintainers.
>
>> On Wed, Feb 20, 2013 at 1:40 PM, Dmitri Gribenko <gribozavr at gmail.com>
>> wrote:
>>>
>>> Hello,
>>>
>>> I want to propose a new ADT: StringRefNulTerminated.
>>>
>>> The ``StringRefNulTerminated`` class
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>> ``StringRefNulTerminated`` is a subclass of ``StringRef`` that represents
>>> a
>>> reference to a NUL-terminated string.  It can be safely converted to
>>> 'const
>>> char *' without copying the underlying data.  Because it is a subclass of
>>> ``StringRef``, slicing implicitly occurs whenever a
>>> ``StringRefNulTerminated``
>>> object is passed to an API that requires a ``StringRef``.  Since
>>> ``StringRefNulTerminated`` does not contain any extra data, slicing
>>> preserves
>>> all the information (except for the fact that the string was
>>> NUL-terminated).
>>>
>>> The motivation behind this class is to preserve the information about the
>>> NUL
>>> terminator in the string.  With ``StringRef``, this information is lost.
>>> Also,
>>> one can not check if a ``StringRef`` contains a NUL-terminated string,
>>> thus it
>>> is required to copy the data to convert from a ``StringRef`` to
>>> ``const char *``.
>>
>>
>> Everything you describe here is true of a 'const char*'. Why not just pass
>> that through the API layers where it is important to have a C-string
>> representation?
>
> I tried doing that (in order to fix libclang issue -- one-past end
> read from StringRef -- without a performance impact), but I found some
> cases where dropping the length from the string might create a
> performance issue.  See below.
>
>> pro: It's a lot simpler and more clear about the fact that the code is
>> dealing with a C-string.
>> pro: It doesn't add a whole new class to ADT
>> pro: It doesn't add complexity of understanding how this class and StringRef
>> interact -- we already undrestand how to build a StringRef out of a
>> C-string.
>>
>> con: It requires users who need a StringRef-like API to build a StringRef
>> around it.
>>
>> This doesn't seem too bad... maybe there are other cons?
>
> You have identified the downside correctly.  I want to change some
> existing C++ APIs in Clang that return a StrnigRef that is always
> NUL-terminated.  Among them, there are cases where we always return a
> string literal, and the compiler can optimize out the strlen if we
> construct a StringRef immediately from the string literal.  If we
> would build a StringRef outside the function body, there would be a
> strlen call.  (See BuiltinType::getName for an example of this.)  In
> other case (StaticDiagCategoryRec::getName in
> lib/Basic/DiagnosticIDs.cpp) we explicitly pass the length of the
> string (I guess, to be 100% sure that strlen is not called).

Ping.  I do think that the advantages here outweigh the burden of a
new ADT.  Given that slicing from StringRefNulTerminated to StringRef
occurs transparently to the users, using this ADT should have a very
low mental overhead.

Dmitri

-- 
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/



More information about the llvm-commits mailing list