[lldb-dev] raw c strings in lldb

Zachary Turner zturner at google.com
Wed Feb 11 17:10:46 PST 2015


Just to be clear here, the types StringRef and const char* carry the exact
same amount of information regarding whether or not they're null
terminated: none. It is up to the person who creates the underlying string
to ensure this. This is what i mean by "you can't assume a const char* is
null terminated either". We enforce this at the creation sites of the
backing strings (or at least try to), so we can continue to enforce this.
In other words, we don't need to do anything special. It should just work.

*If* someone wants to write a function that accepts a string which need not
be null terminated, it should be clearly documented as such, and all other
functions should be assumed to require null terminated StringRefs. But
again, if we're using SmallString, std::string, or any other form of "safe"
string (i.e. No snprintf, strcat, etc) this should always be the case.
On Wed, Feb 11, 2015 at 2:22 PM Zachary Turner <zturner at google.com> wrote:

> On Wed Feb 11 2015 at 2:04:38 PM Greg Clayton <gclayton at apple.com> wrote:
>
>>
>> > On Feb 11, 2015, at 11:23 AM, Zachary Turner <zturner at google.com>
>> wrote:
>> >
>> >
>> >
>> > On Wed Feb 11 2015 at 11:13:22 AM Greg Clayton <gclayton at apple.com>
>> wrote:
>> >
>> > > On Feb 11, 2015, at 10:18 AM, Zachary Turner <zturner at google.com>
>> wrote:
>> > >
>> > > A patch is up which fixes a number of issues with strings not being
>> null terminated in LLDB.  The issues and error-proneness of using raw c
>> strings is well documented and understood, so I would like to propose
>> banning them in LLDB for new code [1].
>> > >
>> > > I realize that's a tall order, and I don't expect all the occurrences
>> of them to go away overnight, but issues related to strings have been
>> popping up over and over again, and it would be nice if we could deal with
>> more interesting problems.
>> > >
>> > > LLVM has a very robust string manipulation library, and while it's
>> not perfect, it would have prevented almost all of these issues from ever
>> happening in the first place.  Here is a quick summary of common C string
>> types / operations and their corresponding LLVM equivalents:
>> > >
>> > > Types:
>> > > char stack_buffer[n];        // Use llvm::SmallString<n>.
>> >
>> > That would be fine.
>> >
>> > > const char* foo;             // Use llvm::StringRef foo.
>> >
>> > This may be fine as long as there is absolutely _no_ string
>> manipulation on "foo" (like "strcat(...) etc. Use std::string for anything
>> that requires storage. And we don't want to start using any code that does
>> "let me assign an llvm::StringRef() with a ConstString("hello").GetCString()
>> so I don't have to worry about ownership.  Adding string to the constant
>> string pool should be reserved for things that actually need it (any const
>> names like namespaces, variable names, type names, paths, etc) and also for
>> when we return "const char *" through the public API.
>> >
>> > This also means that we can't trust that "foo" is NULL terminated
>> (llvm::StringRef objects have a data pointer and a length and aren't
>> required to be terminated) and it means we must call "foo.str().c_str()"
>> any time we require NULL termination. So I actually don't like
>> llvm::StringRef as a replacement "const char *"...
>>
>> > We can't assume that const char*s are null terminated either.
>>
>> Yes we can, that is was is assumed by const char * right now for every
>> call we use it with.
>>
> Right, but my point is that if we change a const char * to a StringRef(),
> and that const char* was assumed to be null terminated, then the StringRef
> is also assumed to be null terminated.  You can just call StringRef::data()
> on it.
>
>
>>
>> But that isn't what is currently meant by _any_ of the functions that
>> take a "const char *" with no length parameter, so it means that any switch
>> to using llvm::StringRef requires we use ".str().c_str()" which I would
>> like to avoid.
>>
> I don't see why this is necessary.  Just because StringRef supports
> non-null terminated strings doesn't mean we have to use non null terminated
> strings with it.  declaring a SmallString<> and passing by SmallVectorImpl&
> supports null terminating strings.  You push_back(0) and then pop_back()
> before returning.  This is no different from what we're already doing by
> setting foo[n] = '\0' except that it's safer since there's no chance of a
> buffer overrun.
>
> We are already taking care to make sure all of our const char*s are
> properly null terminated.  Therefore when they are passed to functions that
> accept a StringRef, they will continue to be null terminated.
>
>
>
> >
>> >
>> > >
>> > > Operations:
>> > > strcpy, strncpy, etc         // Use member functions of SmallString /
>> StringRef
>> > > sprintf, snprintf, etc       // Use llvm::raw_ostream
>> >
>> > We have StringStream, people should be using that. raw_ostream doesn't
>> support all printf style formats. So all sprintf  and snprintf function
>> calls should switch to using lldb_private::StreamString.
>>
>> > I actually disagree here.  Printf() has plenty of problems, such as
>> non-portable and confusing arguments,
>>
>> I agree on this one, but we are getting around this with using PRI macros.
>>
>> > lack of type safety,
>>
>> most of this is taken care of when you have the printf format warnings on
>> your class.
>>
>> > possibility of buffer overruns.
>>
>> There are no buffer overruns in the Stream::Printf(). Check the
>> implementation if you need to.
>>
>> >  I would actually prefer to use raw_ostream instead of StringStream.
>>
>> I don't. I have written tools using streaming style stuff before (like
>> std::cout) and also written using printf style stuff and I find the printf
>> style stuff more readable and maintainable. I also don't like streams that
>> maintain state like the std::ostream does. Not sure if raw_ostream does any
>> state management, but when they do it causes bugs later for code that
>> doesn't always explicitly set the format, width, precision and much more:
>>
>> std::ostream &os;
>> os << std::hex << hex_num;
>> dump(os);
>> os << hex_num2;
>>
>> little did we know the stream was modified by "dump()" to set the format
>> to decimal so now you must always use:
>>
>> os << std::hex << hex_num2;
>>
>> (repeat for number width, precision and many other factors).
>>
> I agree, that's not a pleasant experience.  but as an aside,
> llvm::raw_ostream does not do this.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20150212/a334e92f/attachment.html>


More information about the lldb-dev mailing list