[lldb-dev] raw c strings in lldb

Zachary Turner zturner at google.com
Wed Feb 11 14:22:52 PST 2015


On Wed Feb 11 2015 at 2:04:38 PM Greg Clayton <gclayton at apple.com> wrote:

>
> > On Feb 11, 2015, at 11:23 AM, Zachary Turner <zturner at google.com> wrote:
> >
> >
> >
> > On Wed Feb 11 2015 at 11:13:22 AM Greg Clayton <gclayton at apple.com>
> wrote:
> >
> > > On Feb 11, 2015, at 10:18 AM, Zachary Turner <zturner at google.com>
> wrote:
> > >
> > > A patch is up which fixes a number of issues with strings not being
> null terminated in LLDB.  The issues and error-proneness of using raw c
> strings is well documented and understood, so I would like to propose
> banning them in LLDB for new code [1].
> > >
> > > I realize that's a tall order, and I don't expect all the occurrences
> of them to go away overnight, but issues related to strings have been
> popping up over and over again, and it would be nice if we could deal with
> more interesting problems.
> > >
> > > LLVM has a very robust string manipulation library, and while it's not
> perfect, it would have prevented almost all of these issues from ever
> happening in the first place.  Here is a quick summary of common C string
> types / operations and their corresponding LLVM equivalents:
> > >
> > > Types:
> > > char stack_buffer[n];        // Use llvm::SmallString<n>.
> >
> > That would be fine.
> >
> > > const char* foo;             // Use llvm::StringRef foo.
> >
> > This may be fine as long as there is absolutely _no_ string manipulation
> on "foo" (like "strcat(...) etc. Use std::string for anything that requires
> storage. And we don't want to start using any code that does "let me assign
> an llvm::StringRef() with a ConstString("hello").GetCString() so I don't
> have to worry about ownership.  Adding string to the constant string pool
> should be reserved for things that actually need it (any const names like
> namespaces, variable names, type names, paths, etc) and also for when we
> return "const char *" through the public API.
> >
> > This also means that we can't trust that "foo" is NULL terminated
> (llvm::StringRef objects have a data pointer and a length and aren't
> required to be terminated) and it means we must call "foo.str().c_str()"
> any time we require NULL termination. So I actually don't like
> llvm::StringRef as a replacement "const char *"...
>
> > We can't assume that const char*s are null terminated either.
>
> Yes we can, that is was is assumed by const char * right now for every
> call we use it with.
>
Right, but my point is that if we change a const char * to a StringRef(),
and that const char* was assumed to be null terminated, then the StringRef
is also assumed to be null terminated.  You can just call StringRef::data()
on it.


>
> But that isn't what is currently meant by _any_ of the functions that take
> a "const char *" with no length parameter, so it means that any switch to
> using llvm::StringRef requires we use ".str().c_str()" which I would like
> to avoid.
>
I don't see why this is necessary.  Just because StringRef supports
non-null terminated strings doesn't mean we have to use non null terminated
strings with it.  declaring a SmallString<> and passing by SmallVectorImpl&
supports null terminating strings.  You push_back(0) and then pop_back()
before returning.  This is no different from what we're already doing by
setting foo[n] = '\0' except that it's safer since there's no chance of a
buffer overrun.

We are already taking care to make sure all of our const char*s are
properly null terminated.  Therefore when they are passed to functions that
accept a StringRef, they will continue to be null terminated.



>
> >
> > >
> > > Operations:
> > > strcpy, strncpy, etc         // Use member functions of SmallString /
> StringRef
> > > sprintf, snprintf, etc       // Use llvm::raw_ostream
> >
> > We have StringStream, people should be using that. raw_ostream doesn't
> support all printf style formats. So all sprintf  and snprintf function
> calls should switch to using lldb_private::StreamString.
>
> > I actually disagree here.  Printf() has plenty of problems, such as
> non-portable and confusing arguments,
>
> I agree on this one, but we are getting around this with using PRI macros.
>
> > lack of type safety,
>
> most of this is taken care of when you have the printf format warnings on
> your class.
>
> > possibility of buffer overruns.
>
> There are no buffer overruns in the Stream::Printf(). Check the
> implementation if you need to.
>
> >  I would actually prefer to use raw_ostream instead of StringStream.
>
> I don't. I have written tools using streaming style stuff before (like
> std::cout) and also written using printf style stuff and I find the printf
> style stuff more readable and maintainable. I also don't like streams that
> maintain state like the std::ostream does. Not sure if raw_ostream does any
> state management, but when they do it causes bugs later for code that
> doesn't always explicitly set the format, width, precision and much more:
>
> std::ostream &os;
> os << std::hex << hex_num;
> dump(os);
> os << hex_num2;
>
> little did we know the stream was modified by "dump()" to set the format
> to decimal so now you must always use:
>
> os << std::hex << hex_num2;
>
> (repeat for number width, precision and many other factors).
>
I agree, that's not a pleasant experience.  but as an aside,
llvm::raw_ostream does not do this.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20150211/0ace3799/attachment.html>


More information about the lldb-dev mailing list