[lldb-dev] raw c strings in lldb

Zachary Turner zturner at google.com
Wed Feb 11 11:23:13 PST 2015


On Wed Feb 11 2015 at 11:13:22 AM Greg Clayton <gclayton at apple.com> wrote:

>
> > On Feb 11, 2015, at 10:18 AM, Zachary Turner <zturner at google.com> wrote:
> >
> > A patch is up which fixes a number of issues with strings not being null
> terminated in LLDB.  The issues and error-proneness of using raw c strings
> is well documented and understood, so I would like to propose banning them
> in LLDB for new code [1].
> >
> > I realize that's a tall order, and I don't expect all the occurrences of
> them to go away overnight, but issues related to strings have been popping
> up over and over again, and it would be nice if we could deal with more
> interesting problems.
> >
> > LLVM has a very robust string manipulation library, and while it's not
> perfect, it would have prevented almost all of these issues from ever
> happening in the first place.  Here is a quick summary of common C string
> types / operations and their corresponding LLVM equivalents:
> >
> > Types:
> > char stack_buffer[n];        // Use llvm::SmallString<n>.
>
> That would be fine.
>
> > const char* foo;             // Use llvm::StringRef foo.
>
> This may be fine as long as there is absolutely _no_ string manipulation
> on "foo" (like "strcat(...) etc. Use std::string for anything that requires
> storage. And we don't want to start using any code that does "let me assign
> an llvm::StringRef() with a ConstString("hello").GetCString() so I don't
> have to worry about ownership.  Adding string to the constant string pool
> should be reserved for things that actually need it (any const names like
> namespaces, variable names, type names, paths, etc) and also for when we
> return "const char *" through the public API.
>
> This also means that we can't trust that "foo" is NULL terminated
> (llvm::StringRef objects have a data pointer and a length and aren't
> required to be terminated) and it means we must call "foo.str().c_str()"
> any time we require NULL termination. So I actually don't like
> llvm::StringRef as a replacement "const char *"...
>
We can't assume that const char*s are null terminated either.  They're just
pointers to raw buffers, which you may or may not have null terminated.  So
we have the same issue.  By replacing all uses of const char* with
StringRef(), we end up with exactly the same semantics.  null terminated
strings are null terminated, non null terminated strings aren't.



>
> > Passing arguments;
> > void foo(char *, int len);   // Use void foo(llvm::SmallVectorImpl<char>
> &buf);
>
> I would prefer to use std::string here. Clearer and everyone knows how to
> use it. SmallVectors are nice, but they have the overhead of storing more
> data on the stack. Clang currently can create quite large stack frames when
> you have some code that has large switch statements where each switch
> statement has local variables. If those switch statements how
> SmallVector<char, PATH_MAX>, then you now have a case where if this code is
> called recursively we end up blowing the stack. We have seen this a lot in
> the DWARF parser, so I don't want to encourage using SmallVector for
> strings, since many places we use strings are for paths and other larger
> buffers. Any current std::string implementation can store up to 24
> characters in the std::string without needing to do any allocations on the
> heap, so that essentially gets us a SmallVector<char, 24>. So I vote
> std::string.
>
passing SmallVectorImpl<> is intended as a replacement for the case where
you have this:

void foo(char *buf);
void bar() {
    char buffer[n];
    foo(buffer);
}

The equivalent LLVM code is this:

void foo(llvm::SmallVectorImpl<char> &buf);
void bar() {
    llvm::SmallString<n> buffer;
    foo(buffer);
}

Exactly the same amount of stack space.  It's not intended to be a general
replacement for std::string, which stores its data entirely on the heap.  I
do think that many places in LLDB that currently use stack buffers could be
re-written just as well using heap buffers, and for that I agree
std::string is fine.



>
> > void foo(const char * foo);  // Use void foo(llvm::StringRef foo);
>
>
> Again, I don't like this because you must call "foo.str().c_str()" in
> order to guarantee NULL termination.
>
As mentioned previously, in the const char * version you have to guarantee
that it was initialized with a null terminated string in the first place.
The burden is the same either way.  Either we're already making the
assumption in the callee, in which case we can continue to make it with the
StringRef(), or we aren't making the assumption in the callee, in which
case we can continue not making the assumption but have the added benefit
of having the length field passed to us.


>
> >
> > Operations:
> > strcpy, strncpy, etc         // Use member functions of SmallString /
> StringRef
> > sprintf, snprintf, etc       // Use llvm::raw_ostream
>
> We have StringStream, people should be using that. raw_ostream doesn't
> support all printf style formats. So all sprintf  and snprintf function
> calls should switch to using lldb_private::StreamString.
>
I actually disagree here.  Printf() has plenty of problems, such as
non-portable and confusing arguments, lack of type safety, possibility of
buffer overruns.  I would actually prefer to use raw_ostream instead of
StringStream.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-dev/attachments/20150211/29d3556e/attachment.html>


More information about the lldb-dev mailing list