[lldb-dev] raw c strings in lldb

Greg Clayton gclayton at apple.com
Wed Feb 11 11:13:21 PST 2015


> On Feb 11, 2015, at 10:18 AM, Zachary Turner <zturner at google.com> wrote:
> 
> A patch is up which fixes a number of issues with strings not being null terminated in LLDB.  The issues and error-proneness of using raw c strings is well documented and understood, so I would like to propose banning them in LLDB for new code [1].
> 
> I realize that's a tall order, and I don't expect all the occurrences of them to go away overnight, but issues related to strings have been popping up over and over again, and it would be nice if we could deal with more interesting problems.
> 
> LLVM has a very robust string manipulation library, and while it's not perfect, it would have prevented almost all of these issues from ever happening in the first place.  Here is a quick summary of common C string types / operations and their corresponding LLVM equivalents:
> 
> Types:
> char stack_buffer[n];        // Use llvm::SmallString<n>.

That would be fine.

> const char* foo;             // Use llvm::StringRef foo.

This may be fine as long as there is absolutely _no_ string manipulation on "foo" (like "strcat(...) etc. Use std::string for anything that requires storage. And we don't want to start using any code that does "let me assign an llvm::StringRef() with a ConstString("hello").GetCString() so I don't have to worry about ownership.  Adding string to the constant string pool should be reserved for things that actually need it (any const names like namespaces, variable names, type names, paths, etc) and also for when we return "const char *" through the public API.

This also means that we can't trust that "foo" is NULL terminated (llvm::StringRef objects have a data pointer and a length and aren't required to be terminated) and it means we must call "foo.str().c_str()" any time we require NULL termination. So I actually don't like llvm::StringRef as a replacement "const char *"...

> Passing arguments;
> void foo(char *, int len);   // Use void foo(llvm::SmallVectorImpl<char> &buf);

I would prefer to use std::string here. Clearer and everyone knows how to use it. SmallVectors are nice, but they have the overhead of storing more data on the stack. Clang currently can create quite large stack frames when you have some code that has large switch statements where each switch statement has local variables. If those switch statements how SmallVector<char, PATH_MAX>, then you now have a case where if this code is called recursively we end up blowing the stack. We have seen this a lot in the DWARF parser, so I don't want to encourage using SmallVector for strings, since many places we use strings are for paths and other larger buffers. Any current std::string implementation can store up to 24 characters in the std::string without needing to do any allocations on the heap, so that essentially gets us a SmallVector<char, 24>. So I vote std::string.

> void foo(const char * foo);  // Use void foo(llvm::StringRef foo);


Again, I don't like this because you must call "foo.str().c_str()" in order to guarantee NULL termination.

> 
> Operations:
> strcpy, strncpy, etc         // Use member functions of SmallString / StringRef
> sprintf, snprintf, etc       // Use llvm::raw_ostream

We have StringStream, people should be using that. raw_ostream doesn't support all printf style formats. So all sprintf  and snprintf function calls should switch to using lldb_private::StreamString.

> 
> We can fix existing occurrences gradually / as time permits, but I would prefer to see a trend towards new code using llvm's string classes and formatting library, and when you are touching a piece of code that is messing with raw c-strings, that's a perfect time to change that code to migrate to LLVM strings.

> 
> [1] There are a couple of places where we can't ban them entirely.  This relates to the public API (i.e. can't change an existing signature), and va_args off the top of my head.  The va_args stuff though, if generally useful, if stuff we can sink into LLVM, there's nothing debuggery about formatting strings.
> 
> Thoughts?  Suggestions?  Comments?

I don't agree for my reasons mentioned above.

My 2 cents is:
1 - keep using "const char *" since they can guarantee NULL termination
2 - Use "std::string" for functions that have (char *, len)
3 - Switch all sprintf and snprintf over to use StreamString







More information about the lldb-dev mailing list