<div dir="ltr"><br><br><div class="gmail_quote">On Wed Feb 11 2015 at 11:13:22 AM Greg Clayton <<a href="mailto:gclayton@apple.com">gclayton@apple.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
> On Feb 11, 2015, at 10:18 AM, Zachary Turner <<a href="mailto:zturner@google.com" target="_blank">zturner@google.com</a>> wrote:<br>
><br>
> A patch is up which fixes a number of issues with strings not being null terminated in LLDB. The issues and error-proneness of using raw c strings is well documented and understood, so I would like to propose banning them in LLDB for new code [1].<br>
><br>
> I realize that's a tall order, and I don't expect all the occurrences of them to go away overnight, but issues related to strings have been popping up over and over again, and it would be nice if we could deal with more interesting problems.<br>
><br>
> LLVM has a very robust string manipulation library, and while it's not perfect, it would have prevented almost all of these issues from ever happening in the first place. Here is a quick summary of common C string types / operations and their corresponding LLVM equivalents:<br>
><br>
> Types:<br>
> char stack_buffer[n]; // Use llvm::SmallString<n>.<br>
<br>
That would be fine.<br>
<br>
> const char* foo; // Use llvm::StringRef foo.<br>
<br>
This may be fine as long as there is absolutely _no_ string manipulation on "foo" (like "strcat(...) etc. Use std::string for anything that requires storage. And we don't want to start using any code that does "let me assign an llvm::StringRef() with a ConstString("hello").<u></u>GetCString() so I don't have to worry about ownership. Adding string to the constant string pool should be reserved for things that actually need it (any const names like namespaces, variable names, type names, paths, etc) and also for when we return "const char *" through the public API.<br>
<br>
This also means that we can't trust that "foo" is NULL terminated (llvm::StringRef objects have a data pointer and a length and aren't required to be terminated) and it means we must call "foo.str().c_str()" any time we require NULL termination. So I actually don't like llvm::StringRef as a replacement "const char *"...<br></blockquote><div>We can't assume that const char*s are null terminated either. They're just pointers to raw buffers, which you may or may not have null terminated. So we have the same issue. By replacing all uses of const char* with StringRef(), we end up with exactly the same semantics. null terminated strings are null terminated, non null terminated strings aren't.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
> Passing arguments;<br>
> void foo(char *, int len); // Use void foo(llvm::SmallVectorImpl<<u></u>char> &buf);<br>
<br>
I would prefer to use std::string here. Clearer and everyone knows how to use it. SmallVectors are nice, but they have the overhead of storing more data on the stack. Clang currently can create quite large stack frames when you have some code that has large switch statements where each switch statement has local variables. If those switch statements how SmallVector<char, PATH_MAX>, then you now have a case where if this code is called recursively we end up blowing the stack. We have seen this a lot in the DWARF parser, so I don't want to encourage using SmallVector for strings, since many places we use strings are for paths and other larger buffers. Any current std::string implementation can store up to 24 characters in the std::string without needing to do any allocations on the heap, so that essentially gets us a SmallVector<char, 24>. So I vote std::string.<br></blockquote><div>passing SmallVectorImpl<> is intended as a replacement for the case where you have this:</div><div><br></div><div>void foo(char *buf);</div><div>void bar() {</div><div> char buffer[n];</div><div> foo(buffer);</div><div>}</div><div><br></div><div>The equivalent LLVM code is this:</div><div><br></div><div>void foo(llvm::SmallVectorImpl<char> &buf);</div><div>void bar() {</div><div> llvm::SmallString<n> buffer;</div><div> foo(buffer);</div><div>}</div><div><br></div><div>Exactly the same amount of stack space. It's not intended to be a general replacement for std::string, which stores its data entirely on the heap. I do think that many places in LLDB that currently use stack buffers could be re-written just as well using heap buffers, and for that I agree std::string is fine.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
> void foo(const char * foo); // Use void foo(llvm::StringRef foo);<br>
<br>
<br>
Again, I don't like this because you must call "foo.str().c_str()" in order to guarantee NULL termination.<br></blockquote><div>As mentioned previously, in the const char * version you have to guarantee that it was initialized with a null terminated string in the first place. The burden is the same either way. Either we're already making the assumption in the callee, in which case we can continue to make it with the StringRef(), or we aren't making the assumption in the callee, in which case we can continue not making the assumption but have the added benefit of having the length field passed to us.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
><br>
> Operations:<br>
> strcpy, strncpy, etc // Use member functions of SmallString / StringRef<br>
> sprintf, snprintf, etc // Use llvm::raw_ostream<br>
<br>
We have StringStream, people should be using that. raw_ostream doesn't support all printf style formats. So all sprintf and snprintf function calls should switch to using lldb_private::StreamString.<br></blockquote><div>I actually disagree here. Printf() has plenty of problems, such as non-portable and confusing arguments, lack of type safety, possibility of buffer overruns. I would actually prefer to use raw_ostream instead of StringStream.</div><div> </div></div></div>