[cfe-dev] RFC: proper string handling in CXString

Steve O'Brien via cfe-dev cfe-dev at lists.llvm.org
Thu Jan 11 10:16:45 PST 2018


I opened this issue: https://bugs.llvm.org/show_bug.cgi?id=35896

MSAN tests fail because CXString tries to reuse string data from
llvm::StringRef.  StringRef doesn't guarantee that its strings are
NUL-terminated, but CXStrings need to be.  So calling
`createRef(llvm::StringRef)` selectively reuses (if there's a
NUL terminator) or copies (otherwise) the bytes referenced by the
StringRef.

To do this, CXString may need to look past the end of the string
allocation, e.g. `if (str[str.size()] != 0)`.  If the allocation of
the string bytes ends at &str[str.size()-1], then MSAN fails because
of the out-of-bounds read of that one extra byte.
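
A rough sketch of the check described above, in case it helps to see it
spelled out.  This is not the actual libclang code: it assumes C++17 and
uses std::string_view as a stand-in for llvm::StringRef, and
`SketchString`, `createRefSketch` and `disposeSketch` are names made up
for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for CXString: either borrows bytes or owns a copy.
struct SketchString {
  const char *data;
  bool owned;
};

// "Reuse if NUL-terminated, otherwise copy."
SketchString createRefSketch(std::string_view s) {
  if (s.empty())
    return {"", false};  // nothing to inspect; borrow a static empty string

  // The problematic read: s.data()[s.size()] is one byte past the range the
  // view refers to.  If the underlying allocation ends at
  // s.data()[s.size() - 1], this touches memory the view does not cover.
  if (s.data()[s.size()] == '\0')
    return {s.data(), false};  // already terminated: borrow the bytes

  // Not terminated: make a private NUL-terminated copy.
  char *copy = new char[s.size() + 1];
  std::memcpy(copy, s.data(), s.size());
  copy[s.size()] = '\0';
  return {copy, true};
}

// Frees the copy if one was made, and resets the struct.
void disposeSketch(SketchString &s) {
  if (s.owned)
    delete[] s.data;
  s.data = nullptr;
  s.owned = false;
}
```

With a view over a whole std::string the extra byte is the string's own
terminator, so the bytes get borrowed; with a view over just a prefix the
byte after the view is ordinary data, so a copy is made.  The trouble is
the third case, where nothing valid lives after the last byte.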

So, I was hoping to get some feedback or ideas on how to fix it.  I'm
trying to make LLVM and Clang MSAN-clean: I'm debugging other issues
and preexisting problems are getting in the way, and I figure that
fixing these issues, however minor some of them are, is a good thing
anyway. :)

Some ideas:

(1) Always copy StringRef data.  Let `createRef(StringRef)` simply
jump to `createDup(StringRef)` instead, or codemod the former away.
Safe but involves more "memcpy"ing.
(2) Let CXString have only data and size fields, without worrying
about internal representation.  Then when using `clang_getCString`, do
the allocation + string copy + extra NUL byte.  Still does a "memcpy",
but only on strings the caller truly needs as C strings.
(3) My least favorite option: selectively ignore this particular MSAN
violation via a blacklist, and get on with life.
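
To make idea (2) a bit more concrete, here is a minimal sketch of
"store data + size, copy only when a C string is requested".  Again this
assumes C++17 with std::string_view standing in for llvm::StringRef;
`LazyCString` and its members are hypothetical and do not reflect the
real CXString layout or the real `clang_getCString` signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <string_view>

// Hypothetical type: keeps (pointer, length) and never reads past the end.
class LazyCString {
public:
  explicit LazyCString(std::string_view bytes) : bytes_(bytes) {}

  std::size_t size() const { return bytes_.size(); }

  // Plays the role of clang_getCString in idea (2): the allocation, copy
  // and extra NUL byte happen here, on first use only.
  const char *c_str() {
    if (!terminated_)
      terminated_ = std::make_unique<std::string>(bytes_);
    return terminated_->c_str();
  }

  // True once a NUL-terminated copy has been made.
  bool materialized() const { return terminated_ != nullptr; }

private:
  std::string_view bytes_;                   // borrowed, maybe unterminated
  std::unique_ptr<std::string> terminated_;  // owned copy, created lazily
};
```

Strings that are only ever measured or compared by length never pay for
a copy; the cost is that every C-string request now goes through a
possibly allocating call, and the borrowed bytes still have to outlive
the object.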

Other ideas?

Thanks!

--steveo


