[cfe-dev] RFC: proper string handling in CXString
Steve O'Brien via cfe-dev
cfe-dev at lists.llvm.org
Thu Jan 11 10:16:45 PST 2018
I opened this issue: https://bugs.llvm.org/show_bug.cgi?id=35896
MSAN tests fail because CXString is trying to reuse string data from
LLVM::StringRef. However the latter doesn't guarantee that strings
are NUL-terminated, but CXStrings need to be. Calling
`createRef(llvm::StringRef)` then selectively reuses (if there's a
zero terminator) or copies (otherwise) the bytes referenced in
StringRef.
To do this, CXString needs to possibly look past the end of the string
allocation, e.g. `if (str[str.size()] != 0)`. If the allocation of
the string bytes ends at &str[str.size()-1], then MSAN fails because
of the out-of-bounds read of those data.
So, I was hoping to get some feedback or ideas on how to fix it. I'm
trying to make LLVM and Clang MSAN-clean, as I'm trying to debug other
issues and preexisting problems are getting in the way, and I figure
that fixing these issues, however minor some of them are, are a good
thing anyway. :)
Some ideas:
(1) Always copy StringRef data. Let `createRef(StringRef)` simply
jump to `createDup(StringRef)` instead, or codemod the former away.
Safe but involves more "memcpy"ing.
(2) Let CXString have only data and size fields, without worrying
about internal representation. Then when using `clang_getCString`, do
the allocation + string copy + extra NUL byte. Still does a "memcpy",
but only on strings the caller truly needs as C strings.
(3) My least favorite option, selectively ignore this particular MSAN
violation in some blacklist, and get on with life.
Other ideas?
Thanks!
--steveo
More information about the cfe-dev
mailing list