[cfe-dev] RFC: proper string handling in CXString

Vedant Kumar via cfe-dev cfe-dev at lists.llvm.org
Thu Jan 11 17:50:10 PST 2018


> On Jan 11, 2018, at 10:16 AM, Steve O'Brien via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> I opened this issue: https://bugs.llvm.org/show_bug.cgi?id=35896
> 
> MSAN tests fail because CXString is trying to reuse string data from
> LLVM::StringRef.  However the latter doesn't guarantee that strings
> are NUL-terminated, but CXStrings need to be.  Calling
> `createRef(llvm::StringRef)` then selectively reuses (if there's a
> zero terminator) or copies (otherwise) the bytes referenced in
> StringRef.
> 
> To do this, CXString needs to possibly look past the end of the string
> allocation, e.g. `if (str[str.size()] != 0)`.  If the allocation of
> the string bytes ends at &str[str.size()-1], then MSAN fails because
> of the out-of-bounds read of those data.
> 
> So, I was hoping to get some feedback or ideas on how to fix it.  I'm
> trying to make LLVM and Clang MSAN-clean, as I'm trying to debug other
> issues and preexisting problems are getting in the way, and I figure
> that fixing these issues, however minor some of them are, are a good
> thing anyway. :)
> 
> Some ideas:
> 
> (1) Always copy StringRef data.  Let `createRef(StringRef)` simply
> jump to `createDup(StringRef)` instead, or codemod the former away.
> Safe but involves more "memcpy"ing.
> (2) Let CXString have only data and size fields, without worrying
> about internal representation.  Then when using `clang_getCString`, do
> the allocation + string copy + extra NUL byte.  Still does a "memcpy",
> but only on strings the caller truly needs as C strings.

Of these options, I like (2) the most, provided there's some way to access the contents of a CXString without calling clang_getCString() on it (for performance-sensitive clients).

> (3) My least favorite option, selectively ignore this particular MSAN
> violation in some blacklist, and get on with life.

IMO this is a serious issue. E.g it seems plausible that a thread could store a non-zero value into the null-terminator byte after the CXString is constructed.

best,
vedant

> 
> Other ideas?
> 
> Thanks!
> 
> --steveo
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev




More information about the cfe-dev mailing list