[llvm-bugs] [Bug 35896] New: MSAN read-past-string-end in CXString

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Jan 10 18:58:54 PST 2018


https://bugs.llvm.org/show_bug.cgi?id=35896

            Bug ID: 35896
           Summary: MSAN read-past-string-end in CXString
           Product: clang
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: libclang
          Assignee: unassignedclangbugs at nondot.org
          Reporter: steve at obrien.cc
                CC: klimek at google.com, llvm-bugs at lists.llvm.org

libclang's `createRef(llvm::StringRef)` helper, which builds a `CXString`, tries
to reference the bytes of an existing string without copying, if possible.  (We
can assume the pre-existing string's memory remains unchanged, allocated, and
otherwise "good".)

A `StringRef` represents a run of contiguous chars in memory, whereas a
`CXString` always points to a C-style string: an array of bytes somewhere in
memory, terminated by a NUL character.

`StringRef` has no NUL-terminator requirement, so `createRef`, which wants to
reuse existing memory, may receive either a NUL-terminated string (which it can
reuse as-is) or an unterminated run of bytes, which it must copy into a new
array with one extra byte for the terminator.

The trouble is this: to decide, `createRef` checks the byte at
`str[stringLength]`, which is technically out of bounds for the string.  If that
byte is 0, the data is a NUL-terminated C string and can be reused; otherwise it
has to be copied.
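
The check can be sketched in C++ as follows. This is a hypothetical illustration, not libclang's code: `hypotheticalEndsWithNul` is an invented name, and `std::string_view` (C++17) stands in for `llvm::StringRef`, assuming the two behave alike for this purpose.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper mirroring the check described above: is the byte
// right after the last character of the view a NUL?
// NOTE: s.data()[s.size()] lies OUTSIDE the view. It is only safe when the
// caller knows that byte belongs to an initialized allocation; the
// StringRef / string_view contract makes no such promise.
bool hypotheticalEndsWithNul(std::string_view s) {
  assert(s.data() != nullptr && "view must point into real memory");
  return s.data()[s.size()] == '\0';
}
```

With `const char buf[] = "hello world";`, a view of the whole array finds the terminating NUL and could be reused, while a view of only the first five bytes finds the space after `hello`, so it would have to be copied. Both reads stay in bounds only because both views happen to point into `buf`.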

Since that access is one past the bounds of the string, the byte it reads is not
guaranteed to be initialized (or even allocated), and MSan reports the use of
its value as an error.

One easy fix is to always copy the string data and never attempt to reuse bytes
from a `StringRef`.  This is correct, but I fear the extra copies will waste
both memory and CPU.
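
A minimal sketch of the always-copy approach, assuming a simplified stand-in for `CXString`. `SketchCXString`, `sketchCreateCopy`, and `sketchDispose` are hypothetical names, not libclang API, and `std::string_view` stands in for `llvm::StringRef`.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string_view>

// Hypothetical stand-in for CXString: a C-string pointer plus a flag saying
// whether we allocated the bytes (and therefore must free them).
struct SketchCXString {
  const char *data;
  bool owned;
};

// "Always copy": never look past the end of the view. Allocate size + 1
// bytes, copy the characters, and write the terminator ourselves.
SketchCXString sketchCreateCopy(std::string_view s) {
  assert(s.empty() || s.data() != nullptr);
  char *buf = static_cast<char *>(std::malloc(s.size() + 1));
  if (buf == nullptr)
    std::abort();  // keep the sketch simple: treat out-of-memory as fatal
  if (!s.empty())
    std::memcpy(buf, s.data(), s.size());
  buf[s.size()] = '\0';
  return {buf, true};
}

// Release the bytes if this string owns them.
void sketchDispose(SketchCXString s) {
  if (s.owned)
    std::free(const_cast<char *>(s.data));
}
```

Each call costs one `malloc` and one `memcpy`, which is the memory and CPU overhead in question; in exchange, no byte outside the view is ever read.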

Another option is to make `CXString` look more like `StringRef` by including a
length or end-of-string pointer, removing the NUL requirement.  But since this
library is primarily used from another language (via the `cindex` Python
bindings), I'm not sure whether that is feasible.
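
To make the second option concrete, here is a hypothetical length-carrying struct. It is not the real libclang `CXString` layout and not a proposed API; the names are invented, and `std::string_view` again stands in for `llvm::StringRef`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical length-carrying variant of CXString. Storing the length means
// no NUL terminator is required and no one-past-the-end peek is needed.
struct HypotheticalCXStringWithLength {
  const char *data;    // borrowed bytes, not necessarily NUL-terminated
  std::size_t length;  // number of valid bytes at `data`
};

// Borrow the bytes without copying and without reading past the end.
HypotheticalCXStringWithLength hypotheticalCreateRef(std::string_view s) {
  return {s.data(), s.size()};
}

// Bounds-checked access: every read stays inside [data, data + length).
char hypotheticalCharAt(HypotheticalCXStringWithLength s, std::size_t i) {
  assert(i < s.length);
  return s.data[i];
}
```

Any consumer of such a struct, including language bindings, would have to honor `length` instead of scanning for a terminator, which is the compatibility concern raised here.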
