[cfe-dev] createCXString reads one past end byte
Argyrios Kyrtzidis
akyrtzi at gmail.com
Mon Jan 21 11:13:15 PST 2013
On Jan 20, 2013, at 9:39 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
> Hello,
>
> Nakamura-san has drawn my attention to the following valgrind failure:
>
> http://lab.llvm.org:8011/builders/clang-x86_64-linux-vg/builds/643/steps/check-all/logs/Clang%20%3A%3A%20Index___comment-to-html-xml-conversion.cpp
>
> The issue is that createCXString has an optimization to check if the
> original string is nul-terminated. (See CXString.cpp:46) If the
> original StringRef is nul-terminated, then it does not duplicate the
> string. This is a good optimization in general, but it requires
> reading one past end byte of the string. In this case, this byte just
> happens to be uninitialized, because there was no need to initialize
> it to create a StringRef that was passed to createCXString.
>
> But this access can, in fact, trap, if it happens to be on the next
> page that is unallocated.
>
> Another possible case is when that byte happens to be nul during
> createCXString, but is modified afterwards. When the libclang user
> code reads the string, it will read garbage at the end.
Yes, this is bad.
>
> The proper fix, I think, is to remove this optimization. Only this
> way we can guarantee correctness. But I brought up this discussion
> because doing so can have a big performance impact. (And somehow we
> got away with reading one past end for a long time...)
We could change how CXString works; instead of eagerly malloc'ing in case of a StringRef, have it stored in a "StringRef-kind" form and malloc when clang_getCString is called.
But also add a new function like clang_getStringWithLength, or something, for callers that can use that to avoid the extra malloc.
>
> The immediate fix for this failure is easy and it just proves that the
> real issue is inside createCXString:
>
> Index: lib/AST/CommentLexer.cpp
> ===================================================================
> --- lib/AST/CommentLexer.cpp (revision 172983)
> +++ lib/AST/CommentLexer.cpp (working copy)
> @@ -53,6 +53,7 @@
> }
>
> char *Resolved = Allocator.Allocate<char>(UNI_MAX_UTF8_BYTES_PER_CODE_POINT);
> + memset(Resolved, 0, UNI_MAX_UTF8_BYTES_PER_CODE_POINT);
> char *ResolvedPtr = Resolved;
> if (ConvertCodePointToUTF8(CodePoint, ResolvedPtr))
> return StringRef(Resolved, ResolvedPtr - Resolved);
>
> Dmitri
>
> --
> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
More information about the cfe-dev
mailing list