[cfe-dev] createCXString reads one past end byte

Mon Jan 21 11:13:15 PST 2013

On Jan 20, 2013, at 9:39 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:

> Hello,
> 
> Nakamura-san has drawn my attention to the following valgrind failure:
> 
> http://lab.llvm.org:8011/builders/clang-x86_64-linux-vg/builds/643/steps/check-all/logs/Clang%20%3A%3A%20Index___comment-to-html-xml-conversion.cpp
> 
> The issue is that createCXString has an optimization to check if the
> original string is nul-terminated.  (See CXString.cpp:46)  If the
> original StringRef is nul-terminated, then it does not duplicate the
> string.  This is a good optimization in general, but it requires
> reading one past end byte of the string.  In this case, this byte just
> happens to be uninitialized, because there was no need to initialize
> it to create a StringRef that was passed to createCXString.
> 
> But this access can, in fact, trap, if it happens to be on the next
> page that is unallocated.
> 
> Another possible case is when that byte happens to be nul during
> createCXString, but is modified afterwards.  When the libclang user
> code reads the string, it will read garbage at the end.

Yes, this is bad.

> 
> The proper fix, I think, is to remove this optimization.  Only this
> way we can guarantee correctness.  But I brought up this discussion
> because doing so can have a big performance impact.  (And somehow we
> got away with reading one past end for a long time...)

We could change how CXString works; instead of eagerly malloc'ing in case of a StringRef, have it stored in a "StringRef-kind" form and malloc when clang_getCString is called.
But also add a new function like clang_getStringWithLength, or something, for callers that can use that to avoid the extra malloc.

> 
> The immediate fix for this failure is easy and it just proves that the
> real issue is inside createCXString:
> 
> Index: lib/AST/CommentLexer.cpp
> ===================================================================
> --- lib/AST/CommentLexer.cpp	(revision 172983)
> +++ lib/AST/CommentLexer.cpp	(working copy)
> @@ -53,6 +53,7 @@
>   }
> 
>   char *Resolved = Allocator.Allocate<char>(UNI_MAX_UTF8_BYTES_PER_CODE_POINT);
> +  memset(Resolved, 0, UNI_MAX_UTF8_BYTES_PER_CODE_POINT);
>   char *ResolvedPtr = Resolved;
>   if (ConvertCodePointToUTF8(CodePoint, ResolvedPtr))
>     return StringRef(Resolved, ResolvedPtr - Resolved);
> 
> Dmitri
> 
> -- 
> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/