[cfe-dev] createCXString reads one past end byte

Mon Jan 21 09:26:54 PST 2013

On Sun, Jan 20, 2013 at 9:39 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
> Hello,
>
> Nakamura-san has drawn my attention to the following valgrind failure:
>
> http://lab.llvm.org:8011/builders/clang-x86_64-linux-vg/builds/643/steps/check-all/logs/Clang%20%3A%3A%20Index___comment-to-html-xml-conversion.cpp
>
> The issue is that createCXString has an optimization to check if the
> original string is nul-terminated.  (See CXString.cpp:46)  If the
> original StringRef is nul-terminated, then it does not duplicate the
> string.  This is a good optimization in general, but it requires
> reading one past end byte of the string.  In this case, this byte just
> happens to be uninitialized, because there was no need to initialize
> it to create a StringRef that was passed to createCXString.
>
> But this access can, in fact, trap, if it happens to be on the next
> page that is unallocated.
>
> Another possible case is when that byte happens to be nul during
> createCXString, but is modified afterwards.  When the libclang user
> code reads the string, it will read garbage at the end.
>
> The proper fix, I think, is to remove this optimization.

I, for one, agree.

>  Only this
> way we can guarantee correctness.  But I brought up this discussion
> because doing so can have a big performance impact.  (And somehow we
> got away with reading one past end for a long time...)
>
> The immediate fix for this failure is easy and it just proves that the
> real issue is inside createCXString:
>
> Index: lib/AST/CommentLexer.cpp
> ===================================================================
> --- lib/AST/CommentLexer.cpp    (revision 172983)
> +++ lib/AST/CommentLexer.cpp    (working copy)
> @@ -53,6 +53,7 @@
>    }
>
>    char *Resolved = Allocator.Allocate<char>(UNI_MAX_UTF8_BYTES_PER_CODE_POINT);
> +  memset(Resolved, 0, UNI_MAX_UTF8_BYTES_PER_CODE_POINT);
>    char *ResolvedPtr = Resolved;
>    if (ConvertCodePointToUTF8(CodePoint, ResolvedPtr))

Wouldn't it be more efficient to put: *ResolvedPtr = '\0' here, rather
than memsetting the whole array above?

(moot point, since I agree we should just remove such an optimization
entirely because it's unreliable/undefined)

>      return StringRef(Resolved, ResolvedPtr - Resolved);
>
> Dmitri
>
> --
> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev