[cfe-dev] createCXString reads one past end byte

NAKAMURA Takumi geek4civic at gmail.com
Mon Jan 21 19:51:21 PST 2013


For now, I have marked this test as xfail in r173121.

Please remove it when resolved.

...Takumi

2013/1/22 Argyrios Kyrtzidis <akyrtzi at gmail.com>:
> On Jan 20, 2013, at 9:39 AM, Dmitri Gribenko <gribozavr at gmail.com> wrote:
>
>> Hello,
>>
>> Nakamura-san has drawn my attention to the following valgrind failure:
>>
>> http://lab.llvm.org:8011/builders/clang-x86_64-linux-vg/builds/643/steps/check-all/logs/Clang%20%3A%3A%20Index___comment-to-html-xml-conversion.cpp
>>
>> The issue is that createCXString has an optimization to check if the
>> original string is nul-terminated.  (See CXString.cpp:46)  If the
>> original StringRef is nul-terminated, then it does not duplicate the
>> string.  This is a good optimization in general, but it requires
>> reading one past end byte of the string.  In this case, this byte just
>> happens to be uninitialized, because there was no need to initialize
>> it to create a StringRef that was passed to createCXString.
>>
>> But this access can, in fact, trap, if it happens to be on the next
>> page that is unallocated.
>>
>> Another possible case is when that byte happens to be nul during
>> createCXString, but is modified afterwards.  When the libclang user
>> code reads the string, it will read garbage at the end.
>
> Yes, this is bad.
>
>>
>> The proper fix, I think, is to remove this optimization.  Only this
>> way we can guarantee correctness.  But I brought up this discussion
>> because doing so can have a big performance impact.  (And somehow we
>> got away with reading one past end for a long time...)
>
> We could change how CXString works; instead of eagerly malloc'ing in case of a StringRef, have it stored in a "StringRef-kind" form and malloc when clang_getCString is called.
> But also add a new function like clang_getStringWithLength, or something, for callers that can use that to avoid the extra malloc.
>
>>
>> The immediate fix for this failure is easy and it just proves that the
>> real issue is inside createCXString:
>>
>> Index: lib/AST/CommentLexer.cpp
>> ===================================================================
>> --- lib/AST/CommentLexer.cpp  (revision 172983)
>> +++ lib/AST/CommentLexer.cpp  (working copy)
>> @@ -53,6 +53,7 @@
>>   }
>>
>>   char *Resolved = Allocator.Allocate<char>(UNI_MAX_UTF8_BYTES_PER_CODE_POINT);
>> +  memset(Resolved, 0, UNI_MAX_UTF8_BYTES_PER_CODE_POINT);
>>   char *ResolvedPtr = Resolved;
>>   if (ConvertCodePointToUTF8(CodePoint, ResolvedPtr))
>>     return StringRef(Resolved, ResolvedPtr - Resolved);
>>
>> Dmitri
>>
>> --
>> main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
>> (j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr at gmail.com>*/
>



More information about the cfe-dev mailing list