[LLVMdev] (Not) instrumenting global string literals that end up in .cstrings on Mac

Nick Kledzik kledzik at apple.com
Thu Mar 21 12:03:33 PDT 2013


Alexander,

On Darwin the "__cstring" section (really, any section of type S_CSTRING_LITERALS) is defined to contain zero-terminated strings of bytes that the linker can merge and re-order.  If you want pad bytes before and after the string, you need to put the strings in a different section (e.g. __TEXT, __const).
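
A minimal sketch of moving a string out of the mergeable section, assuming the clang/GCC `section` attribute and the Darwin "segment,section" naming. The names NONMERGEABLE, kGreeting and greeting_length are hypothetical, and the non-Darwin branch exists only so the sketch builds elsewhere:

```c
#include <stddef.h>
#include <string.h>

/* Assumption: clang/GCC section attribute. On Darwin the argument is
 * "segment,section"; other targets use a different syntax, so the
 * attribute is dropped there and the sketch still compiles. */
#if defined(__APPLE__)
#define NONMERGEABLE __attribute__((section("__TEXT,__const")))
#else
#define NONMERGEABLE
#endif

/* Declared as an array, not a pointer to a literal, so the attribute
 * applies to the string bytes themselves. */
static const char kGreeting[] NONMERGEABLE = "hello";

/* Hypothetical helper that keeps the string referenced. */
size_t greeting_length(void)
{
    return strlen(kGreeting);
}
```

On a Darwin build, something like `otool -s __TEXT __const` on the object file should show where the bytes landed. This only covers plain C strings; it does not address the CF/NSString case.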

But CF/NSString literals will be problematic.  The compiler emits a static NS/CFString object into a data section.  That object contains a pointer to its "backing" UTF-8 or UTF-16 string literal.  The linker coalesces the NS/CFString objects (so that two translation units that define @"hello" will wind up using the same object).  But to tell whether two CF/NSString objects are the same, the linker must compare the string literals they point to.  That check contains an assertion that the string is in a __cstring or __ustring (UTF-16) section.  So putting the backing string for a CF/NSString into another section will cause a linker assertion.
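
The coalescing check described above can be modelled roughly as follows. This is a toy model, not the linker's actual code or data structures: every type and function name here is hypothetical, and the backing string is treated as plain bytes even for the UTF-16 case.

```c
#include <string.h>

/* Hypothetical model of where a backing literal lives. */
enum section_kind { SEC_CSTRING, SEC_USTRING, SEC_OTHER };

struct backing_literal {
    enum section_kind section;
    const char *bytes;      /* simplified: bytes only, even for UTF-16 */
};

/* Hypothetical stand-in for a static CF/NSString object. */
struct cfstring_object {
    const struct backing_literal *backing;
};

/* Returns 1 if the two objects can be coalesced, 0 if they differ,
 * and -1 for the case where, per the description above, the linker
 * would hit its assertion (backing string in an unexpected section). */
int cfstrings_coalescable(const struct cfstring_object *a,
                          const struct cfstring_object *b)
{
    if (a->backing->section == SEC_OTHER ||
        b->backing->section == SEC_OTHER)
        return -1;
    if (a->backing->section != b->backing->section)
        return 0;
    return strcmp(a->backing->bytes, b->backing->bytes) == 0;
}
```

In this model, two @"hello" objects backed by __cstring literals compare equal, while moving either backing string to another section produces the -1 result.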

-Nick


On Mar 21, 2013, at 7:05 AM, Alexander Potapenko <glider at google.com> wrote:
> (forgot to CC llvmdev)
> 
> On Thu, Mar 21, 2013 at 5:54 PM, Alexander Potapenko <glider at google.com> wrote:
>> Hey Anna, Nick, Ted,
>> 
>> We have the following problem with string literals under ASan on Mac.
>> Some global string constants end up being put into the .cstring
>> section, for which the following rules apply:
>> - the strings can't contain zeroes in their bodies
>> - the link editor places only one copy of each literal into the
>> output file's section
>> 
>> ASan usually instruments the globals by adding redzones to the end of
>> them and creating a structure that contains the size of a global with
>> and without the redzone.
>> For the aforementioned strings the linker will delete the redzones,
>> but leave that structure untouched, which will lead to corrupt shadow
>> memory at run time.
>> 
>> Unfortunately at instrumentation time we can't tell for sure whether
>> the string constant will be put into the .cstring section or not - the
>> decision is taken at lowering time.
>> https://code.google.com/p/address-sanitizer/issues/detail?id=171
>> contains the writeup of the problem and a couple of suggestions on how
>> it can be solved. But we aren't sure that any of the solutions is
>> correct.
>> I wonder if it's at all possible to understand that a given string
>> constant is going to end up in a mergeable section. Otherwise, is it
>> possible to make every string literal live in a non-mergeable section
>> by setting the section name explicitly?
>> 
>> TIA,
>> Alex
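
The mismatch described in the quoted message (redzones removed by the linker while the size record stays untouched) can be illustrated with a small sketch. The struct below is a simplified, hypothetical stand-in for the per-global record, not ASan's real definition, and the sizes in the usage note are made up:

```c
#include <stddef.h>

/* Hypothetical, simplified per-global record: the size of the global
 * with and without its trailing redzone, as described above. */
struct global_desc {
    const char *beg;
    size_t size;               /* bytes of the global itself */
    size_t size_with_redzone;  /* global plus trailing redzone */
};

/* Number of redzone bytes the runtime expects to find after the global. */
size_t redzone_bytes(const struct global_desc *g)
{
    return g->size_with_redzone - g->size;
}

/* Nonzero if the record claims more bytes than the linker actually
 * kept for this string in the output file. */
int descriptor_is_stale(const struct global_desc *g, size_t bytes_in_output)
{
    return g->size_with_redzone > bytes_in_output;
}
```

For example, a record for "hello" with size 6 and size_with_redzone 64 is consistent when 64 bytes were kept, but stale if the linker kept only the 6 string bytes, in which case the runtime would poison memory that belongs to something else.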



