[LLVMdev] (Not) instrumenting global string literals that end up in .cstrings on Mac

Anna Zaks ganna at apple.com
Thu Mar 21 16:22:39 PDT 2013


Alex,

I think finding a superset of the globals that will end up in the "__cstring" section and not adding red zones to them is reasonable. You might be able to factor out the part of the decision logic that does not involve TargetMachine (e.g., some of TargetLoweringObjectFile::getKindForGlobal). These are all constants anyway, so we are only losing checks for invalid reads, not invalid writes.

There might be other, better solutions; I am not sure.

Cheers,
Anna.


On Mar 21, 2013, at 12:03 PM, Nick Kledzik <kledzik at apple.com> wrote:

> Alexander,
> 
> On Darwin, the "__cstring" section (really, any section with type S_CSTRING_LITERAL) is defined to contain zero-terminated strings of bytes that the linker can merge and re-order.  If you want pad bytes before and after the string, you need to put the strings in a different section (e.g. __TEXT, __const).
> 
> But CF/NSString literals will be problematic.  The compiler emits a static NS/CFString object into a data section.  That object contains a pointer to its "backing" utf8 or utf16 string literal.  The linker coalesces the NS/CFString objects (so that two translation units that define @"hello" will wind up using the same object).  But to tell if two CF/NSString objects are the same, the linker must compare the string literals they point to.  And in that check is an assertion that the string is in a __cstring or __ustring (utf16) section.  So, putting the backing string for a CF/NSString into another section will cause a linker assertion.
> 
> -Nick
> 
> 
> On Mar 21, 2013, at 7:05 AM, Alexander Potapenko <glider at google.com> wrote:
>> (forgot to CC llvmdev)
>> 
>> On Thu, Mar 21, 2013 at 5:54 PM, Alexander Potapenko <glider at google.com> wrote:
>>> Hey Anna, Nick, Ted,
>>> 
>>> We have the following problem with string literals under ASan on Mac.
>>> Some global string constants end up being put into the .cstring
>>> section, for which the following rules apply:
>>> - the strings can't contain zeroes in their bodies
>>> - the link editor places only one copy of each literal into the
>>> output file's section
>>> 
>>> ASan usually instruments the globals by adding redzones to the end of
>>> them and creating a structure that contains the size of a global with
>>> and without the redzone.
>>> For the aforementioned strings the linker will delete the redzones,
>>> but leave that structure untouched, which will lead to corrupt shadow
>>> memory at run time.
>>> 
>>> Unfortunately at instrumentation time we can't tell for sure whether
>>> the string constant will be put into the .cstring section or not - the
>>> decision is taken at lowering time.
>>> https://code.google.com/p/address-sanitizer/issues/detail?id=171
>>> contains the writeup of the problem and a couple of suggestions on how
>>> it can be solved. But we aren't sure that any of the solutions is
>>> correct.
>>> I wonder if it's at all possible to understand that a given string
>>> constant is going to end up in a mergeable section. Otherwise, is it
>>> possible to make every string literal live in a non-mergeable section
>>> by setting the section name explicitly?
>>> 
>>> TIA,
>>> Alex
