[llvm-dev] UB in MemoryBufferMMapFile

Mehdi Amini via llvm-dev llvm-dev at lists.llvm.org
Thu Nov 17 21:45:30 PST 2016


> On Nov 17, 2016, at 9:29 PM, Chris Lattner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On Nov 17, 2016, at 9:13 AM, Justin Bogner <mail at justinbogner.com> wrote:
>>>> This is UB, since we're reading one past an allocated region, and I have
>>>> an internal test case where I'm seeing it crash because that memory
>>>> happens to be unreadable.
>>> 
>>> Well that is wrong.  It should mmap the file, and if it knows that the
>>> file is an exact multiple of a page size, it should copy it into a
>>> malloc’d buffer that is one byte larger, and put a zero there.
>> 
>> This doesn't seem to be working correctly, unless
>> sys::Process::getPageSize() is returning the wrong thing for me.
>> 
>> I'm getting here from getOpenFileImpl() in MemoryBuffer.cpp, which is
>> deciding it's safe to mmap (by calling shouldUseMMap, where the page
>> size check you mention is). The file is 561152 bytes, and the page
>> size reports 16384, so this is not a multiple of page size (0x89000 &
>> 0x3fff == 0x1000). Yet, the memory one byte past the mmap is not
>> readable.
> 
> That sounds like sys::Process::getPageSize() is broken or that there is a kernel bug.  mmap is defined in terms of page granularity.

It depends “what kind of pages”?

From https://developer.apple.com/library/content/documentation/Performance/Conceptual/ManagingMemory/Articles/AboutMemory.html :
"In OS X and in earlier versions of iOS, the size of a page is 4 kilobytes. In later versions of iOS, A7- and A8-based systems expose 16-kilobyte pages to the 64-bit userspace backed by 4-kilobyte physical pages, while A9 systems expose 16-kilobyte pages backed by 16-kilobyte physical pages."

I wonder if this difference between 16-kilobyte userspace pages vs 4KB physical pages couldn’t explain the behavior here: mmap will be handled by the kernel per 4kB pages, while  sys::Process::getPageSize() returns the user space pages.

This is assuming Justin was testing on iOS with an A7/A8-based system.

> 
>> 
>>>> There are three options here:
>>>> 
>>>> 1. Remove the assert - it isn't possible to write correctly.
>>> 
>>> This will break clang.  Clang “knows” that files end with a nul byte.
>> 
>> Strictly speaking, it won't "break" clang, it will just make a certain
>> class of bugs harder to track down.
> 
> Clang’s lexer performance depends on this.  I agree it won’t break the functionality of clang, but regressing compile time perf isn’t acceptable either.

I think Justin was saying to just remove the *assert*, nothing else. Which wouldn’t break clang but may lead to other bugs down the line if the issue isn’t caught by the assert. This shouldn’t regress clang :)

— 
Mehdi



More information about the llvm-dev mailing list