[llvm-dev] UB in MemoryBufferMMapFile
Mehdi Amini via llvm-dev
llvm-dev at lists.llvm.org
Thu Nov 17 21:45:30 PST 2016
> On Nov 17, 2016, at 9:29 PM, Chris Lattner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Nov 17, 2016, at 9:13 AM, Justin Bogner <mail at justinbogner.com> wrote:
>>>> This is UB, since we're reading one past an allocated region, and I have
>>>> an internal test case where I'm seeing it crash because that memory
>>>> happens to be unreadable.
>>>
>>> Well that is wrong. It should mmap the file, and if it knows that the
>>> file is an exact multiple of a page size, it should copy it into a
>>> malloc’d buffer that is one byte larger, and put a zero there.
>>
>> This doesn't seem to be working correctly, unless
>> sys::Process::getPageSize() is returning the wrong thing for me.
>>
>> I'm getting here from getOpenFileImpl() in MemoryBuffer.cpp, which is
>> deciding it's safe to mmap (by calling shouldUseMMap, where the page
>> size check you mention is). The file is 561152 bytes, and the page
>> size reports 16384, so this is not a multiple of page size (0x89000 &
>> 0x3fff == 0x1000). Yet, the memory one byte past the mmap is not
>> readable.
>
> That sounds like sys::Process::getPageSize() is broken or that there is a kernel bug. mmap is defined in terms of page granularity.
It depends “what kind of pages”?
From https://developer.apple.com/library/content/documentation/Performance/Conceptual/ManagingMemory/Articles/AboutMemory.html :
"In OS X and in earlier versions of iOS, the size of a page is 4 kilobytes. In later versions of iOS, A7- and A8-based systems expose 16-kilobyte pages to the 64-bit userspace backed by 4-kilobyte physical pages, while A9 systems expose 16-kilobyte pages backed by 16-kilobyte physical pages."
I wonder if this difference between 16-kilobyte userspace pages vs 4KB physical pages couldn’t explain the behavior here: mmap will be handled by the kernel per 4kB pages, while sys::Process::getPageSize() returns the user space pages.
This is assuming Justin was testing on iOS with an A7/A8-based system.
>
>>
>>>> There are three options here:
>>>>
>>>> 1. Remove the assert - it isn't possible to write correctly.
>>>
>>> This will break clang. Clang “knows” that files end with a nul byte.
>>
>> Strictly speaking, it won't "break" clang, it will just make a certain
>> class of bugs harder to track down.
>
> Clang’s lexer performance depends on this. I agree it won’t break the functionality of clang, but regressing compile time perf isn’t acceptable either.
I think Justin was saying to just remove the *assert*, nothing else. Which wouldn’t break clang but may lead to other bugs down the line if the issue isn’t caught by the assert. This shouldn’t regress clang :)
—
Mehdi
More information about the llvm-dev
mailing list