[llvm-dev] UB in MemoryBufferMMapFile

Justin Bogner via llvm-dev llvm-dev at lists.llvm.org
Thu Nov 17 09:13:01 PST 2016

Chris Lattner <clattner at apple.com> writes:
> On Nov 16, 2016, at 9:46 PM, Justin Bogner via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> In MemoryBuffer::init, we have an assert that reads the memory at
>> `BufEnd`, which is one past the end of some memory region:
>> from lib/Support/MemoryBuffer.cpp:45:
>>> void MemoryBuffer::init(const char *BufStart, const char *BufEnd,
>>>                        bool RequiresNullTerminator) {
>>>  assert((!RequiresNullTerminator || BufEnd[0] == 0) &&
>>>         "Buffer is not null terminated!");
>>>  BufferStart = BufStart;
>>>  BufferEnd = BufEnd;
>>> }
> Commenting from a historical perspective, this is as intended.  In the
> case of an mmapped buffer, you usually mmap some file that is not a
> multiple of a page size, and the OS zero fills the rest of the page.
> This makes this behavior free.  In the case where you’re not mmaping a
> file (e.g. for clang, if the file is small) it is easy enough to
> malloc one byte more than you need to ensure the zero byte is present.
>> However, this can be, and often is, one past an allocated region, since
>> in MemoryBufferMMapFile we mmap `Len` bytes and then call this with
>> `BufEnd = Start + Len`:
>> from lib/Support/MemoryBuffer.cpp:210:
>>>  MemoryBufferMMapFile(bool RequiresNullTerminator, int FD, uint64_t Len,
>>>                       uint64_t Offset, std::error_code &EC)
>>>      : MFR(FD, sys::fs::mapped_file_region::readonly,
>>>            getLegalMapSize(Len, Offset), getLegalMapOffset(Offset), EC) {
>>>    if (!EC) {
>>>      const char *Start = getStart(Len, Offset);
>>>      init(Start, Start + Len, RequiresNullTerminator);
>>>    }
>>>  }
>> This is UB, since we're reading one past an allocated region, and I have
>> an internal test case where I'm seeing it crash because that memory
>> happens to be unreadable.
> Well that is wrong.  It should mmap the file, and if it knows that the
> file is an exact multiple of a page size, it should copy it into a
> malloc’d buffer that is one byte larger, and put a zero there.

This doesn't seem to be working correctly, unless
sys::Process::getPageSize() is returning the wrong thing for me.

I'm getting here from getOpenFileImpl() in MemoryBuffer.cpp, which is
deciding it's safe to mmap (by calling shouldUseMMap, where the page
size check you mention is). The file is 561152 bytes, and the page
size reports 16384, so this is not a multiple of page size (0x89000 &
0x3fff == 0x1000). Yet, the memory one byte past the mmap is not

>> There are three options here:
>> 1. Remove the assert - it isn't possible to write correctly.
> This will break clang.  Clang “knows” that files end with a nul byte.

Strictly speaking, it won't "break" clang, it will just make a certain
class of bugs harder to track down.

>> 2. Remove the RequiresNullTerminator argument from MemoryBufferMMapFile
>>   and hard code it to false.
>> 3. Allocate an extra byte here and zero it.
> #3 seems right to me assuming the problematic client is clang.  If the
> problematic client isn’t clang, then it should mirror whatever clang
> is doing with this API.  If Clang isn’t using this (which seems to be
> the case on a cursory look?), then we should change this
> implementation to respect memorybuffer’s invariants IMO.

This isn't clang, but the problem is happening entirely within LLVM -
I'm getting here from a call to MemoryBuffer::getFileOrSTDIN(SomeFile).
I can work around the problem by updating my client to pass
RequiresNullTerminator=false, but that doesn't actually fix the problem
in LLVM itself.

More information about the llvm-dev mailing list