[llvm-dev] UB in MemoryBufferMMapFile

Chris Lattner via llvm-dev llvm-dev at lists.llvm.org
Wed Nov 16 22:27:00 PST 2016


On Nov 16, 2016, at 9:46 PM, Justin Bogner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> In MemoryBuffer::init, we have an assert that reads the memory at
> `BufEnd`, which is one past the end of some memory region:
> 
> from lib/Support/MemoryBuffer.cpp:45:
>> void MemoryBuffer::init(const char *BufStart, const char *BufEnd,
>>                        bool RequiresNullTerminator) {
>>  assert((!RequiresNullTerminator || BufEnd[0] == 0) &&
>>         "Buffer is not null terminated!");
>>  BufferStart = BufStart;
>>  BufferEnd = BufEnd;
>> }

Commenting from a historical perspective, this is as intended.  In the case of an mmapped buffer, you usually mmap some file that is not a multiple of a page size, and the OS zero fills the rest of the page.  This makes this behavior free.  In the case where you’re not mmaping a file (e.g. for clang, if the file is small) it is easy enough to malloc one byte more than you need to ensure the zero byte is present.

> However, this can be, and often is, one past an allocated region, since
> in MemoryBufferMMapFile we mmap `Len` bytes and then call this with
> `BufEnd = Start + Len`:
> 
> from lib/Support/MemoryBuffer.cpp:210:
>>  MemoryBufferMMapFile(bool RequiresNullTerminator, int FD, uint64_t Len,
>>                       uint64_t Offset, std::error_code &EC)
>>      : MFR(FD, sys::fs::mapped_file_region::readonly,
>>            getLegalMapSize(Len, Offset), getLegalMapOffset(Offset), EC) {
>>    if (!EC) {
>>      const char *Start = getStart(Len, Offset);
>>      init(Start, Start + Len, RequiresNullTerminator);
>>    }
>>  }
> 
> This is UB, since we're reading one past an allocated region, and I have
> an internal test case where I'm seeing it crash because that memory
> happens to be unreadable.

Well that is wrong.  It should mmap the file, and if it knows that the file is an exact multiple of a page size, it should copy it into a malloc’d buffer that is one byte larger, and put a zero there.

> There are three options here:
> 
> 1. Remove the assert - it isn't possible to write correctly.

This will break clang.  Clang “knows” that files end with a nul byte.

> 2. Remove the RequiresNullTerminator argument from MemoryBufferMMapFile
>   and hard code it to false.
> 3. Allocate an extra byte here and zero it.

#3 seems right to me assuming the problematic client is clang.  If the problematic client isn’t clang, then it should mirror whatever clang is doing with this API.  If Clang isn’t using this (which seems to be the case on a cursory look?), then we should change this implementation to respect memorybuffer’s invariants IMO.

-Chris


More information about the llvm-dev mailing list