[LLVMdev] [RFC] llvm/include/Support/OutputBuffer.h

Tue May 8 16:40:28 PDT 2012

On May 8, 2012, at 4:08 PM, Nick Kledzik <kledzik at apple.com> wrote:

> 
> On May 8, 2012, at 3:41 PM, Michael Spencer wrote:
> 
>> On Mon, May 7, 2012 at 12:56 PM, Nick Kledzik <kledzik at apple.com> wrote:
>>> For the reasons listed in my 03-May-2012 email, I am proposing a new
>>> llvm/Support class for using in writing binary files:
>>> 
>>> /// OutputBuffer - This interface provides simple way to create an in-memory
>>> /// buffer which when done will be written to a file. During the lifetime
>>> of
>>> /// these  objects, the content or existence of the specified file is
>>> undefined.
>>> /// That is, creating an OutputBuffer for a file may immediately remove the
>>> /// file.
>>> /// If the OutputBuffer is committed, the target file's content will become
>>> /// the buffer content at the time of the commit.  If the OutputBuffer is
>>> not
>>> /// committed, the file will be deleted in the OutputBuffer
>>> buffer destructor.
>>> class OutputBuffer {
>>> public:
>>>  enum Flags {
>>>    F_executable = 1, /// set the 'x' bit on the resulting file
>>>  };
>>> 
>>>  /// Factory method to create an OutputBuffer object which manages a
>>> read/write
>>>  /// buffer of the specified size. When committed, the buffer will be
>>> written
>>>  /// to the file at the specified path.
>>>  static error_code createFile(StringRef filePath, Flags flags, size_t
>>> size,
>>>                               OwningPtr<OutputBuffer> &result);
>>> 
>>> 
>>>  /// Returns a pointer to the start of the buffer.
>>>  uint8_t *bufferStart();
>>> 
>>>  /// Returns a pointer to the end of the buffer.
>>>  uint8_t *bufferEnd();
>>> 
>>>  /// Returns size of the buffer.
>>>  size_t size();
>>> 
>>>  /// Flushes the content of the buffer to its file and deallocates the
>>>  /// buffer.  If commit() is not called before this object's destructor
>>>  /// is called, the file is deleted in the destructor. The optional
>>> parameter
>>>  /// is used if it turns out you want the file size to be smaller than
>>>  /// initially requested.
>>>  void commit(int64_t newSmallerSize = -1);
>>> };
>>> 
>>> 
>>> The Flags will probable need to be extended over time to handle other
>>> clients needs.
>>> 
>>> For Unix/Darwin, my plan is to implement this by:
>>> 1) delete the file
>>> 2) create a new file with a random name in same directory
>>> 3) truncate the file to the new size
>>> 4) mmap() in the file r/w
>>> 5) On commit, unmap the file, rename() to final name
>>> 6) In destructor, if not committed, unmap, delete the randomly named file
>>> 
>>> I'll leave the windows implementation empty and let someone with windows
>>> experience do the implementation.
>>> 
>>> Comments? Suggestions?
>>> 
>>> -Nick
>>> 
>>> 
>>> On May 3, 2012, at 6:10 PM, Nick Kledzik wrote:
>>> 
>>> Existing llvm code tends to use raw_ostream for writing files.  But
>>> raw_ostream is not a good match for a linker for a couple of reasons:
>>> 
>>> 1) When the linker creates an executable, the file needs the 'x' bit set.
>>> Currently raw_fd_ostream has no way to set that.
>>> 
>>> 2) The Unix conformance suite actually has some test cases where the linker
>>> is run and the output file does exists but is not writable, or is not
>>> writable but is in a writable directory, or with funky umask values.
>>> raw_fd_ostream interface has no way to match those semantics.
>>> 
>>> 3) On darwin we have found the linker performs better if it opens the output
>>> file, truncates it to the output size, then mmaps in the file, then writes
>>> directly into that memory buffer.  This avoids the memory copy from the
>>> private buffer to the OS file system buffer in the write() syscall.
>>> 
>>> 4) In the model we are using for lld, a streaming output interface is not
>>> optimal.   Currently, lld copies chunks of code from the (read-only) input
>>> files, to a temporary buffer, then applies any fixups (relocations), then
>>> streams out that temporary buffer.  If instead we had a big output buffer,
>>> the linker could copy the code chunks directly to the output buffer and
>>> apply the fixups there, avoiding an extra copy.
>>> 
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>> 
>> 
>> I like it in general. The only comment I have is that it should match
>> the MemoryBuffer interface where possible.
>> 
>> static error_code getFile(...);
>> 
>> const char *getBufferStart() const;
>> const char *getBufferEnd() const;
>> size_t getBufferSize() const;
>> StringRef getBuffer() const;
>> const char *getBufferIdentifier() const;
> 
> Except that MemoryBuffer is about reading which is "getting".  OutputBuffer is about writing, so the "get" part seems wrong.   That said, some of the methods could use "const".  And there should be an accessor for getting the file path.  If wanted to be C++11 crazy, we could rename the start/end methods to begin() and end(), but I doubt any will want to easily iterate the bytes in the buffer.  My proposed update is then:
> 

The "get" prefix is just LLVM nomenclature for accessor methods and isn't intended to convey information about the semantics of the object itself.

> static error_code createFile(StringRef filePath, Flags flags, size_t size,
>                                                  OwningPtr<OutputBuffer> &result);
>  uint8_t *bufferStart() const;
>  uint8_t *bufferEnd() const;
>  size_t size() const;
>  void commit(int64_t newSmallerSize = -1);
>  StringRef path() const;
> 
> -Nick
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev