[PATCH] D91028: [llvm-objcopy][NFC] replace class Buffer/MemBuffer/FileBuffer with streams.

Alexey Lapshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 18 06:08:47 PST 2020


avl added a comment.



> Aren't many MemoryBuffers backed by memory mapped files, which aren't necessarily causing real memory usage? Could that approach be used here?

That is right. Many MemoryBuffers are backed by memory-mapped files, which aren't necessarily causing real memory usage.
But if we could not use a memory-mapped file then we do not have another way than loading the entire output file into the memory. 
Using streams allows us to have another way than loading the entire file into the memory(for the cases when we could not use memory-mapped files). 
f.e. there already exist the following code in the codebase:

  Expected<std::unique_ptr<MemoryBuffer>>
  object::writeUniversalBinaryToBuffer(ArrayRef<Slice> Slices) {
    SmallVector<char, 0> Buffer;
    raw_svector_ostream Out(Buffer);
  
    if (Error E = writeUniversalBinaryToStream(Slices, Out))  <<< entire output file is loaded into the memory
      return std::move(E);
      
    return std::make_unique<SmallVectorMemoryBuffer>(std::move(Buffer));
  }    
  ......    
    Expected<std::unique_ptr<MemoryBuffer>> B =
        writeUniversalBinaryToBuffer(Slices);
    if (!B)
      return B.takeError();
    if (Error E = Out.allocate((*B)->getBufferSize()))
      return E;
    memcpy(Out.getBufferStart(), (*B)->getBufferStart(), (*B)->getBufferSize());  
    ^^^^^^^^^
    data is copied from memory into the output buffer

We could not use the memory-mapped file for writeUniversalBinaryToStream() since we do not know the final size. 
Opposite, streams do not require knowing the final size so writeUniversalBinaryToStream() might write to the output 
directly, without intermediate buffers.

> That said, pretty sure LLVM's integrated assembler uses a pwrite stream to write out its object files, so if we can/if this patch does align with that method of output that seems OK.

That is one alternative which I suggest. We could use raw_pwrite_stream. If everybody agrees that it is correct choice I could update the patch accordingly.

But, initially, I suggested trying to use a, probably, better alternative - using raw_ostream.

> So this patch as proposed is not using a pwrite stream, it seems?

yes. the patch in the current form suggests using raw_ostream. The reason for this is that raw_ostream is more general and has fewer restrictions
(it does not require to have seek/update functionality). It is good for the library to have a more general interface.

> "if Writers could be implemented in such a way that data could be streamed without seeking/updating" - how does that work today, without the patch, and how does the patch address that need to seek?

without the patch, the data is already pre-allocated by __external__ in-memory/mmap buffer:

  llvm/tools/llvm-objcopy/ELF/Object.cpp
  
  template <class ELFT> Error ELFWriter<ELFT>::finalize() {
  ...
    if (Error E = Buf.allocate(totalSize())) <<<<<<<<<<<<<<<<<<<<<<<
    return E;
  ...

So it is possible to seek into any place and update.

with this patch, an additional pre-allocated __internal__ memory buffer is used. the data still could be randomly accessed. 
When the writing is done then the whole buffer is written into the stream. The cost of that solution - an additional buffer.

  Buf = WritableMemoryBuffer::getNewMemBuffer(totalSize());
  ...
  // write ELF output to buf.
  ...
  // TODO: Implement direct writing to the output stream
  // TODO: (without intermediate memory buffer Buf).
  Out.write(Buf->getBufferStart(), Buf->getBufferSize());
  Out.flush();  



> I don't believe it's possible to write an ELF file without either knowing at least the sizes of all the sections up-front (you might be able to know these sizes without actually buffering the entire contents of the file) or seeking. Because the header needs to either contain an offset to the section table stored at the end of the file, where it knows the offsets/lengths of all the sections, or you write the section table at the start (so the header knows the offset up-front) and the table then needs to know the offsets of all the sections up-front.

I do not know ELF format/implementation well. But current implementation of ELF writer knows the size of the resulting file _before_ any writing started
(It allocates a buffer of the proper size). Since it knows the total output size, probably, it could pre-calculate offset to the section table for ELF header before writing started?

My initial suggestion is to try to use a more general interface(raw_ostream) and change it into the more specific(raw_pwrite_stream) if it would be inconvenient to use raw_ostream:

1. Implement interfaces using raw_ostream:



  Error executeObjcopyOnBinary(CopyConfig &Config,
                               object::Binary &In,
                               raw_ostream &Out);

2. Use additional internal buffers to not to rewrite the writer's implementation.

3. Someone(probably me, but later) changes the implementation of writers(ELF/COFF/MachO/Wasm) to not use internal buffers. So that writers store data into the output stream directly.

4. If all implementations are successful - then leave raw_ostream in interfaces.

5. If some implementations still require to seek/update functionality then change raw_ostream into raw_pwrite_stream:



  Error executeObjcopyOnBinary(CopyConfig &Config,
                               object::Binary &In,
                               raw_pwrite_stream &Out);

If it is already clear that we need to use raw_pwrite_stream then it might be OK to use it in this patch from the start.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91028/new/

https://reviews.llvm.org/D91028



More information about the llvm-commits mailing list