[PATCH] D91028: [llvm-objcopy][NFC] replace class Buffer/MemBuffer/FileBuffer with streams.

Alexey Lapshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 27 07:30:06 PST 2021


avl added a comment.



> Is there some fundamental reason why you can't determine the final output size? In lld (wasm-ld at least) we go to some lengths to determine the final size before writing, basically for this reason.

I do not know whether there is some fundamental reason for not determining the output size in that case.
My comments are based on the existing code:

  // Serialize the whole universal binary into a temporary memory buffer;
  // only after that is the final size known.
  Expected<std::unique_ptr<MemoryBuffer>> B =
      writeUniversalBinaryToBuffer(Slices);
  if (!B)
    return B.takeError();
  // Allocate the output Buffer to that size and copy the data over.
  if (Error E = Out.allocate((*B)->getBufferSize()))
    return E;
  memcpy(Out.getBufferStart(), (*B)->getBufferStart(), (*B)->getBufferSize());
  return Out.commit();

Thus, I assumed that in that case it is hard to calculate the final size.
Probably the data is copied here simply because the "Buffer" class is not available in the Support or Object libraries (so it cannot be used inside writeUniversalBinaryToBuffer()).

In the general case, calculating the final size might be as expensive as generating the output file itself.
So it might be inefficient to pre-calculate the size.
That is why I am trying to suggest an interface that does not require buffer pre-allocation.
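
For concreteness, the change is roughly the following shape (signatures simplified; the parameter lists are not exact):

  // Current shape: the caller provides a Buffer that has to be allocated
  // to the final size before the writers can emit anything.
  Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
                               Buffer &Out);

  // Proposed shape: the writers just emit bytes into a stream, so no final
  // size is required up front and the caller chooses what backs the stream.
  Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
                               raw_ostream &Out);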

> IIUC using a memory mapped output file has some big wins.

My suggestion is not about avoiding the use of memory-mapped files.
I suggest an interface that allows selecting/changing the most suitable data holder.
E.g., if we need to calculate a SHA of llvm-objcopy's output, we would need
to store the data in some temporary location and then do the calculation:

  MemBuffer/FileBuffer Out;
  executeObjcopyOnBinary(Config, In, Out);
  calculateSha(Out);

With the stream approach we do not need to allocate any temporary data at all:

  raw_sha1_ostream Out;   /// <<< calculates the SHA; does not store any output data.
  executeObjcopyOnBinary(Config, In, Out);
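
A slightly fuller sketch of the same idea, assuming the stream-based executeObjcopyOnBinary() signature sketched above (the helper name and the use of toHex() are just for illustration):

  #include "llvm/ADT/StringExtras.h"
  #include "llvm/Support/raw_sha1_ostream.h"

  static Expected<std::string> hashObjcopyOutput(CopyConfig &Config,
                                                 object::Binary &In) {
    raw_sha1_ostream HashOS;
    // Bytes are hashed as they are written; no copy of the output is kept.
    if (Error E = executeObjcopyOnBinary(Config, In, HashOS))
      return std::move(E);
    return toHex(HashOS.sha1()); // hex-encoded SHA-1 of the produced object
  }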

How memory-mapped files could be used with streams:

1. Simpler, but less clear/universal solution:



  class raw_ostream {
  public:
    virtual void setMaxSize(uint64_t Size) {}
  };

  class raw_fd_ostream : public raw_ostream {
  public:
    void setMaxSize(uint64_t Size) override {
      // Allocate a memory-mapped file of the given size and use it for
      // the following write operations.
    }
  };
  
  usage:
  raw_fd_ostream Out;
  executeObjcopyOnBinary(Config, In, Out) {
    Out.setMaxSize(XX);   /// <<< allocate the memory-mapped file.
    ELFWriter.write(Out);
  }

That solution is implemented in D91693 <https://reviews.llvm.org/D91693>.
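
For illustration, here is one way such a setMaxSize() hook could be backed by the existing FileOutputBuffer. This is a standalone sketch, independent of how D91693 actually implements it; the class name and the extra Path parameter are invented here:

  #include "llvm/Support/FileOutputBuffer.h"
  #include "llvm/Support/raw_ostream.h"
  #include <cassert>
  #include <cstring>
  #include <memory>

  class mapped_file_ostream : public raw_ostream {
    std::unique_ptr<FileOutputBuffer> FOB;
    uint64_t Pos = 0;

    // All writes land directly in the memory-mapped output buffer.
    void write_impl(const char *Ptr, size_t Len) override {
      assert(FOB && Pos + Len <= FOB->getBufferSize() &&
             "setMaxSize() must be called with a sufficient size first");
      std::memcpy(FOB->getBufferStart() + Pos, Ptr, Len);
      Pos += Len;
    }
    uint64_t current_pos() const override { return Pos; }

  public:
    // Map (and resize) the output file; subsequent writes go straight
    // into the mapping.
    Error setMaxSize(StringRef Path, uint64_t Size) {
      auto BufOrErr = FileOutputBuffer::create(Path, Size);
      if (!BufOrErr)
        return BufOrErr.takeError();
      FOB = std::move(*BufOrErr);
      return Error::success();
    }
    // Flush the raw_ostream buffer and publish the file atomically.
    Error commit() {
      flush();
      return FOB ? FOB->commit() : Error::success();
    }
    ~mapped_file_ostream() override { flush(); }
  };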

2. A more complex, but better-designed solution (suggested in https://reviews.llvm.org/D91693#2407437):

  Implement a resizable mmap_ostream. mmap_ostream would allocate memory-mapped chunks; if it needs to store more data, it would allocate new chunks. Finally, it would close and concatenate all chunks into a single file. If the implementation used a reserve() method to tell mmap_ostream which size to preallocate, mmap_ostream would have the same performance as the current FileOutputBuffer. (A rough sketch follows below.)

That solution allows us to:

1. select the most suitable data holder.
2. use size pre-calculation only when it is cheap.
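
To make (2) a bit more concrete, here is a rough heap-backed sketch of the chunked idea (all names are hypothetical; a real mmap_ostream would back each chunk with a memory-mapped region and concatenate the chunks into the output file when it is closed):

  #include "llvm/Support/raw_ostream.h"
  #include <algorithm>
  #include <cstring>
  #include <memory>
  #include <vector>

  class chunked_ostream : public raw_ostream {
    struct Chunk {
      std::unique_ptr<char[]> Data;
      size_t Used = 0;
      size_t Size = 0;
    };
    static constexpr size_t DefaultChunkSize = 1 << 20; // 1 MiB
    std::vector<Chunk> Chunks;
    uint64_t Pos = 0;

    void addChunk(size_t Size) {
      Chunks.emplace_back();
      Chunks.back().Data = std::make_unique<char[]>(Size);
      Chunks.back().Size = Size;
    }

    // Append to the last chunk; grow by a new chunk when it is full.
    void write_impl(const char *Ptr, size_t Len) override {
      while (Len > 0) {
        if (Chunks.empty() || Chunks.back().Used == Chunks.back().Size)
          addChunk(Len > DefaultChunkSize ? Len : DefaultChunkSize);
        Chunk &C = Chunks.back();
        size_t N = std::min(Len, C.Size - C.Used);
        std::memcpy(C.Data.get() + C.Used, Ptr, N);
        C.Used += N;
        Ptr += N;
        Len -= N;
        Pos += N;
      }
    }
    uint64_t current_pos() const override { return Pos; }

  public:
    // If a cheap size estimate is available, one right-sized chunk is
    // allocated up front; this is what makes the stream comparable to
    // the current FileOutputBuffer in performance.
    void reserve(uint64_t Size) {
      if (Chunks.empty())
        addChunk(Size);
    }
    // A real mmap_ostream would concatenate/commit the chunks into the
    // final output file here.
    ~chunked_ostream() override { flush(); }
  };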


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91028/new/

https://reviews.llvm.org/D91028


