[PATCH] D91693: [Support] Add reserve() method to the raw_ostream.

Fri Nov 27 00:52:00 PST 2020

jhenderson added a comment.

In D91693#2416252 <https://reviews.llvm.org/D91693#2416252>, @avl wrote:

> @jhenderson Summary of proposed approach:
>
> 1. Current solution, which uses Buffer/MemoryBuffer/FileBuffer assumes that the buffer for the whole file should be pre-allocated. This assumes that the whole file should be loaded into the memory. For the library, it might be an inconvenient requirement. Some tools might want to have a possibility to reduce memory usage.
>
>   Using streams allows us to reduce memory usages. No need to load all data into the memory - the data could be streamed through a smaller buffer.

It's not immediately clear to me which data you're concerned about loading into memory, but it's worth remembering that objcopy will need to have all the data (or at least its size) in its internal representation before it can write most bits, at least for ELF objcopy, because the section header offset field is part of the ELF header, but the section header table itself is at the end.

> 2. Passing file name for the executeObjcopyOnBinary might be inconvenient if we would like to have a possibility to replace destination.

This can be solved by overloading:

  // In all examples, types of first two parameters are arbitrarily chosen, the third parameter, when present, is the important one).

  // Execute objcopy then write to file Out.
  Error executeObjcopyOnBinary(const CopyConfig &Config, const Binary &In, StringRef Out) {
    // Do some objcopy stuff to finalise the objcopy layout.
    Object Obj = executeObjcopy(Config, In);
    size_t Size = Obj.getOutputSize();
    raw_mmap_ostream OS(Out, Size);
    writeOutput(OS, Obj);
  }

  // Execute objcopy then write to raw_ostream Out. raw_mmap_ostream won't be usable here, since the output size isn't known up front.
  Error executeObjcopyOnBinary(const CopyConfig &Config, const Binary &In, raw_ostream &Out) {
    Object Obj = executeObjcopy(Config, In);
    writeOutput(Out, Obj);
  }

  // If a user wanted to write the output to an existing raw_mmap_ostream, they'd need to call the lower-level API directly:
  void aFunction(...) {
    ...
    Object Obj = executeObjcopy(Config, In);
    size_t Size = Obj.getOutputSize();
    ...
    raw_mmap_ostream Out(FileName, Size + ...);
    ...
    writeOutput(Out, Obj);
  }

It's worth noting that we cannot have `Expected<raw_ostream &> executeObjcopyOnBinary(const CopyConfig &Config, const Binary &In);` where the function itself creates the stream and returns it, because something needs to own the backing storage.

> 3. Current objcopy implementation assumes that data is already pre-allocated. Thus, to have the advantage of using streams it should be rewritten in such a way that streams were used directly(without creating intermediate buffers).

I don't think it's possible to start writing to a stream before the size of the data is known (see above). Does that mean streams are still justified?

> 4. If llvm-objcopy needs to use memory-mapped files then we might implement resizable raw_mmap_ostream. Though I do not know at the current moment whether the effective implementation of resizable memory-mapped file could be done.

I don't think resizeable memory-mapped files are useful in this context, because the size needs to be known before writing anyway. The exception is if we think calling the objcopy guts is too complicated for users that want to write to an existing memory mapped file, but I think it is possible to keep it simple, like in my example above.

> 5. The fact that llvm-objcopy knows the size of resulting data could be used to make implementation a bit more efficient. We could tell reserve() for the streams and the more effective buffers could be used:

This is merely a nice-to-have, and not itself directly related to the issue at hand. It can be implemented later. People who want the efficiency bonus of reserving could use the same approach as I did in the above example with the adding to an existing memory mapped file, by calling the lower-level API directly. They will usually have access to their underlying backing storage anyway (e.g. the `std::vector`) and so can just reserve it directly.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91693/new/

https://reviews.llvm.org/D91693