[llvm-dev] Refactoring. Using streams for llvm-objcopy.

Alexey Lapshin via llvm-dev llvm-dev at lists.llvm.org
Thu Feb 11 01:53:04 PST 2021


David, Thank you for the opinion. What might be the next step to make a 
progress with it?

Do I need to initiate proposal review, like described here - 
https://github.com/llvm/llvm-www/blob/main/proposals/LP0001-LLVMDecisionMaking.md 
?

Thank you, Alexey.

On 20.01.2021 01:42, David Blaikie wrote:
> Sure, streaming sounds fine to me. The more this looks like the code
> we already have for writing object files in LLVM, the better,
> hopefully (though that code is a bit expensive, last I played with it
> - MCStreamer would buffer each output section, etc, before writing the
> output - so some lower-level abstraction below that which I think
> we're missing - would be great to have something below MCStreamer that
> was higher level than the raw_pwrite_stream it uses - that could be
> shared with lld, objcopy, dwp, dsymutil, etc).
>
> On Mon, Jan 18, 2021 at 9:13 AM Alexey Lapshin <avl.lapshin at gmail.com> wrote:
>> Folks,
>>
>> we are trying to reuse some part of llvm-objcopy.
>> To make it possible we want to move the main implementation of llvm-objcopy
>> from "tools" subdirectory into the "Object" library(D88827).
>> One of the problems is using custom buffer
>> class(tools/llvm-objcopy/Buffer.h/cpp)
>> as an output buffer by llvm-objcopy:
>>
>> Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer
>> &In, ***Buffer &Out***);
>>
>> class Buffer {
>>     StringRef Name;
>>
>>     virtual Error allocate(size_t Size) = 0;
>>     virtual uint8_t *getBufferStart() = 0;
>>     virtual Error commit() = 0;
>> }
>>
>> There are two drawbacks of using custom Buffer class:
>>
>> 1. It is not good to move the custom Buffer class into the general llvm
>> Object library.
>>       It is better to use some standard, already existed solution.
>>
>> 2. Interface of class Buffer assumes that the entire buffer must be
>> preallocated.
>>       i.e. before writing to the output file we need to pre-allocate the
>> space.
>>       Such pre-allocation is not a problem if memory-mapped files are
>> used behind Buffer.
>>       But it could be a wasting of memory resources in other cases.
>>       A library might be used in a wider number of scenarios than a
>> separate tool.
>>       So it would not be good for the library to work effectively only if
>> memory-mapped files are used.
>>
>> =======================================
>>
>> We propose to use streams instead of custom Buffer(D91028):
>>
>> Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer
>> &In, ***raw_ostream &Out***);
>>
>> That solution has the following benefits:
>>
>> 1. it uses standard llvm streams.
>> 2. it does not require pre-allocating of the entire space.
>> 3. it allows easily replace kind of destinations(raw_fd_ostream,
>> raw_null_ostream,
>>       raw_svector_ostream, raw_sha1_ostream, raw_string_ostream).
>> 4. for some usages it could avoid memory allocations at all(using
>> raw_sha1_ostream
>>       as a destination for sha calculation would not require to allocate
>> space for the output file).
>>
>> That solution has the following drawbacks:
>>
>> 1. There is not memory-mapped file implementation for streams.
>> 2. Some formats could not be generated through one pass.
>>       f.e. the ELF format needs to go back(after the section header table
>> is generated
>>       it needs to go back and update the ELF header).
>>
>> For the first point, we might create such an
>> implementation(raw_mmap_stream).
>>
>> For the second point, it looks like we have three alternatives:
>>
>> a) Current implementation of ELF writer already has preliminary steps
>> which calculate sizes.
>>        Before allocating destination buffer, it calculates the size of
>> the resulting binary:  ELFWriter<ELFT>::finalize().
>>        So it looks like all required ELF header information might be
>> precalculated during this finalizing step.
>>        It allows writing data to the output stream by one pass.
>>
>> b) use raw_pwrite_stream as the output. It would allow seeking and
>> updating.
>>
>> c) use internal memory buffer, generate the file into that memory
>> buffer(memory buffer allows to
>> go back and update) and then stream that buffer into the output.
>>
>> =======================================
>>
>> D91028 suggests the following roadmap to replace Buffer with streams:
>>
>>
>> 1. Implement interfaces using raw_ostream:
>>
>>      Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
>> raw_ostream &Out);
>>
>> 2. Use additional internal buffers for file generation and not change
>> the writer's implementation.
>>
>>       After the files are generated stream buffers into the
>> output(raw_ostream &Out).
>>
>>      Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
>> raw_ostream &Out) {
>>        MemoryBuffer internal;
>>        ELFWriter.write(internal);
>>        Out.write(internal.data(), internal.size());
>>      }
>>
>>
>> 3. Change the implementation of writers(ELF/COFF/MachO/Wasm) to not use
>> internal buffers.
>>       So that writers store data into the output stream directly.
>>
>>      Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
>> raw_ostream &Out) {
>>        ELFWriter.write(Out);
>>      }
>>
>>       If all implementations are successful - then leave raw_ostream in
>> interfaces.
>>
>>       If some implementations would still require seek/update
>> functionality then change raw_ostream into raw_pwrite_stream:
>>
>>       Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary
>> &In, raw_pwrite_stream &Out);
>>
>> =======================================
>>
>>
>> So, what do you think? Would it be good to use streams as an output
>> format for objcopy code in Object library?
>>
>> Or Do we need to use some other solution here?
>>
>> Thank you, Alexey.
>>


More information about the llvm-dev mailing list