[llvm-dev] Refactoring. Using streams for llvm-objcopy.
Alexey Lapshin via llvm-dev
llvm-dev at lists.llvm.org
Mon Jan 18 09:13:28 PST 2021
Folks,
we are trying to reuse parts of llvm-objcopy.
To make that possible, we want to move the main implementation of llvm-objcopy
from the "tools" subdirectory into the "Object" library (D88827).
One problem is that llvm-objcopy uses a custom buffer
class (tools/llvm-objcopy/Buffer.h/cpp)
as its output buffer:
Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer
&In, ***Buffer &Out***);
class Buffer {
  StringRef Name;

public:
  virtual Error allocate(size_t Size) = 0;
  virtual uint8_t *getBufferStart() = 0;
  virtual Error commit() = 0;
};
There are two drawbacks to using the custom Buffer class:
1. It is not good to move the custom Buffer class into the general llvm
   Object library.
   It is better to use a standard, already existing solution.
2. The interface of class Buffer assumes that the entire buffer must be
   preallocated,
   i.e. before writing to the output file we need to pre-allocate the
   space.
   Such pre-allocation is not a problem if memory-mapped files are
   used behind Buffer,
   but it could waste memory in other cases.
   A library might be used in a wider range of scenarios than a
   standalone tool,
   so it would not be good for the library to work efficiently only if
   memory-mapped files are used.
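To make the second drawback concrete, here is a minimal sketch of a Buffer-style writer. It uses plain C++ stand-ins rather than the real LLVM classes: this Buffer returns bool instead of Error, and VectorBuffer and writeViaBuffer are hypothetical names invented for the illustration. The point is only that the writer has to know the total size and allocate all of it before the first byte is written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the objcopy Buffer interface (bool instead of Error).
class Buffer {
public:
  virtual ~Buffer() = default;
  virtual bool allocate(std::size_t Size) = 0; // must happen before any write
  virtual std::uint8_t *getBufferStart() = 0;
  virtual bool commit() = 0;
};

// Heap-backed Buffer: the whole output stays in memory until commit().
class VectorBuffer : public Buffer {
  std::vector<std::uint8_t> Storage;
  std::string &Dest;

public:
  explicit VectorBuffer(std::string &Dest) : Dest(Dest) {}
  bool allocate(std::size_t Size) override {
    Storage.assign(Size, 0);
    return true;
  }
  std::uint8_t *getBufferStart() override { return Storage.data(); }
  bool commit() override {
    Dest.assign(Storage.begin(), Storage.end());
    return true;
  }
  std::size_t allocatedBytes() const { return Storage.size(); }
};

// Buffer-style writer: sum up the sizes, allocate everything, then copy.
bool writeViaBuffer(Buffer &Out, const std::vector<std::string> &Sections) {
  std::size_t Total = 0;
  for (const std::string &S : Sections)
    Total += S.size();
  if (!Out.allocate(Total))
    return false;
  std::uint8_t *Cur = Out.getBufferStart();
  for (const std::string &S : Sections) {
    if (S.empty())
      continue; // avoid memcpy on a possibly null destination
    std::memcpy(Cur, S.data(), S.size());
    Cur += S.size();
  }
  assert(Total == 0 || Cur == Out.getBufferStart() + Total);
  return Out.commit();
}
```

With a heap-backed Buffer like this one, peak memory equals the size of the output file, which is the cost described above for the non-memory-mapped case.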
=======================================
We propose using streams instead of the custom Buffer (D91028):
Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer
&In, ***raw_ostream &Out***);
That solution has the following benefits:
1. It uses standard llvm streams.
2. It does not require pre-allocating the entire space.
3. It makes it easy to swap the kind of destination (raw_fd_ostream,
   raw_null_ostream,
   raw_svector_ostream, raw_sha1_ostream, raw_string_ostream).
4. For some usages it could avoid memory allocation altogether (using
   raw_sha1_ostream
   as a destination for SHA calculation would not require allocating
   space for the output file).
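Benefits 3 and 4 can be illustrated with a small sketch. It does not use the LLVM stream classes: ByteSink, StringSink, CountingSink and HashSink are hypothetical stand-ins that play the roles of raw_ostream, raw_string_ostream, raw_null_ostream and raw_sha1_ostream, and a 64-bit FNV-1a hash is swapped in for SHA-1 to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for raw_ostream: an append-only byte sink.
class ByteSink {
public:
  virtual ~ByteSink() = default;
  virtual void write(const char *Data, std::size_t Size) = 0;
};

// Keeps every byte in memory (role of raw_string_ostream).
class StringSink : public ByteSink {
public:
  std::string Bytes;
  void write(const char *Data, std::size_t Size) override {
    assert(Data != nullptr || Size == 0);
    Bytes.append(Data, Size);
  }
};

// Discards the bytes and only counts them (role of raw_null_ostream).
class CountingSink : public ByteSink {
public:
  std::size_t Count = 0;
  void write(const char *, std::size_t Size) override { Count += Size; }
};

// Folds bytes into a running FNV-1a hash without storing them
// (role of raw_sha1_ostream; FNV-1a is used here instead of SHA-1).
class HashSink : public ByteSink {
public:
  std::uint64_t Hash = 14695981039346656037ULL; // FNV-1a offset basis
  void write(const char *Data, std::size_t Size) override {
    for (std::size_t I = 0; I < Size; ++I) {
      Hash ^= static_cast<unsigned char>(Data[I]);
      Hash *= 1099511628211ULL; // FNV-1a prime
    }
  }
};

// The writer only sees the abstract sink, so the destination can be swapped.
void writeSections(ByteSink &Out, const std::vector<std::string> &Sections) {
  for (const std::string &S : Sections)
    Out.write(S.data(), S.size());
}
```

The same writeSections call works against all three sinks, and the hashing sink never holds more than its 8-byte state, regardless of how large the output is.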
That solution has the following drawbacks:
1. There is no memory-mapped file implementation for streams.
2. Some formats cannot be generated in one pass.
   E.g. the ELF format needs to go back (after the section header table
   is generated,
   the writer needs to go back and update the ELF header).
For the first point, we might create such an
implementation (raw_mmap_stream).
For the second point, it looks like we have three alternatives:
a) The current implementation of the ELF writer already has preliminary steps
   that calculate sizes.
   Before allocating the destination buffer, it calculates the size of
   the resulting binary: ELFWriter<ELFT>::finalize().
   So it looks like all required ELF header information might be
   precalculated during this finalizing step.
   That would allow writing data to the output stream in one pass.
b) Use raw_pwrite_stream as the output. It would allow seeking and
   updating.
c) Use an internal memory buffer: generate the file into that memory
   buffer (a memory buffer allows going
   back and updating) and then stream that buffer into the output.
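The difference between alternatives (a) and (b) can be sketched on a toy format, which is not ELF: a 4-byte little-endian header holding the offset of a trailing table, then a payload, then the table. All names here (le32, PatchableSink, writeOnePass, writeWithPatch) are hypothetical, and PatchableSink is only a stand-in for the append-plus-positional-write behaviour of raw_pwrite_stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encode a 32-bit value as 4 little-endian bytes.
std::string le32(std::uint32_t V) {
  std::string B(4, '\0');
  for (int I = 0; I < 4; ++I)
    B[I] = static_cast<char>((V >> (8 * I)) & 0xFF);
  return B;
}

// Hypothetical stand-in for raw_pwrite_stream: append plus positional overwrite.
struct PatchableSink {
  std::string Bytes;
  void write(const std::string &Data) { Bytes += Data; }
  void pwrite(const std::string &Data, std::size_t Offset) {
    assert(Offset + Data.size() <= Bytes.size() && "pwrite past end");
    Bytes.replace(Offset, Data.size(), Data);
  }
};

// Alternative (a): compute the layout first, then emit strictly front to back.
std::string writeOnePass(const std::string &Payload, const std::string &Table) {
  std::uint32_t TableOffset = static_cast<std::uint32_t>(4 + Payload.size());
  std::string Out;
  Out += le32(TableOffset); // header is already final when it is written
  Out += Payload;
  Out += Table;
  return Out;
}

// Alternative (b): emit a placeholder header, then go back and patch it.
std::string writeWithPatch(const std::string &Payload, const std::string &Table) {
  PatchableSink Out;
  Out.write(le32(0)); // placeholder for the table offset
  Out.write(Payload);
  std::uint32_t TableOffset = static_cast<std::uint32_t>(Out.Bytes.size());
  Out.write(Table);
  Out.pwrite(le32(TableOffset), 0); // update the header after the fact
  return Out.Bytes;
}
```

Both functions produce the same bytes. Alternative (c) amounts to running the patching writer against a private in-memory sink, as writeWithPatch does here, and then copying the finished bytes to the real output.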
=======================================
D91028 suggests the following roadmap to replace Buffer with streams:
1. Implement interfaces using raw_ostream:
Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
raw_ostream &Out);
2. Use additional internal buffers for file generation and do not change
   the writers' implementation.
   After the files are generated, stream the buffers into the
   output (raw_ostream &Out).
Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
                             raw_ostream &Out) {
  // Pseudocode: generate the whole file into an internal buffer first.
  MemoryBuffer internal;
  ELFWriter.write(internal);
  // Then stream the finished buffer into the output.
  Out.write(internal.data(), internal.size());
  return Error::success();
}
3. Change the implementation of the writers (ELF/COFF/MachO/Wasm) so that
   they do not use internal buffers
   and store data into the output stream directly.
Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
                             raw_ostream &Out) {
  // Pseudocode: the writer emits directly into the output stream.
  ELFWriter.write(Out);
  return Error::success();
}
If all implementations are successful, leave raw_ostream in the
interfaces.
If some implementations still require seek/update
functionality, change raw_ostream to raw_pwrite_stream:
Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary
&In, raw_pwrite_stream &Out);
=======================================
So, what do you think? Would it be good to use streams as the output
for the objcopy code in the Object library?
Or do we need some other solution here?
Thank you, Alexey.