[PATCH] D62364: llvm-objcopy: Implement --extract-partition and --extract-main-partition.

Wed May 29 14:58:43 PDT 2019

pcc marked an inline comment as done.
pcc added a comment.

In D62364#1521845 <https://reviews.llvm.org/D62364#1521845>, @jakehehrlich wrote:

> A few questions
>
> 1. This direction means that we can only ever convert from a multi partition binary to a single partition binary. It seems like we should be able to remove each partition individually. The naming scheme I would imagine would be "--remove-partition=<name>" or "--remove-main-partition" and for the equivalent of this feature "--only-keep-partition=<name>" and "--only-keep-main-partition". Are you sure we want to limit things to work like this?
> 2. Can you explain the details a bit more? Like it appears that we have a section to point to partition headers but I'd expect this to be available at dynamic link time as well which should mean that there should be a program header based solution to this as well. I'm also trying to understand exactly what the resulting file format looks like and what effect other operations would have on the net result. Lots of things are not clear to me from the documentation posted.

The combined output file (i.e. the file produced by the linker) is not intended to be loaded directly. It is a description of the program with all partitions loaded. As written in the doc, it is essentially an ELF file with all of the partitions concatenated together. Further postprocessing (by llvm-objcopy) is required to produce a loadable file.

In a combined output file, each partition has its own set of data contained within segments (i.e. ELF header, program headers and SHF_ALLOC sections), while data not contained within segments (e.g. debug info, section headers) is shared between partitions. To extract a loadable partition, the partition's ELF headers are found by searching the section table for an SHT_PART_EHDR section named after the partition. From there, the program headers are found, which serve as a description of which SHF_ALLOC sections pertain to the partition. Thus the entire description of each partition can be found starting from the section headers.
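As a rough illustration of the lookup described above (this is not the llvm-objcopy code; `SectionInfo`, `findPartitionEhdr` and the numeric value of `kShtPartEhdr` are hypothetical stand-ins), the search over the section table could be sketched as:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Placeholder value for illustration only; not the real section type constant.
constexpr uint32_t kShtPartEhdr = 0x70000001;

// Hypothetical, simplified view of one section header table entry.
struct SectionInfo {
  std::string Name;
  uint32_t Type;
  uint64_t Offset; // File offset of the section's contents.
};

// Returns the section holding the named partition's ELF header, or nullptr
// if no section of the partition-header type has that name.
const SectionInfo *findPartitionEhdr(const std::vector<SectionInfo> &Sections,
                                     const std::string &Partition) {
  for (const SectionInfo &S : Sections)
    if (S.Type == kShtPartEhdr && S.Name == Partition)
      return &S;
  return nullptr;
}
```

From the offset of the section found this way, a reader would parse the partition's ELF header and follow it to the partition's program headers.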

This is not unlike other types of ELF files that describe (parts of) a program without being directly loadable. Two other examples are core dumps and `--only-keep-debug` files.

Because the combined output file is not intended to be loaded directly, there is no need for a description of the loadable partitions in any other partition's program headers. When llvm-objcopy extracts a partition, the information required to load that partition, i.e. its ELF and program headers, ends up moved into place at the front of the file so that it can be read by the dynamic loader. You can think of what llvm-objcopy is doing when it extracts a partition as taking a slice of the combined output file from the start of the partition's ELF headers until the start of the next partition (or the end of the file), except that the data not contained within segments is preserved where it makes sense (i.e. debug info, section headers pertaining to preserved sections).
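The "slice" mental model above can be sketched as follows. This only computes the byte range of one partition and deliberately ignores the preserved non-segment data (debug info, section headers); `partitionSlice` and its parameters are hypothetical names, and the offsets are assumed to be the file offsets of each partition's ELF header.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Returns the half-open range [Begin, End) covered by the partition whose
// ELF header starts at Start: it runs up to the next partition's ELF header,
// or to the end of the file if there is no later partition.
std::pair<uint64_t, uint64_t> partitionSlice(std::vector<uint64_t> EhdrOffsets,
                                             uint64_t Start,
                                             uint64_t FileSize) {
  std::sort(EhdrOffsets.begin(), EhdrOffsets.end());
  // First partition header located strictly after Start.
  auto Next = std::upper_bound(EhdrOffsets.begin(), EhdrOffsets.end(), Start);
  uint64_t End = (Next == EhdrOffsets.end()) ? FileSize : *Next;
  return {Start, End};
}
```

The headers at the beginning of that range are what end up at the front of the extracted file.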

I'm having trouble seeing a use case for `--remove-partition` or `--remove-main-partition`, and it doesn't really fit the pattern of "make this file loadable" of the other flags, so I'd be hesitant to implement it.

================
Comment at: llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp:734
                              object::ELFObjectFileBase &In, Buffer &Out) {
-  ELFReader Reader(&In);
+  ELFReader Reader(&In, Config);
   std::unique_ptr<Object> Obj = Reader.create();
----------------
jakehehrlich wrote:
> Adding the CopyConfig here is antithetical to the goal of making this code into a library at some point in the future. We should find a way to avoid this. This goes for propagation of Config elsewhere in the code as well. Propagation of CopyConfig is a hard no from me personally.
In an earlier version of the code [1] I was just passing down the name of the partition to extract. Would that work better (maybe with the name passed as a ctor argument rather than as a field)? Alternatively, we could create something like a ReaderConfig to be passed down here that for the moment would just contain the partition name.

[1] https://github.com/pcc/llvm-project/blob/c9939239d7e829e2c779138aaedea061f4a095d0/llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp#L597
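
To make the ReaderConfig alternative concrete, here is a minimal sketch of what such a type might look like. It is not taken from the patch: `ReaderConfig`, its `ExtractPartition` field and `SketchReader` are hypothetical, and the convention that an empty name means "no partition requested" is an assumption of this sketch.

```cpp
#include <string>
#include <utility>

// Hypothetical reader-only options, kept separate from CopyConfig so the
// reader does not depend on the tool's command-line configuration.
struct ReaderConfig {
  // Assumption in this sketch: empty means no partition was requested.
  std::string ExtractPartition;
};

// Stand-in for a reader class that receives only the options it needs.
class SketchReader {
public:
  explicit SketchReader(ReaderConfig Config) : Cfg(std::move(Config)) {}

  bool extractsPartition() const { return !Cfg.ExtractPartition.empty(); }
  const std::string &partitionName() const { return Cfg.ExtractPartition; }

private:
  ReaderConfig Cfg;
};
```

The caller would fill in a `ReaderConfig` from whatever it parsed and hand it to the reader, so the reader never sees the full CopyConfig.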

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62364/new/

https://reviews.llvm.org/D62364