[PATCH] D42872: Fix handling of zero-size segments in llvm-objcopy

Jake Ehrlich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 13 15:23:31 PST 2018


jakehehrlich added a comment.

I've answered these questions very out of order, I'm not really sure what order to answer them in however so I just answered them as I got to them. Also sorry about not properly explaining the reasoning behind all of this. Feel free to ask anymore questions. I've tried to provide an explanation for any confusion that might linger.

> Regarding PT_PHDR. I set the types of both added segments to PT_PHDR, because PT_LOAD segments must not overlap each other and other types fit even less. Neither 2, nor 3 PT_PHDR segments are allowed by the spec, but for now it should be fine.

Yeah I couldn't really figure out what they should be. I think your choice is fine. I thought for a moment to ask you to make them PT_NULL but it dosn't really matter because they don't get written out.

> This puts us to an uneasy situation, where we definitely know that the file starts with an ELF header, which is likely padded to the page size at least (if we assume that all the ELF files follow the optimization suggestions from the spec), but other than that we can hardly make any assumptions.

Prior to this change we incorrectly assumed that the program headers would occur directly after the ELF header. In practice this was a fair assumption but it leads to incorrect behavior if the program headers and elf header are covered by a program header. It's wrong and absolutely should be fixed. By having the ELF header and program headers added as extra segments that are also laid out we'll take care of that.

> but why would your code be happy with a random segment that is not contiguous to the other segments and possibly overlaps with the existing segments.

Lots of segment types are always are overlapped. The mechanism by which this is handled is the ParentSegment member. The idea is that if segment A overlaps segment B then you can use segment A's previously assigned Offset to assign B's offset. If you carefully order the segments you can ensure that any segment with a parent segment has already had an Offset assigned to it. In fact there's an important behavoir that this change adds that wasn't handled before that this behavior needs to support. Say you have a PT_LOAD segment that starts at offset zero and a PT_PHDR segment that starts at 0x80. Normally the PT_PHDR segment would start at 0x40 but lets say for whatever reason the linker put it at 0x80. That would mean that code would assume that the program headers were at a location that they were in fact not in the way the current llvm-objcopy works. Your change should fix that however because the (now added extra program headers section) will be overlapped by the PT_LOAD segment, that PT_LOAD segment will be the ParentSegment of ProgramHdrSegment and then ProgramHdrSegment will be laid out correctly now.

Regarding finding these things out. It's a combination of reading, rereading, and seeing many crazy examples that technically meet the spec. I have the benefit of having a number of people I can talk to that have a deep knowledge of the spec. Sometimes you just have to ask unfortunately. On this particular point the spec never says segments can't overlap, it does say that sections can't overlap however.

> I assumed that it won't work this way, and thus wrote the code to account for padding and stuff.

Totally fair. It turns out that not only is this handled but handling it any other way is likely error prone.

> By the way, does llvm-objcopy support big endian ELFs on little endian host?

Yep, its 100% cross platform and it's behavior doesn't change from system to system. This is actually one of the primary use cases (the other is the fact llvm-objcopy isn't GPL). Chromium uses llvm-objcopy in a few cases instead of objcopy exclusively for this reason (well to support aarch64 from an x86 machine). In a change that's coming up I'll actually be allowing changes between big and little endian (not that that will ever be useful).

> The spec does not seem to define loadable segment alignment to be equal to the page size but only recommends doing that

This is correct. Actually most loaders have this as a requirement because of the way hardware actully works. In theory there are valid PT_LOAD segments for x86 with less than 4k page size despite the fact that such hardware never existed. There are some technically valid ELF files that are not actually loadable. In llvm-objcopy we have to make the assumption that the linker already made all the right choices for us.

> I personally saw files with different loadable segment alignment values, in particular some segments had 4K and some other segments had 16K. I have never seen files where loadable segments had p_align less than page size, so for now I made an assumption that the lowest p_align value in PT_LOAD segments that is not 0 or 1, is no less than the page size.

This is totally valid. Sometimes the system page size is larger than the hardware page size. Also keep in mind different systems might have different page sizes theoretically and sometimes they might be less than 4k even. You shouldn't need to infer the page size at all however.



================
Comment at: tools/llvm-objcopy/Object.cpp:38
   uint8_t *Buf = BufPtr->getBufferStart();
   Buf += sizeof(Elf_Ehdr) + Seg.Index * sizeof(Elf_Phdr);
   Elf_Phdr &Phdr = *reinterpret_cast<Elf_Phdr *>(Buf);
----------------
I made an assumption here about where the program headers would actually go that was wrong. Instead of sizeof(Elf_Ehdr) this needs to be derived from ProgramHdrSegment to get the correct offset.


================
Comment at: tools/llvm-objcopy/Object.cpp:509
   for (auto &Child : Obj.segments()) {
     for (auto &Parent : Obj.segments()) {
       // Every segment will overlap with itself but we don't want a segment to
----------------
The ParentSegments of ElfHdrSegment and ProgramHdrSegment still need to be set so that their layout happens properly.


================
Comment at: tools/llvm-objcopy/Object.cpp:763
   Ehdr.e_entry = Obj.Entry;
   Ehdr.e_phoff = sizeof(Elf_Ehdr);
   Ehdr.e_flags = Obj.Flags;
----------------
Whoops! I thought I added this comment but apparently not. This should get it's value from ProgramHdrSegment. The reason is to support the case where the program headers are overlapped by a PT_LOAD segment and the program headers aren't directly after the elf header. This doesn't really happen in practice which is why I've gotten this far without fixing it but now that you've solved that problem we should handle it correctly.


================
Comment at: tools/llvm-objcopy/Object.cpp:944
   // case.
   uint64_t Offset;
   if (!OrderedSegments.empty()) {
----------------
Now that we always know we have a segment with Offset zero we should just set Offset to zero and remove the if-else statement below (which was a hack on my part).


Repository:
  rL LLVM

https://reviews.llvm.org/D42872





More information about the llvm-commits mailing list