[PATCH] D33964: [LLVM][llvm-objcopy] Added basic plumbing to get things started

Fri Jul 14 01:31:01 PDT 2017

jhenderson added a comment.

Thanks for adding empty-section.s. Once you have program header support in yaml2obj, I think it would be good to have the same test for empty sections in program headers, for similar reasons (actually, the way the code was written, I think that test would be the more important one).

================
Comment at: tools/llvm-objcopy/Object.cpp:79-80
+
+// Returns true IFF a section covers a segment
+static bool sectionCoversSegment(const SectionBase &Section,
+                                 const Segment &Segment) {
----------------
It's not clear on the surface what "covers" means here (I assume it means "is within" or similar). Also, I think the name has it backwards: the function returns true if the section is in the segment, not the other way around.

Also, I'm not sure what the behaviour of empty sections should be here. My instinct is to say that an empty section is "covered" by a segment if it appears at the start or in the middle of the segment, but not if it appears only at the end. I could imagine a case with two segments where the first has the same end offset as the start offset of the next one. If there is an empty section at the meeting point, I don't think it makes sense for it to be in both. From a run-time point of view, I don't think it matters, but it could cause weird-looking anomalies in static dumping tools, and it also raises the question, "if the first segment moves to a lower offset, does the section move with it or stay with the other segment?"
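
To make the suggested rule concrete, here is a minimal sketch of one way the check could look. `Range` and `sectionWithinSegment` are hypothetical names used only for illustration; they are not the patch's `SectionBase`/`Segment` types, and the sketch assumes file offsets and sizes that do not wrap around.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the offset/size pair of a section or a segment.
struct Range {
  uint64_t Offset;
  uint64_t Size;
};

// Returns true if Sec lies within Seg. A section must start at or after the
// segment start, start strictly before the segment end, and end at or before
// the segment end. The "strictly before the end" condition is what excludes
// an empty section sitting exactly on the segment's end offset.
static bool sectionWithinSegment(const Range &Sec, const Range &Seg) {
  uint64_t SegEnd = Seg.Offset + Seg.Size;
  uint64_t SecEnd = Sec.Offset + Sec.Size;
  // Assumption: neither range wraps around the 64-bit offset space.
  assert(SegEnd >= Seg.Offset && SecEnd >= Sec.Offset);
  return Sec.Offset >= Seg.Offset && Sec.Offset < SegEnd && SecEnd <= SegEnd;
}
```

With two adjacent segments `[0x100, 0x200)` and `[0x200, 0x300)`, an empty section at offset `0x200` would then belong only to the second segment.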

================
Comment at: tools/llvm-objcopy/Object.cpp:193
+      // image is a copy of the file image, we preserve the file image as well.
+      Offset = FirstInSeg->Offset + Section->OriginalOffset -
+               FirstInSeg->OriginalOffset;
----------------
jakehehrlich wrote:
> jhenderson wrote:
> > I've just thought about yet another problem with this:
> > 
> > In some cases, the ELF header and/or Program Header table can appear in a segment before the first section. Consequently, we need to preserve the offset relative to the start of the segment, rather than relative to the first section. At the moment, any space before the first section within the segment is ignored and lost.
> > 
> > This applies to Segment::finalize() too.
> Yea PT_PHDR is a problem I have run into in extensions of this diff. I think it needs some special handling. I think I can support it when I officially add support for copying data of interstitial gaps and nested segments. It will still require special care however. Would you be ok with me not attempting to handle this right now? At the very least this requires handling nested segments because the PT_PHDR segment must be nested in a PT_LOAD segment.
> 
> The needed information is preserved now because the original offset of the segment is stored until the segment offset needs to be computed and the original offset of the section is now stored and kept. So that gap can be recovered when performing layout of the segment. This seems to be enough to confirm that there isn't a fundamental design issue that prevents the PT_PHDR case from being handled.
I'm not sure it's specific to PT_PHDR - it's possible to put these headers in any arbitrary segment, if you want (though I'm not sure why you would in most cases), e.g. directly in the first PT_LOAD segment, with the appropriate linker script. Also, I don't think it's specific to the headers either - the file image is going to get messed up if there was empty space of any kind not covered by a section before the first section. Rather than preserve the relative offsets of sections to the first section, this code should probably preserve the offset relative to the original segment start.

Example:
```
| Segment                   |
| gap | Section1 | Section2 |
A     B
```
If I've followed the code correctly, Section1 will be placed at offset A, not at B, where it should be, thus messing up the file image and therefore the memory image. Section2 will still be placed correctly, immediately after Section1.

As a side point, this will also have an impact on the alignment of the first section, but assuming the input program is valid, I don't think that this will be an issue.
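
As a rough sketch of what preserving the offset relative to the original segment start could look like (the struct and function names here are hypothetical and only loosely modelled on the `Offset`/`OriginalOffset` fields in the patch):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified layout records; only the fields needed here.
struct SegmentLayout {
  uint64_t OriginalOffset; // file offset in the input
  uint64_t Offset;         // file offset chosen for the output
};

struct SectionLayout {
  uint64_t OriginalOffset; // file offset in the input
};

// Computes the new section offset so that its distance from the segment
// start is the same as in the input. Any gap before the first section
// (e.g. headers or padding) is therefore kept.
static uint64_t newSectionOffset(const SegmentLayout &Seg,
                                 const SectionLayout &Sec) {
  // Assumption: the section really is inside the segment in the input.
  assert(Sec.OriginalOffset >= Seg.OriginalOffset);
  return Seg.Offset + (Sec.OriginalOffset - Seg.OriginalOffset);
}
```

In the diagram above, if the segment originally started at A = 0x1000 with a 0x40 byte gap and is moved to 0x2000, Section1 would land at 0x2040 rather than at 0x2000.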

================
Comment at: tools/llvm-objcopy/Object.cpp:261-271
+  Elf_Shdr &Shdr = *reinterpret_cast<Elf_Shdr *>(Buf);
+  Shdr.sh_name = 0;
+  Shdr.sh_type = SHT_NULL;
+  Shdr.sh_flags = 0;
+  Shdr.sh_addr = 0;
+  Shdr.sh_offset = 0;
+  Shdr.sh_size = 0;
----------------
jakehehrlich wrote:
> jhenderson wrote:
> > Nothing is using Shdr here. Is this missing a write, or can it be removed?
> This is actually correct I believe. This does point out a lack of clarity in what is going on here however. 
> 
> As is, every section knows where its header belongs, so I just pass "Out" to the section and it does the rest. This highlighted code writes the dummy section at the start of the section header table. So there's no missing write or anything. I'll add a comment explaining what's going on here.
Ah, I missed the fact that this was a reference rather than a copy. What you are doing here is fine then. A comment might be helpful though, as you say.
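
For anyone else who trips over this, a small sketch of the difference. `FakeShdr` is a hypothetical cut-down header, not the real `Elf_Shdr`, and the sketch assumes the buffer is suitably aligned for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical cut-down section header with no padding bytes.
struct FakeShdr {
  uint32_t sh_name;
  uint32_t sh_type;
  uint64_t sh_size;
};

// Binds a reference to the bytes in Buf, so each assignment writes straight
// into the output buffer. Assumes Buf is aligned for FakeShdr.
static void zeroHeaderViaReference(unsigned char *Buf) {
  assert(Buf != nullptr);
  FakeShdr &Shdr = *reinterpret_cast<FakeShdr *>(Buf);
  Shdr.sh_name = 0;
  Shdr.sh_type = 0;
  Shdr.sh_size = 0;
}

// Works on a local copy instead; the buffer is never modified, which is the
// "missing write" situation the original question was worried about.
static void zeroHeaderViaCopy(const unsigned char *Buf) {
  assert(Buf != nullptr);
  FakeShdr Shdr;
  std::memcpy(&Shdr, Buf, sizeof(Shdr));
  Shdr.sh_name = 0;
  Shdr.sh_type = 0;
  Shdr.sh_size = 0;
  (void)Shdr; // the zeroed copy is discarded here
}

// Helper for inspecting the buffer afterwards.
static bool allZero(const unsigned char *Buf, std::size_t N) {
  for (std::size_t I = 0; I != N; ++I)
    if (Buf[I] != 0)
      return false;
  return true;
}
```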

================
Comment at: tools/llvm-objcopy/Object.h:136-141
+  uint64_t Entry;
+  uint64_t SHOffset;
+  uint32_t Type;
+  uint32_t Machine;
+  uint32_t Version;
+  uint32_t Flags;
----------------
jakehehrlich wrote:
> jhenderson wrote:
> > The size of some of these fields is dependent on the Elf Type (64 versus 32). I wonder if these should be based on ELFT? On the other hand, I'm quite happy that we just use the largest size for now.
> > 
> > At the moment, since the template is only instantiated for Elf64LE, this is fine, but it won't work once 32-bit support is added, unless the point where they are assigned to the Elf Header struct is modified to cast the values accordingly.
> This was a recommendation that I got from Rui actually. To quote Rui when speaking about LLD.
> 
> > - Integral types such as Elf{32,64}_{XWord,Word,Addr,Offset} are not useful and better to avoid. We are using uint{8,16,32,64}_t instead. It seems it improves readability. (I honestly don't memorize the real types of these ELFT types.)
> > - ELFT::uint, whose size is 32/64 depending on ELF32/64, isn't useful. Always uint64_t to represent a value that can be 32 or 64. The waste of doing this is negligible, but it could drastically simplify types because if your function uses only ELFT::uint, you can de-template that function by using uint64_t.
> 
> I've found this to be true as well. It definitely lets you get rid of a lot of templates. If we just used the templates when you had access to them (Like in Object here) and used the largest type when we didn't (like in Segment or SectionBase) we would get this strange inconsistency in our code. I'd like to avoid that.
Ok, that's fine with me, as long as it was a conscious decision. I could see a case in the future when adding support for things like adding or modifying sections when it may be important to safety check that we haven't overflowed, but that's not relevant at the moment.
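
To illustrate the kind of check I have in mind for later, here is a minimal sketch. `checkedNarrow` and `narrowAsserting` are hypothetical helpers, not existing LLVM functions, and the sketch assumes the destination is an unsigned integer type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stores Value into Out and returns true if it fits in the narrower unsigned
// type To; otherwise leaves Out untouched and returns false.
template <typename To>
static bool checkedNarrow(uint64_t Value, To &Out) {
  if (Value > static_cast<uint64_t>(std::numeric_limits<To>::max()))
    return false;
  Out = static_cast<To>(Value);
  return true;
}

// Variant for places where an overflow would indicate a bug rather than bad
// input: asserts in debug builds that the value fits.
template <typename To>
static To narrowAsserting(uint64_t Value) {
  To Out{};
  bool Ok = checkedNarrow(Value, Out);
  assert(Ok && "value does not fit in the target ELF field");
  (void)Ok;
  return Out;
}
```

Something like this could sit at the point where the uint64_t members are assigned to a 32-bit ELF header, so the rest of the code can keep using the largest type.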

Repository:
  rL LLVM

https://reviews.llvm.org/D33964