[PATCH] D144009: [obj2yaml] Save offset for segments and size for PHDR

Fri Feb 17 03:27:57 PST 2023

treapster added inline comments.

================
Comment at: llvm/tools/obj2yaml/elf2yaml.cpp:488-493
+    PH.Offset = Phdr.p_offset;
+
+    if (Phdr.p_type == ELF::PT_PHDR) {
+      PH.MemSize = Phdr.p_memsz;
+      PH.FileSize = Phdr.p_filesz;
+    }
----------------
jhenderson wrote:
> treapster wrote:
> > jhenderson wrote:
> > > I think we should be emitting these values in the output only if they can't be derived automatically from the existing layout, otherwise the YAML output will end up unnecessarily verbose. It's been a while since I've looked at the obj2yaml code though, so it's possible this happens elsewhere.
> > I don't think one line per segment is so verbose that we should build separate logic to determine whether to include it. What makes yaml verbose are symbols, relocations, dynamic, note sections and etc. TBH i think we should also include section offsets unconditionally because people strip sections all the time which breaks all this smart offset derivation and rebuilt binaries have absolutely nonsensical offset/address values, which makes it difficult to make some tools(e.g BOLT) more strict to the input binaries, since all the generated tests will fail.  
> > But saving section offsets in all cases requires changing about 40 tests(that much failed when i just started always including section offsets), so unfortunately i won't do it anytime soon.
> > I don't think one line per segment is so verbose that we should build separate logic to determine whether to include it.
> 
> My reasoning here is that a significant use-case I'm aware of is for people creating test cases that use yaml2obj to generate the test input at runtime. Essentially, they get an object they want to test using the compiler/assembler/etc and then use obj2yaml to generate the YAML for use in the test. If the fields are emitted unnecessarily this causes two problems: 1) the YAML becomes more verbose than necessary, obfuscating what is actually important in the test and 2) it makes it harder to manipulate the YAML to add/remove sections from it (because the offsets etc all become invalid and need updating by hand). Thinking about it, I'm more concerned by the second of these.
> 
> > people strip sections all the time which breaks all this smart offset derivation and rebuilt binaries have absolutely nonsensical offset/address values
> 
> Sorry, I'm not sure what you're saying here? Are you saying people use tools like llvm-strip/llvm-objcopy to manipulate sections? If so, the offsets and addresess of any remaining sections and segments should remain valid afterwards (and if they don't, that's a bug in llvm-strip/llvm-objcopy etc). Alternatively, are you saying people hand-edit the YAML after obj2yaml emitted it? If it's the latter, emitting more offsets can cause other issues.
> Sorry, I'm not sure what you're saying here? 

I'm referring to the second case where we remove sections from the generated yaml, and thus other sections move forward and have invalidated addresses. So i think we should preserve as much info in yaml as possible, because it will likely be edited.

>  If the fields are emitted unnecessarily this causes two problems: 1) the YAML becomes more verbose than necessary, obfuscating what is actually important in the test.

 As i said before, the bulk of yaml comes from other things, one line per segment or even per section will not make the file significantly bigger or harder to understand.

> 2) it makes it harder to manipulate the YAML to add/remove sections from it (because the offsets etc all become invalid and need updating by hand). 

That is exactly what happens **without** emitting offsets: you remove a section, and the next slides into it's place,  and now has invalid base address. Or, for example, you remove a first section in the segment, and the segment offset changes if not emitted, but since the VAddr is the same, we have invalid base address again.

If we emit all segments and sections offsets, we make it easier to remove some sections without invalidating others. As for adding anything, you'll have to think more about a place in a file and in address space, which is a good thing because valid mappings are a good thing. But from what i see we usually remove sections from yaml, not add them, so you won't think about it very often.

But anyway, this diff is about segment offsets, and emitting them alone will help because obj->yaml->obj becomes more accurate.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144009/new/

https://reviews.llvm.org/D144009