[PATCH] D144009: [obj2yaml] Save offset for segments and size for PHDR

Wed Feb 22 00:33:51 PST 2023

jhenderson added a comment.

In D144009#4138297 <https://reviews.llvm.org/D144009#4138297>, @treapster wrote:

> In D144009#4138216 <https://reviews.llvm.org/D144009#4138216>, @jhenderson wrote:
>
>> It seems to me like we have a fundamental difference in how obj2yaml is used (or rather, there are multiple valid use-cases, but they pull in different directions). Whether it's segment offsets or section offsets, the fundamental issue is the same, I believe (just one has a bigger impact than the other). Let me summarize my understanding of the situation and the pros/cons of the approaches:
>>
>> 1. Use-case 1: As a way of quickly generating YAML blocks for testing purposes. In this case, you don't want more than the minimum needed to represent the object, as anything extra hides what's actually important. Since yaml2obj can happily derive section/segment offsets automatically, it isn't necessary to include the offsets in this case for most cases. Furthermore, it is often not necessary to have rigid offsets/addresses (many tests don't care about these). Typically these test cases will have minimal symbols, relocations, or sections, so the signal to noise impact of unnecessary offsets (for segments or sections) is relatively high.
>> 2. Use-case 2 (I'm uncertain if this is exactly correct, but hopefully I've got the general point correct): As a way of creating something that is easier to manipulate than a raw binary. Often, this might be a linked executable. Removing sections etc is desirable, but any addresses need to remain correct (due to things like already-applied relocations referencing them). Removing sections will cause segments/sections to potentially move, unless offsets are included explicitly in the YAML, or something else is used to replace the section.
>>
>> I don't see a clear way forward that meets both sets of requirements. About the best idea I have is to change obj2yaml to emit hard-coded offsets only when needed to enforce the layout (i.e. correct implicit values should be preferred), but then users should not simply remove whole sections, and instead use Fill sections to replace them in the YAML (this is actually fairly easy to do, but users need to be aware of these Fill sections).
>
> You're right, and the second case of generating a somewhat valid executable is currently not handled very well, so adding an option to emit section offsets may be useful. But anyway, the context of this diff is purely segment offsets, and i don't see fundamental contradictions with the first case here.

I think for the Program Headers, one fundamental issue is that there's more than one way to define the positioning of a segment. In my opinion, the "correct" way to place segments in YAML is to use `FirstSec`/`LastSec`, since that causes offsets and sizes to be correct automatically. In the event that there's space at the start or end of a segment, it is better to use a Fill section than to hard-code the size/offset (again, in my opinion), because then there is only one source of information for the segment layout (i.e. the section list). Adding Size/Offset fields to the segment //as well// as specifying the `FirstSec`/`LastSec` fields means that users have to be aware of more than one thing when they adjust the YAML (i.e. if they want to move a segment, they have to update the segment offset as well as the sections). There may be valid reasons to do this move. This is of course on top of the fact that for many test purposes, the size and offset are completely redundant, and it would be nice not to have to remove them if starting from a real object file (indeed, many people may forget to remove them, leading to noisy, or potentially even buggy, tests).
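
As a rough illustration of the Fill approach, here is a minimal sketch of yaml2obj input. The section names, sizes, the `padding` name and the `VAddr` value are all made up for the example and are not taken from the patch or from any real object:

```yaml
--- !ELF
FileHeader:
  Class:   ELFCLASS64
  Data:    ELFDATA2LSB
  Type:    ET_EXEC
  Machine: EM_X86_64
Sections:
  - Name:  .text
    Type:  SHT_PROGBITS
    Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
    Size:  0x10
  # Hypothetical Fill: stands in for a removed section (or padding), so
  # that whatever follows keeps its position without a hard-coded Offset.
  - Type:    Fill
    Name:    padding
    Pattern: "00"
    Size:    0x20
  - Name:  .rodata
    Type:  SHT_PROGBITS
    Flags: [ SHF_ALLOC ]
    Size:  0x8
ProgramHeaders:
  - Type:     PT_LOAD
    Flags:    [ PF_R, PF_X ]
    VAddr:    0x1000
    # No Offset or size fields on the segment: its extent is meant to
    # follow from the FirstSec..LastSec range in the section list.
    FirstSec: .text
    LastSec:  .rodata
```

With this shape, moving or resizing the segment should only require editing the section list, which is the "single source of information" point above.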

In D144009#4138302 <https://reviews.llvm.org/D144009#4138302>, @treapster wrote:

>> I don't see a clear way forward that meets both sets of requirements. About the best idea I have is to change obj2yaml to emit hard-coded offsets only when needed to enforce the layout (i.e. correct implicit values should be preferred), but then users should not simply remove whole sections, and instead use Fill sections to replace them in the YAML
>
> If you have hardcoded sections' offsets and addresses as well as segments', there's no need to use fill sections and you can simply remove what you don't need, so i don't get why use fill in that case.

To avoid obj2yaml having to emit section offsets and sizes, for the reasons outlined elsewhere.

https://reviews.llvm.org/D144009