[PATCH] D82858: [llvm-objdump] Detect note section for ELF objects

Wed Jul 1 03:12:05 PDT 2020

jhenderson added a comment.

Could we bring this discussion up on llvm-dev please? It seems like it needs wider attention to me. Essentially, there are two related questions. "What should the disassembler do?" is the first - should it be for interpreting bytes as instructions (this is largely the current approach), or should it be for interpreting all sections (what you're essentially proposing)? The second is essentially "Should llvm-objdump/llvm-read[obj/elf]/<possibly also other tools like llvm-nm> be combined into a single tool? If so, what is the best way to do so?" The latter will of course still need to provide GNU-compatible front-ends.

In D82858#2124602 <https://reviews.llvm.org/D82858#2124602>, @rochauha wrote:

> I'd also like to point out another fact. There is already some feature overlap between llvm-readobj and llvm-objdump - like printing symbol information using --symbols and --syms respectively. Another such example is using the flag --section-headers in both tools to print section headers.

This is an unfortunate consequence of people wanting to have GNU compatibility. In fact, in our downstream version of llvm-objdump we don't officially support the features that are also available in llvm-readobj so that we don't have to maintain two tools doing the same thing. I am happy to extend things for GNU compatibility, but anything beyond that should not be duplicated in two places.

In D82858#2124601 <https://reviews.llvm.org/D82858#2124601>, @rochauha wrote:

> > 1. Having multiple tools to do the same job is not a good idea - each requires its own maintenance, the behaviour can diverge, bugs might require fixing in two places/support for new things etc etc etc. In an ideal world, we'd merge all the binary tools (GNU and LLVM) into a single tool, or redistribute functionality somehow, so that we don't have duplicate functionality like we already do. This takes us further away from that ideal.
>
> I agree that having a single tool is the direction we must aim for. But to do so, one tool needs to be improved to the point that it is 'feature complete'. llvm-objdump already disassembles all contents of the binary. It's just that everything is disassembled as instructions. Even notes are disassembled as instructions today. I am not 'adding' anything new; just trying to 'correct' the existing output. Targets will still need to implement things from their side (if needed) to take advantage of the infrastructure changes.
>  The initial plan would be to have note record handling in the MC layer. llvm-objdump will just iterate over the notes section. For each note record it will query the registered targets. The owning target will appropriately disassemble the bytes. A note record must be disassembled using the .byte directive if no target owns the note / printing for a particular kind of note is not implemented.

Your definition of "correct" does not match mine. I'd interpret --disassemble-all to mean "disassemble all sections as instructions". I acknowledge that in most cases this probably isn't useful, but honestly I don't know what the purpose of the feature was in the first place. Relatedly, not even text sections necessarily consist entirely of executable instructions - if I'm not mistaken jump tables and other embedded data can exist in them too.

Note parsing already exists in llvm-readobj, and possibly in the Object library (without looking I don't remember where the bulk of the work is done). We don't want to add note parsing to the MC library as well. Duplicated functionality is bad as already outlined. If this feature is to be implemented in disassembly, it should be reusing the same functionality as llvm-readobj. At most, the only difference should be how to print the output. Also I don't think SHT_NOTE sections are intended to be target specific: they are supposed to be vendor specific (where vendor is defined by the note's content, as opposed to the EM_* field of the ELF header). For example, there are GNU notes, and I think LLVM notes too. I know we have downstream notes with another vendor name too.
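
To illustrate the point that the vendor comes from the note's own content, here is a minimal sketch of walking the records in an SHT_NOTE section. It is not the llvm-readobj implementation; the function name `parse_notes` is hypothetical, and the sketch assumes 32-bit header fields, 4-byte padding of the name and descriptor, and little-endian data by default.

```python
import struct

def parse_notes(data, little_endian=True):
    """Return (owner, type, desc) tuples from raw SHT_NOTE section bytes.

    Assumption: each record is namesz/descsz/type as three 32-bit words,
    followed by the name and the descriptor, each padded to 4 bytes.
    """
    endian = "<" if little_endian else ">"
    pad4 = lambda n: (n + 3) & ~3
    notes = []
    offset = 0
    while offset + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from(endian + "III", data, offset)
        offset += 12
        # The owner (vendor) string lives in the record itself, e.g. "GNU".
        owner = data[offset:offset + namesz].rstrip(b"\0").decode("ascii", "replace")
        offset += pad4(namesz)
        desc = data[offset:offset + descsz]
        offset += pad4(descsz)
        notes.append((owner, ntype, desc))
    return notes

# A hand-built record: owner "GNU", type 3 (NT_GNU_BUILD_ID), 4-byte descriptor.
sample = struct.pack("<III", 4, 4, 3) + b"GNU\0" + bytes([0xDE, 0xAD, 0xBE, 0xEF])
```

Nothing in the loop consults the ELF header's machine type; the owner is read from each record, which is why a per-target query is not an obvious fit.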

>> 3. I'm not convinced people actually find dumping all sections in an interpreted form at once useful. Do you actually have any users for that? I think most people are interested in the disassembly of their code, but are unlikely to want this information in the same output as note information.
> 
> Yes, it is super helpful even if the user can just take a look at an entire binary and make sense of it. He/she shouldn't need two tools to do that. Many times note records contain useful information. Looking at the entire binary in an understandable text form can even help when working on bugs.

So you actually use note contents in your day-to-day development to immediately tell you something about the executable instructions? If not, then I don't see how it can be helpful. In all the years I've spent helping develop and maintain our downstream binutils, I can honestly say that I've never found a "dump everything" approach useful, even though it is supported by our downstream dumping tool, except when having to diff a before and after output, but even in that case, I'm not using pieces of information from one part to inform my understanding of another part.

>> 4. I don't quite follow whether you're saying that one motivation is to make things re-assembleable, but if it is, the .note section is not the place to start - there are other sections where this would be more useful (e.g. data sections).
> 
> Before even thinking of re-assembly, we need to first make sure that all entities are disassembled in a //proper// way. For example AMDGPU kernel descriptors as assembler directives rather than instructions. Similarly for notes. It may be considered as a //low hanging fruit//. But it definitely needs to be done. All entities need to be disassembled appropriately to make the final text relevant to the assembler.

But is re-assembly a long-term goal? If it isn't then there's no point in doing things to work towards it. There are cleaner ways of dumping information rather than trying to squeeze it all into the disassembly output, and in most cases those ways already exist. Even if it were, the chances are that your note output would have to be not human readable (i.e. in the form of .byte etc. directives) since there are no assembler instructions/directives that correspond to them. For example, how do you create a "build-id" assembler input?
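
As a rough sketch of what a re-assembleable but not human-readable note dump might look like, the helper below renders raw bytes as .byte directives. The name `as_byte_directives` and the line width are hypothetical choices for illustration, not anything llvm-objdump provides.

```python
def as_byte_directives(data, per_line=8):
    """Render raw bytes as assembler .byte directives, per_line bytes per row."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  .byte " + ", ".join(f"0x{b:02x}" for b in chunk))
    return "\n".join(lines)

# The owner field of a GNU note, including its NUL terminator.
print(as_byte_directives(b"GNU\0"))
```

Output in this form can be fed back to an assembler, but it tells the reader nothing about the note's meaning, which is the trade-off described above.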

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82858/new/

https://reviews.llvm.org/D82858