[PATCH] D126904: [llvm-objdump] Add support for dumping embedded offloading data

Tue Jun 14 04:50:00 PDT 2022

jhuber6 marked 11 inline comments as done.
jhuber6 added a comment.

Thanks for the comments again, I'll address them soon.

In D126904#3580868 <https://reviews.llvm.org/D126904#3580868>, @jhenderson wrote:

> I'd be concerned if a dumping implementation were written and then needed regular modification due to the section format not being stable yet. Usually, we'd try to get the details of the section format sorted before writing the dumping implementation, to reduce churn.

The format itself I consider mostly stable as I don't have plans to change the structure. However, if we change the section to a type we may want to do that. I was hesitant to add a new type to Elf considering we would need support for it everywhere else, but it's definitely the more correct option. Also I plan on adding some more data to the binary section that will be dumped, but I figured it's not a big deal to make follow-up patches that just update the printing. Let me know if you think these are deal-breakers.

> The only material I can offer is the yaml2obj source code - there are several other section kinds that have different dumping behaviour, that should be fairly straightforward to use as a basis for the implementation for this section. Take a look at how, for example, SHT_RELA sections are handled in yaml2obj.

I will probably just write the binary contents by hand for now, it's pretty much just a big struct of offsets into a string table.

================
Comment at: llvm/test/tools/llvm-objdump/Offload/offload-failure.test:20-24
+    Flags:           [ SHF_EXCLUDE ]
+    Address:         0x1020
+    AddressAlign:    0x0000000000000008
+    Content:         "00000000"
+    Size:            4
----------------
jhenderson wrote:
> This can be simplified:
> 1. The flags aren't related to the offloading section printing, so get rid of them.
> 2. Is the address needed? I suspect not from my understanding of what the program is doing.
> 3. I suspect the AddressAlign field is also unnecessary for printing purposes, and can be omitted.
> 4. You don't need both Content and Size, unless the section size needs to be bigger than the specified content. You can omit the Content field and the Size parameter will set the section size still, with the content set to null bytes to fill it up.
I mostly just copied these from another test, I'll clean them up.

================
Comment at: llvm/tools/llvm-objdump/OffloadDump.cpp:78
+
+/// Print the contents of a single offload binary file \p OB.
+void llvm::printOffloadBinary(const OffloadBinary *OB) {
----------------
jhenderson wrote:
> jhuber6 wrote:
> > jhenderson wrote:
> > > jhuber6 wrote:
> > > > tra wrote:
> > > > > jhuber6 wrote:
> > > > > > tra wrote:
> > > > > > > I don't think the 'single' part of this assertion is true. AFAICT, `extractAllBinaries` will happily print all subsequent binaries if it finds them in the buffer. I think this should call `printBinary` instead.
> > > > > > Yeah, I meant it more like to print on the single file that was already extracted or something. But it can definitely contain multiple. The reason I chose this method is because I wanted something that worked even if these sections were concatenated through a relocatable link or something. So whenever we parse one of these we just check the sizes to make sure there's not another one concatenated to it. I can make the comment less confusing.
> > > > > I think the root of the problem here is that we're treating `OffloadBinary` as both the pointer to the binary itself and as a pointer to collection of such binaries.
> > > > > 
> > > > > I think it's not a good API -- extractAllBinaries gets to look under the hood of the implmentation -- check if containing buffer has extra space beyond the OffloadBinary it's been passed. What if the user places something else in the memory buffer right behind the OffloadBinary object user passed to printOffloadBinary ? They would be within their rights to do so as the function would be expected to care about the content of the `*OB` only.
> > > > > 
> > > > > I think we should be a bit more pedantic about such things. If we expect to operate on a collection, the API should reflect that. E.g. use SmallVector<OffloadBinary*>. 
> > > > > I think implementing `ObjectFile::offload_sections()` and `OffloadSection::offload_binaries()` would help both here and above. Or, possibly, just `ObjectFile::offload_binaries()``if we don't need to care about how binaries are stored in the object file and just wanr offload binaries themselves.
> > > > > 
> > > > So the problem is we don't know how many of these are in here until we parse it. This requires getting the `size` field within the `OffloadBinary`. So even if we abstracted it to this iterator, it would still need some parsing like this behind the scenes. I could have made the binary format contain many within a single binary image, but like I said I wanted this to be stable under arbitrary concatenation by the linker. I'm not sure if we could have a different API considering the parsing requirements.
> > > > 
> > > > This can definitely be problematic, depending on usage. I'm assuming if a user initialized an object on a memory buffer containing a bunch of junk it would probably be fine and just stop once the file is fully parsed. We could probably just ignore a parsing error, basically just stop tryingto read things if we don't catch the magic bytes or there's not enough space left over, but that's probably not ideal.
> > > > 
> > > > It's definitely a little obtuse, but I'm not sure if there's a good way to make it work better considering how we parse them.
> > > > I said I wanted this to be stable under arbitrary concatenation by the linker
> > > 
> > > Have you looked at how DWARF debug sections like .debug_line or .debug_aranges are structured? Typically, these sections have a header which contains information like total size of that section (or number of entries in the section) and version information. These sections are still concatenated, with the length simply representing the contribution from a single CU.
> > Right now I have a binary that knows its own size, and if the size of the buffer is greater than the size of that binary we look for another one. Forgive me if I'm misunderstanding here, but the linker will only concatenate sections right? Do these sections simply work as some kind of buffer whose size indicated how many sections were concatenated? That is, for every `.llvm.offloading` section I'd have some other reference section that just contains a single byte whose size I can check? Otherwise I'm not sure how  we could figure out how many of these sections have been concatenated without parsing them first.
> To be clear, I know very little about the new section type, how it is used and so on, so what I'm suggesting may not make much sense. Linkers concatenate sections blindly (in general). As such, if you had 1.o and 2.o each with a .llvm.offloading section, and you combine them into out.elf, you'd end up with a single output section containing the concatenation of the two. Presumably this means you'll end up with something that looks a bit like this?
> ```
> .llvm.offloading
>   .llvm.offloading(1.o) - size field
>   .llvm.offloading(1.o) - rest of section
>   .llvm.offloading(2.o) - size field
>   .llvm.offloading(2.o) - rest of section
> ```
> Is that correct? If so, I don't think there's anything to do here, assuming the section size is not guaranteed to be the same for all input sections.
Yeah, that's more or less  how it's set up. I'm assuming there's not much we can do to make parsing this easier.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126904/new/

https://reviews.llvm.org/D126904