[PATCH] D126904: [llvm-objdump] Add support for dumping embedded offloading data

Mon Jun 13 07:13:04 PDT 2022

jhuber6 added inline comments.

================
Comment at: llvm/tools/llvm-objdump/OffloadDump.cpp:78
+
+/// Print the contents of a single offload binary file \p OB.
+void llvm::printOffloadBinary(const OffloadBinary *OB) {
----------------
jhenderson wrote:
> jhuber6 wrote:
> > tra wrote:
> > > jhuber6 wrote:
> > > > tra wrote:
> > > > > I don't think the 'single' part of this assertion is true. AFAICT, `extractAllBinaries` will happily print all subsequent binaries if it finds them in the buffer. I think this should call `printBinary` instead.
> > > > Yeah, I meant it more like to print on the single file that was already extracted or something. But it can definitely contain multiple. The reason I chose this method is because I wanted something that worked even if these sections were concatenated through a relocatable link or something. So whenever we parse one of these we just check the sizes to make sure there's not another one concatenated to it. I can make the comment less confusing.
> > > I think the root of the problem here is that we're treating `OffloadBinary` as both a pointer to the binary itself and a pointer to a collection of such binaries.
> > > 
> > > I think it's not a good API: `extractAllBinaries` gets to look under the hood of the implementation and checks whether the containing buffer has extra space beyond the OffloadBinary it's been passed. What if the user places something else in the memory buffer right behind the OffloadBinary object they passed to `printOffloadBinary`? They would be within their rights to do so, as the function would be expected to care about the content of the `*OB` only.
> > > 
> > > I think we should be a bit more pedantic about such things. If we expect to operate on a collection, the API should reflect that. E.g. use SmallVector<OffloadBinary*>. 
> > > I think implementing `ObjectFile::offload_sections()` and `OffloadSection::offload_binaries()` would help both here and above. Or, possibly, just `ObjectFile::offload_binaries()` if we don't need to care about how binaries are stored in the object file and just want the offload binaries themselves.
> > > 
> > So the problem is we don't know how many of these are in here until we parse it. This requires getting the `size` field within the `OffloadBinary`. So even if we abstracted it to this iterator, it would still need some parsing like this behind the scenes. I could have made the binary format contain many within a single binary image, but like I said I wanted this to be stable under arbitrary concatenation by the linker. I'm not sure if we could have a different API considering the parsing requirements.
> > 
> > This can definitely be problematic, depending on usage. I'm assuming that if a user initialized an object on a memory buffer containing a bunch of junk, it would probably be fine and just stop once the file is fully parsed. We could probably just ignore a parsing error, that is, stop trying to read things if we don't find the magic bytes or there's not enough space left over, but that's probably not ideal.
> > 
> > It's definitely a little obtuse, but I'm not sure if there's a good way to make it work better considering how we parse them.
> > I said I wanted this to be stable under arbitrary concatenation by the linker
> 
> Have you looked at how DWARF debug sections like .debug_line or .debug_aranges are structured? Typically, these sections have a header which contains information like total size of that section (or number of entries in the section) and version information. These sections are still concatenated, with the length simply representing the contribution from a single CU.
Right now I have a binary that knows its own size, and if the size of the buffer is greater than the size of that binary we look for another one. Forgive me if I'm misunderstanding here, but the linker will only concatenate sections, right? Do these sections simply work as some kind of buffer whose size indicates how many sections were concatenated? That is, for every `.llvm.offloading` section I'd have some other reference section that just contains a single byte whose size I can check? Otherwise I'm not sure how we could figure out how many of these sections have been concatenated without parsing them first.
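
To make the size-based scanning concrete, here is a minimal, self-contained sketch of the idea. It is not the code in this patch: `BlobHeader`, `kMagic`, `makeBlob` and `splitBlobs` are hypothetical names, and the 8-byte header layout is an assumption for illustration, not the real `OffloadBinary` format. Each blob records its own total size, so a reader can step through a buffer the linker produced by concatenation and stop with an error as soon as the magic bytes or the size do not check out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical header for illustration only; NOT the real OffloadBinary layout.
struct BlobHeader {
  uint8_t Magic[4]; // assumed magic bytes identifying a blob
  uint32_t Size;    // total size of this blob in bytes, header included
};

constexpr uint8_t kMagic[4] = {'B', 'L', 'O', 'B'}; // made-up magic

// Build one blob: header followed by an opaque payload.
std::string makeBlob(std::string_view Payload) {
  assert(Payload.size() <= UINT32_MAX - sizeof(BlobHeader));
  BlobHeader H;
  std::memcpy(H.Magic, kMagic, sizeof(kMagic));
  H.Size = static_cast<uint32_t>(sizeof(BlobHeader) + Payload.size());
  std::string Out(reinterpret_cast<const char *>(&H), sizeof(H));
  Out.append(Payload);
  return Out;
}

// Walk Buffer and return the byte range of every blob in it.
// Returns std::nullopt if any header is truncated, has the wrong magic,
// or claims a size that does not fit in the remaining bytes.
std::optional<std::vector<std::string_view>>
splitBlobs(std::string_view Buffer) {
  std::vector<std::string_view> Blobs;
  while (!Buffer.empty()) {
    if (Buffer.size() < sizeof(BlobHeader))
      return std::nullopt;
    BlobHeader H;
    std::memcpy(&H, Buffer.data(), sizeof(H)); // avoids unaligned reads
    if (std::memcmp(H.Magic, kMagic, sizeof(kMagic)) != 0)
      return std::nullopt;
    if (H.Size < sizeof(BlobHeader) || H.Size > Buffer.size())
      return std::nullopt;
    Blobs.push_back(Buffer.substr(0, H.Size));
    Buffer.remove_prefix(H.Size); // anything left must be another blob
  }
  return Blobs;
}
```

Calling `splitBlobs(makeBlob("abc") + makeBlob("defgh"))` yields two ranges, while appending stray bytes after the last blob makes the whole call fail, which is the trailing-data hazard raised above. The sketch also ignores any alignment padding a linker might insert between input sections; a real reader would have to account for that.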

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126904/new/

https://reviews.llvm.org/D126904