[PATCH] D126904: [llvm-objdump] Add support for dumping embedded offloading data

Mon Jun 13 02:33:45 PDT 2022

jhenderson added a comment.

Sorry for the delay - I ended up off sick for a big chunk of last week.

Please make sure new options are documented in the llvm-objdump command guide documentation (see llvm/docs/CommandGuide/llvm-objdump.rst).

================
Comment at: llvm/test/tools/llvm-objdump/Offload/offload.test:1-2
+# RUN: llvm-objdump --offloading %S/Inputs/offload-binary.out | FileCheck %s --check-prefix=OFFLOAD
+# RUN: llvm-objdump --offloading %S/Inputs/offload-elf.o | FileCheck %s --check-prefix=OFFLOAD
+
----------------
As this is a new output format, we probably want to use the FileCheck options `--match-full-lines --strict-whitespace --implicit-check-not={{.}}` to ensure the entire output (including indentation etc) is checked-for and there is nothing additional being printed that shouldn't be.

As there's only one check pattern used in the file, drop the `--check-prefix` argument and just use `CHECK:`, `CHECK-NEXT:` etc.

Using pre-canned binaries is less than ideal. Can you use yaml2obj to generate the test inputs at runtime? This will make them less opaque and easier to maintain long-term. Depending on the section's complexity, it probably makes sense to add support for it to yaml2obj explicitly, so that it's easy to specify kind/arch/triple/producer etc in dedicated fields.

================
Comment at: llvm/test/tools/llvm-objdump/Offload/offload.test:9-10
+OFFLOAD-NEXT: producer        openmp
+
+OFFLOAD: OFFLOADING IMAGE [1]:
+OFFLOAD-NEXT: kind            llvm ir
----------------
Should there be a blank line in the output between each offloading image? If so, use `CHECK-EMPTY` to check for it. If not, use `CHECK-NEXT` at the start of the next block.

================
Comment at: llvm/tools/llvm-objdump/OffloadDump.cpp:1
+#include "OffloadDump.h"
+#include "llvm-objdump.h"
----------------
Missing license header.

================
Comment at: llvm/tools/llvm-objdump/OffloadDump.cpp:45-46
+    auto BinaryOrErr = OffloadBinary::create(Buffer);
+    if (!BinaryOrErr)
+      return BinaryOrErr.takeError();
+
----------------
Test case for the error path?

================
Comment at: llvm/tools/llvm-objdump/OffloadDump.cpp:60-62
+    Expected<StringRef> Name = Sec.getName();
+    if (!Name || !Name->startswith(OFFLOAD_SECTION_MAGIC_STR))
+      continue;
----------------
It's extremely unfortunate that this relies on section names rather than section types. In my opinion, it would be far more appropriate to have a SHT_LLVM_OFFLOAD section type (or similar), so that section names don't need looking up and comparing to find the relevant section. ELF is designed really to use types for section comparison, not really names.

================
Comment at: llvm/tools/llvm-objdump/OffloadDump.cpp:66
+    if (!Contents)
+      reportError(Contents.takeError(), O->getFileName());
+
----------------
Error path test?

================
Comment at: llvm/tools/llvm-objdump/OffloadDump.cpp:71
+    if (!BinaryOrErr)
+      reportError(BinaryOrErr.takeError(), O->getFileName());
+    OffloadBinary &Binary = **BinaryOrErr;
----------------
Error path test?

================
Comment at: llvm/tools/llvm-objdump/OffloadDump.cpp:75
+    if (Error Err = visitAllBinaries(&Binary))
+      handleAllErrors(std::move(Err), [](const ErrorInfoBase &EI) {});
+  }
----------------
This seems to be just throwing away all errors. Are you sure that's what you meant to do, and not to report them with `reportError`? If so, did you mean to use `consumeError` (or possibly even `cantFail`)?

================
Comment at: llvm/tools/llvm-objdump/OffloadDump.cpp:83
+  if (Error Err = visitAllBinaries(OB))
+    handleAllErrors(std::move(Err), [](const ErrorInfoBase &EI) {});
+}
----------------
Ditto: should this be `consumeError`?

================
Comment at: llvm/tools/llvm-objdump/OffloadDump.cpp:78
+
+/// Print the contents of a single offload binary file \p OB.
+void llvm::printOffloadBinary(const OffloadBinary *OB) {
----------------
jhuber6 wrote:
> tra wrote:
> > jhuber6 wrote:
> > > tra wrote:
> > > > I don't think the 'single' part of this assertion is true. AFAICT, `extractAllBinaries` will happily print all subsequent binaries if it finds them in the buffer. I think this should call `printBinary` instead.
> > > Yeah, I meant it more like to print on the single file that was already extracted or something. But it can definitely contain multiple. The reason I chose this method is because I wanted something that worked even if these sections were concatenated through a relocatable link or something. So whenever we parse one of these we just check the sizes to make sure there's not another one concatenated to it. I can make the comment less confusing.
> > I think the root of the problem here is that we're treating `OffloadBinary` as both the pointer to the binary itself and as a pointer to collection of such binaries.
> > 
> > I think it's not a good API -- extractAllBinaries gets to look under the hood of the implmentation -- check if containing buffer has extra space beyond the OffloadBinary it's been passed. What if the user places something else in the memory buffer right behind the OffloadBinary object user passed to printOffloadBinary ? They would be within their rights to do so as the function would be expected to care about the content of the `*OB` only.
> > 
> > I think we should be a bit more pedantic about such things. If we expect to operate on a collection, the API should reflect that. E.g. use SmallVector<OffloadBinary*>. 
> > I think implementing `ObjectFile::offload_sections()` and `OffloadSection::offload_binaries()` would help both here and above. Or, possibly, just `ObjectFile::offload_binaries()``if we don't need to care about how binaries are stored in the object file and just wanr offload binaries themselves.
> > 
> So the problem is we don't know how many of these are in here until we parse it. This requires getting the `size` field within the `OffloadBinary`. So even if we abstracted it to this iterator, it would still need some parsing like this behind the scenes. I could have made the binary format contain many within a single binary image, but like I said I wanted this to be stable under arbitrary concatenation by the linker. I'm not sure if we could have a different API considering the parsing requirements.
> 
> This can definitely be problematic, depending on usage. I'm assuming if a user initialized an object on a memory buffer containing a bunch of junk it would probably be fine and just stop once the file is fully parsed. We could probably just ignore a parsing error, basically just stop tryingto read things if we don't catch the magic bytes or there's not enough space left over, but that's probably not ideal.
> 
> It's definitely a little obtuse, but I'm not sure if there's a good way to make it work better considering how we parse them.
> I said I wanted this to be stable under arbitrary concatenation by the linker

Have you looked at how DWARF debug sections like .debug_line or .debug_aranges are structured? Typically, these sections have a header which contains information like total size of that section (or number of entries in the section) and version information. These sections are still concatenated, with the length simply representing the contribution from a single CU.

================
Comment at: llvm/tools/llvm-objdump/llvm-objdump.cpp:194
 static bool FaultMapSection;
+static bool Offloading;
 static bool FileHeaders;
----------------
Nit: it looks like there's a half-hearted attempt to have these fields in some form of alphabetical order - it's certainly not 100%, but I feel like this option probably belongs close to the `RawClangAST` option below.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126904/new/

https://reviews.llvm.org/D126904