[llvm] Object: Don't error out on malformed bitcode files. (PR #96848)

Tue Jul 2 12:38:25 PDT 2024

================
@@ -482,16 +482,47 @@ static uint64_t computeHeadersSize(object::Archive::Kind Kind,
 }
 
 static Expected<std::unique_ptr<SymbolicFile>>
-getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context) {
+getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context,
+                object::Archive::Kind Kind) {
   const file_magic Type = identify_magic(Buf.getBuffer());
   // Don't attempt to read non-symbolic file types.
   if (!object::SymbolicFile::isSymbolicFile(Type, &Context))
     return nullptr;
   if (Type == file_magic::bitcode) {
     auto ObjOrErr = object::SymbolicFile::createSymbolicFile(
         Buf, file_magic::bitcode, &Context);
-    if (!ObjOrErr)
-      return ObjOrErr.takeError();
+    // An error reading a bitcode file most likely indicates that the file
+    // was created by a compiler from the future. Normally we don't try to
+    // implement forwards compatibility for bitcode files, but when creating an
+    // archive we can implement best-effort forwards compatibility by treating
+    // the file as a blob and not creating symbol index entries for it. lld and
+    // mold ignore the archive symbol index, so provided that you use one of
+    // these linkers, LTO will work as long as lld or the gold plugin is newer
+    // than the compiler. We only ignore errors if the archive format is one
+    // that is supported by a linker that is known to ignore the index,
+    // otherwise there's no chance of this working so we may as well error out.
+    // We print a warning on read failure so that users of linkers that rely on
+    // the symbol index can diagnose the issue.
+    //
+    // This is the same behavior as GNU ar when the linker plugin returns an
+    // error when reading the input file. If the bitcode file is actually
+    // malformed, it will be diagnosed at link time.
+    if (!ObjOrErr) {
+      switch (Kind) {
+      case object::Archive::K_BSD:
+      case object::Archive::K_GNU:
+      case object::Archive::K_GNU64:
+        llvm::logAllUnhandledErrors(ObjOrErr.takeError(), llvm::errs(),
----------------
dwblaikie wrote:

reusability is a cornerstone of LLVM (best citation I have for that is the first sentence on llvm.org: "The LLVM Project is a collection of modular and reusable compiler and toolchain technologies."), and that means not using global state (including global streams like stdout/stderr) and not exiting (returning errors instead) - admittedly we have some violations (eg: report_fatal_error, lld) but it's a generally held to convention.

I'd like to see something like https://reviews.llvm.org/D65515 - though I suspect a good handler might be one that takes an Error by rvalue, and returns an Error by value (to allow the warning handler to say "just treat this as an error and  don't continue"). Though I don't feel so strongly about that particular variation.

Please add a callback that may have a default (that prints to stderr or the like) - but that /can/ be specified via llvm-objdump.cpp for instance (eg: it's not much use if there's a default argument to a function call, but then there are unconditional calls to that function withuot specifying the handler that are still internal to libObject).

I guess libObject doesn't have any sort of context object this could be readily attached to? None that I could find, at a glance?

https://github.com/llvm/llvm-project/pull/96848