[llvm] Object: Don't error out on malformed bitcode files. (PR #96848)

Tue Jul 9 00:47:58 PDT 2024

================
@@ -482,16 +482,47 @@ static uint64_t computeHeadersSize(object::Archive::Kind Kind,
 }
 
 static Expected<std::unique_ptr<SymbolicFile>>
-getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context) {
+getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context,
+                object::Archive::Kind Kind) {
   const file_magic Type = identify_magic(Buf.getBuffer());
   // Don't attempt to read non-symbolic file types.
   if (!object::SymbolicFile::isSymbolicFile(Type, &Context))
     return nullptr;
   if (Type == file_magic::bitcode) {
     auto ObjOrErr = object::SymbolicFile::createSymbolicFile(
         Buf, file_magic::bitcode, &Context);
-    if (!ObjOrErr)
-      return ObjOrErr.takeError();
+    // An error reading a bitcode file most likely indicates that the file
+    // was created by a compiler from the future. Normally we don't try to
+    // implement forwards compatibility for bitcode files, but when creating an
+    // archive we can implement best-effort forwards compatibility by treating
+    // the file as a blob and not creating symbol index entries for it. lld and
+    // mold ignore the archive symbol index, so provided that you use one of
+    // these linkers, LTO will work as long as lld or the gold plugin is newer
+    // than the compiler. We only ignore errors if the archive format is one
+    // that is supported by a linker that is known to ignore the index,
+    // otherwise there's no chance of this working so we may as well error out.
+    // We print a warning on read failure so that users of linkers that rely on
+    // the symbol index can diagnose the issue.
+    //
+    // This is the same behavior as GNU ar when the linker plugin returns an
+    // error when reading the input file. If the bitcode file is actually
+    // malformed, it will be diagnosed at link time.
+    if (!ObjOrErr) {
+      switch (Kind) {
+      case object::Archive::K_BSD:
+      case object::Archive::K_GNU:
+      case object::Archive::K_GNU64:
+        llvm::logAllUnhandledErrors(ObjOrErr.takeError(), llvm::errs(),
----------------
jh7370 wrote:

Could we have each layer of the callstack do the minimal context addition necessary so that context isn't lost? In this particular case, the getSymbolicFile function wouldn't need to do anything other than pass the error to the callback directly, the caller would add the member name via a wrapper around the callback, since it is the function that knows it, and then this process would proceed up the stack as more additional context is gained at each level (obviously cases in the stack that don't have any additional context to add don't need to wrap the callback themselves). We do something similar in the Object library in some places, e.g. [here](https://github.com/llvm/llvm-project/blob/93869dfd89387844bf8b605ebcd1abc0cc81bde8/llvm/lib/Object/ELFObjectFile.cpp#L899): in this case the lower code reports the low-level error, the code in the link adds information about the referencing section, and the calling code will ultimately add the file name, before it all ends up getting printed as a warning in llvm-readobj.

I acknowledge the situation is slightly different here, since we're distinguishing between warnings and errors, whereas llvm-readobj treats most things as warnings, so can continue to pass Error instances up the stack, but the overall approach should still apply, just using callbacks (with each layer wrapping the callback if needed), or even a string passed back upwards that is added to then ultimately passed to a callback.

https://github.com/llvm/llvm-project/pull/96848