[llvm] Object: Don't error out on malformed bitcode files. (PR #96848)

Tue Jul 9 09:47:18 PDT 2024

================
@@ -482,16 +482,47 @@ static uint64_t computeHeadersSize(object::Archive::Kind Kind,
 }
 
 static Expected<std::unique_ptr<SymbolicFile>>
-getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context) {
+getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context,
+                object::Archive::Kind Kind) {
   const file_magic Type = identify_magic(Buf.getBuffer());
   // Don't attempt to read non-symbolic file types.
   if (!object::SymbolicFile::isSymbolicFile(Type, &Context))
     return nullptr;
   if (Type == file_magic::bitcode) {
     auto ObjOrErr = object::SymbolicFile::createSymbolicFile(
         Buf, file_magic::bitcode, &Context);
-    if (!ObjOrErr)
-      return ObjOrErr.takeError();
+    // An error reading a bitcode file most likely indicates that the file
+    // was created by a compiler from the future. Normally we don't try to
+    // implement forwards compatibility for bitcode files, but when creating an
+    // archive we can implement best-effort forwards compatibility by treating
+    // the file as a blob and not creating symbol index entries for it. lld and
+    // mold ignore the archive symbol index, so provided that you use one of
+    // these linkers, LTO will work as long as lld or the gold plugin is newer
+    // than the compiler. We only ignore errors if the archive format is one
+    // that is supported by a linker that is known to ignore the index,
+    // otherwise there's no chance of this working so we may as well error out.
+    // We print a warning on read failure so that users of linkers that rely on
+    // the symbol index can diagnose the issue.
+    //
+    // This is the same behavior as GNU ar when the linker plugin returns an
+    // error when reading the input file. If the bitcode file is actually
+    // malformed, it will be diagnosed at link time.
+    if (!ObjOrErr) {
+      switch (Kind) {
+      case object::Archive::K_BSD:
+      case object::Archive::K_GNU:
+      case object::Archive::K_GNU64:
+        llvm::logAllUnhandledErrors(ObjOrErr.takeError(), llvm::errs(),
----------------
dwblaikie wrote:

> > doesn't quite explain the /different/ context in the example - one with the full path name, one with the short name...
> 
> That's `Buf.getBufferIdentifier()` (full name) vs `M.MemberName` (short name). The latter isn't passed into `getSymbolicFile` but I suppose it could be passed in as well to make the warning a bit more consistent.

I was thinking the opposite direction -if the current error handling doesn't have its context passed in, but attached later when going up the callstack, the analogous behavior with an warning callback would be to attach the context in the callback (at whatever layer was adding it to the error, we'd wrap the callback in a "context adding callback" as discussed in a few places in this thread)

But it does present issues if that warning can then be passed back out of the callback and become an error - which then gets the context added to it again.

I would be OK with not addressing that issue for now - having a callback of `void(Error)`, doing the context wrapping as described ^ (so it has the same context as the error used to have, added in the same level in the call stack).

I think it's reasonable to leave the "if I want a werror-like behavior that also early-exits, I should figure out how to deal with context adding during warning without then duplicating that context addition when the warning-promoted-to-error gets propagated up the stack too" - I can think of a few possible solutions to that problem, but I think they're worth having as a separate discussion closer to the use case.

https://github.com/llvm/llvm-project/pull/96848