[llvm] Object: Don't error out on malformed bitcode files. (PR #96848)

Thu Jul 4 06:37:07 PDT 2024

================
@@ -482,16 +482,47 @@ static uint64_t computeHeadersSize(object::Archive::Kind Kind,
 }
 
 static Expected<std::unique_ptr<SymbolicFile>>
-getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context) {
+getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context,
+                object::Archive::Kind Kind) {
   const file_magic Type = identify_magic(Buf.getBuffer());
   // Don't attempt to read non-symbolic file types.
   if (!object::SymbolicFile::isSymbolicFile(Type, &Context))
     return nullptr;
   if (Type == file_magic::bitcode) {
     auto ObjOrErr = object::SymbolicFile::createSymbolicFile(
         Buf, file_magic::bitcode, &Context);
-    if (!ObjOrErr)
-      return ObjOrErr.takeError();
+    // An error reading a bitcode file most likely indicates that the file
+    // was created by a compiler from the future. Normally we don't try to
+    // implement forwards compatibility for bitcode files, but when creating an
+    // archive we can implement best-effort forwards compatibility by treating
+    // the file as a blob and not creating symbol index entries for it. lld and
+    // mold ignore the archive symbol index, so provided that you use one of
+    // these linkers, LTO will work as long as lld or the gold plugin is newer
+    // than the compiler. We only ignore errors if the archive format is one
+    // that is supported by a linker that is known to ignore the index,
+    // otherwise there's no chance of this working so we may as well error out.
+    // We print a warning on read failure so that users of linkers that rely on
+    // the symbol index can diagnose the issue.
+    //
+    // This is the same behavior as GNU ar when the linker plugin returns an
+    // error when reading the input file. If the bitcode file is actually
+    // malformed, it will be diagnosed at link time.
+    if (!ObjOrErr) {
+      switch (Kind) {
+      case object::Archive::K_BSD:
+      case object::Archive::K_GNU:
+      case object::Archive::K_GNU64:
+        llvm::logAllUnhandledErrors(ObjOrErr.takeError(), llvm::errs(),
----------------
jh7370 wrote:

I dig some digging into our downstream LLVM usage, and we actually have a downstream patch to LLD for a downstream-only piece of functionality that ultimately results in a potential call to `writeArchive` (with symbol table enabled) from within LLD, with an archive that may contain bitcode files. I believe with this PR the flow goes something like this:
1) The downstream feature is fed one or more invalid bitcode files and attempts to create an archive from them via `writeArchive`.
2) With this PR, the call will result in a warning, due to the invalid bitcode file, printed directly to stderr, ignoring LLD's usual warning mechanism (which includes `--fatal-warnings`).
3) The downstream feature eventually results in the archive being compiled for LTO, which presumably will result in an error due to the unrecognised bitcode file, meaning we ultimately end up with, in theory, a warning and an error.

In this concrete use-case, we either don't want the warning, or we'd want it to be an error, since it's indicative of a future issue and we should stop earlier if possible, but even if we did want just a warning, it really should end up going through LLD's warning mechanism.

If we've got a downstream case where the warning shouldn't be written directly to stderr, I fully expect there to be other such cases.

https://github.com/llvm/llvm-project/pull/96848