[llvm] Object: Don't error out on malformed bitcode files. (PR #96848)

Thu Jun 27 20:44:47 PDT 2024

================
@@ -482,16 +482,47 @@ static uint64_t computeHeadersSize(object::Archive::Kind Kind,
 }
 
 static Expected<std::unique_ptr<SymbolicFile>>
-getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context) {
+getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context,
+                object::Archive::Kind Kind) {
   const file_magic Type = identify_magic(Buf.getBuffer());
   // Don't attempt to read non-symbolic file types.
   if (!object::SymbolicFile::isSymbolicFile(Type, &Context))
     return nullptr;
   if (Type == file_magic::bitcode) {
     auto ObjOrErr = object::SymbolicFile::createSymbolicFile(
         Buf, file_magic::bitcode, &Context);
-    if (!ObjOrErr)
-      return ObjOrErr.takeError();
+    // An error reading a bitcode file most likely indicates that the file
+    // was created by a compiler from the future. Normally we don't try to
+    // implement forwards compatibility for bitcode files, but when creating an
+    // archive we can implement best-effort forwards compatibility by treating
+    // the file as a blob and not creating symbol index entries for it. lld and
+    // mold ignore the archive symbol index, so provided that you use one of
+    // these linkers, LTO will work as long as lld or the gold plugin is newer
+    // than the compiler. We only ignore errors if the archive format is one
+    // that is supported by a linker that is known to ignore the index,
+    // otherwise there's no chance of this working so we may as well error out.
+    // We print a warning on read failure so that users of linkers that rely on
+    // the symbol index can diagnose the issue.
+    //
+    // This is the same behavior as GNU ar when the linker plugin returns an
+    // error when reading the input file. If the bitcode file is actually
+    // malformed, it will be diagnosed at link time.
+    if (!ObjOrErr) {
+      switch (Kind) {
+      case object::Archive::K_BSD:
+      case object::Archive::K_GNU:
+      case object::Archive::K_GNU64:
+        llvm::logAllUnhandledErrors(ObjOrErr.takeError(), llvm::errs(),
----------------
pcc wrote:

> I don't think we need to design for hypothetical clients - but hold to a fairly generic goal that LLVM's is designed as reusable library components and it's not too speculative/arbitrary to not print directly from a library because clients may have different needs for how their output is produced/rendered/handled.

I hear what you're saying but if we say "don't write to stderr from a library" then we need to answer "what should we do instead". If LLVM already had a diagnostic API with programmatic control over diagnostic levels and  `writeArchive` were already using it then it would be a no-brainer to expose the warning via that API. But if that isn't actually implemented yet then at that point it becomes non-obvious what the right thing to do is, and we need to do *something* -- and that's when we start needing to think about what the client might need (i.e. "design for hypothetical clients"). If/when the actual client comes along that needs this we'd probably discover that what we designed was not sufficient and would need to redo the API for the client's needs, which would take more effort overall than just designing it for the actual client in the first place. And if a client doesn't care about any of this and just wants us to write to stderr they would need to change their usage code multiple times for no benefit to them.

I suppose the minimal thing we could do here is add a `llvm::raw_ostream &Warn` to `writeArchive` and that's where the warnings would go. A client that wants warnings as errors can error out and delete the output file if anything was written to the stream, a buffered client can save the output and write it in one go and a GUI can split on newlines. That seems fairly minimal, doesn't require writing dead code and should be easy enough to replace with something else if it becomes necessary. But really I think we should just wait for an actual client.

https://github.com/llvm/llvm-project/pull/96848