[llvm] Object: Don't error out on malformed bitcode files. (PR #96848)

Wed Jul 3 17:18:28 PDT 2024

================
@@ -482,16 +482,47 @@ static uint64_t computeHeadersSize(object::Archive::Kind Kind,
 }
 
 static Expected<std::unique_ptr<SymbolicFile>>
-getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context) {
+getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context,
+                object::Archive::Kind Kind) {
   const file_magic Type = identify_magic(Buf.getBuffer());
   // Don't attempt to read non-symbolic file types.
   if (!object::SymbolicFile::isSymbolicFile(Type, &Context))
     return nullptr;
   if (Type == file_magic::bitcode) {
     auto ObjOrErr = object::SymbolicFile::createSymbolicFile(
         Buf, file_magic::bitcode, &Context);
-    if (!ObjOrErr)
-      return ObjOrErr.takeError();
+    // An error reading a bitcode file most likely indicates that the file
+    // was created by a compiler from the future. Normally we don't try to
+    // implement forwards compatibility for bitcode files, but when creating an
+    // archive we can implement best-effort forwards compatibility by treating
+    // the file as a blob and not creating symbol index entries for it. lld and
+    // mold ignore the archive symbol index, so provided that you use one of
+    // these linkers, LTO will work as long as lld or the gold plugin is newer
+    // than the compiler. We only ignore errors if the archive format is one
+    // that is supported by a linker that is known to ignore the index,
+    // otherwise there's no chance of this working so we may as well error out.
+    // We print a warning on read failure so that users of linkers that rely on
+    // the symbol index can diagnose the issue.
+    //
+    // This is the same behavior as GNU ar when the linker plugin returns an
+    // error when reading the input file. If the bitcode file is actually
+    // malformed, it will be diagnosed at link time.
+    if (!ObjOrErr) {
+      switch (Kind) {
+      case object::Archive::K_BSD:
+      case object::Archive::K_GNU:
+      case object::Archive::K_GNU64:
+        llvm::logAllUnhandledErrors(ObjOrErr.takeError(), llvm::errs(),
----------------
pcc wrote:

I looked at implementing your suggestion. Here's the patch I came up with on top of this one:
https://github.com/pcc/llvm-project/commit/llvm-ar-warn-example

I'm sorry, but I don't think it's a good API. So I still want to keep things as is here until a client needs something else.

After implementing it I discovered the following problems:
1. Warnings are of poor quality. For example, without the added call to `createFileError` and with the changes to llvm-ar.cpp reverted, the warning looks like this:
```
warning: Invalid bitcode signature
```
This is because of the error wrapping in the caller, which does not occur when calling the `Warn` function directly. I partially mitigated it by adding the call to `createFileError`. This leads to problem 2.
2. The change does not properly support the warnings-as-errors use case. In the change, I made a change to llvm-ar.cpp to simulate a hypothetical client that wants warnings as errors. After the change, we can see an ugly error message occur:
```
ra/bin/llvm-ar: error: bad2.a: 'input.bc': 'ra/obj/llvm/test/Object/Output/archive-malformed-object.test.tmp.dir/input.bc': Invalid bitcode signature
```
Note that `input.bc` is repeated. In order to fix this, more extensive refactoring is needed to ensure that `createFileError` is only called once. We'll also need to pass in the correct file argument wherever an error is created. I don't think this extensive refactoring should be done just so that we can print one warning. Instead, it should be the responsibility of the future contributor who wants this behavior in their client. Of course, if you want to do the refactoring yourself, you are welcome to do so in a followup.

https://github.com/llvm/llvm-project/pull/96848