[llvm] Object: Don't error out on malformed bitcode files. (PR #96848)

Mon Jul 8 16:11:42 PDT 2024

================
@@ -482,16 +482,47 @@ static uint64_t computeHeadersSize(object::Archive::Kind Kind,
 }
 
 static Expected<std::unique_ptr<SymbolicFile>>
-getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context) {
+getSymbolicFile(MemoryBufferRef Buf, LLVMContext &Context,
+                object::Archive::Kind Kind) {
   const file_magic Type = identify_magic(Buf.getBuffer());
   // Don't attempt to read non-symbolic file types.
   if (!object::SymbolicFile::isSymbolicFile(Type, &Context))
     return nullptr;
   if (Type == file_magic::bitcode) {
     auto ObjOrErr = object::SymbolicFile::createSymbolicFile(
         Buf, file_magic::bitcode, &Context);
-    if (!ObjOrErr)
-      return ObjOrErr.takeError();
+    // An error reading a bitcode file most likely indicates that the file
+    // was created by a compiler from the future. Normally we don't try to
+    // implement forwards compatibility for bitcode files, but when creating an
+    // archive we can implement best-effort forwards compatibility by treating
+    // the file as a blob and not creating symbol index entries for it. lld and
+    // mold ignore the archive symbol index, so provided that you use one of
+    // these linkers, LTO will work as long as lld or the gold plugin is newer
+    // than the compiler. We only ignore errors if the archive format is one
+    // that is supported by a linker that is known to ignore the index,
+    // otherwise there's no chance of this working so we may as well error out.
+    // We print a warning on read failure so that users of linkers that rely on
+    // the symbol index can diagnose the issue.
+    //
+    // This is the same behavior as GNU ar when the linker plugin returns an
+    // error when reading the input file. If the bitcode file is actually
+    // malformed, it will be diagnosed at link time.
+    if (!ObjOrErr) {
+      switch (Kind) {
+      case object::Archive::K_BSD:
+      case object::Archive::K_GNU:
+      case object::Archive::K_GNU64:
+        llvm::logAllUnhandledErrors(ObjOrErr.takeError(), llvm::errs(),
----------------
pcc wrote:

> I personally don't like it, as I don't think a stream passed in is any better than passing in the suggested callback.

I'm sorry that you don't like it. I don't like it either (I don't think we'll find an API that all of us like, this seems like a very bikesheddable topic), but it seems like a reasonable compromise to me. The reason for the stream based API is that we know that `Error` is the wrong API for warning use cases because of its poor handling of contexts, so we should avoid propagating it into the caller. The stream is a minimal API extension that'll be easier to change later.

> Having clients have to trawl through text to find the "warning: " text embedded in the original text, or somehow have to add it inline in the stream after each newline, or something else seems pretty messy compared to this ^.

That's not what a client would need to do. As I said, the messages added to the buffer will already contain "warning: ", so they will just print the buffer text as-is followed by "error: warnings treated as errors". That's not ideal but it's good enough, and if it doesn't work for someone they are welcome to propose their own patch with a better error handling API.

> doesn't quite explain the /different/ context in the example - one with the full path name, one with the short name... 

That's `Buf.getBufferIdentifier()` (full name) vs `M.MemberName` (short name). The latter isn't passed into `getSymbolicFile` but I suppose it could be passed in as well to make the warning a bit more consistent.

> I wouldn't be averse to omitting the pass-through behavior on the first pass of this - and having warnings-as-errors be implemented entirely in the caller (they see a warning, emit it as an error (so I think the callback should still pass an Error) but provide no return value from the handler to allow it to be transformed into an error - caller can then record they emitted an error and combine that with any error result/failure code coming from the main API)

So when `getSymbolicFile` returns a warning it will do
```
Warn(createFileError(MemberName, ObjOrErr.takeError());
return nullptr;
```
and when it returns an error it will do
```
return ObjOrErr.takeError();
```
?

I'm not sure about that. If we only add the context sometimes it will read like a mistake. Another approach would be to remove the call to `createFileError` from the caller of `getSymbolicFile` and add calls to `createFileError` to `getSymbolicFile` whenever it creates an error or warning. So now warnings and errors look the same:
```
return errorOrValue(Warn(createFileError(MemberName, ObjOrErr.takeError())), nullptr); // warning
```
or
```
return createFileError(MemberName, ObjOrErr.takeError()); // error
```
But that's generally more error prone, it'll be easier to miss adding a call to `createFileError` especially if we follow the pattern elsewhere in the code to return warnings.

https://github.com/llvm/llvm-project/pull/96848