[clang] b2f7b5d - [CodeGen] Support bitcode input containing multiple modules

Fangrui Song via cfe-commits cfe-commits at lists.llvm.org
Fri Jul 21 20:05:40 PDT 2023


Author: Fangrui Song
Date: 2023-07-21T20:05:35-07:00
New Revision: b2f7b5dbaefe4f2e3f8f279735ea3509a796693f

URL: https://github.com/llvm/llvm-project/commit/b2f7b5dbaefe4f2e3f8f279735ea3509a796693f
DIFF: https://github.com/llvm/llvm-project/commit/b2f7b5dbaefe4f2e3f8f279735ea3509a796693f.diff

LOG: [CodeGen] Support bitcode input containing multiple modules

When using -fsplit-lto-unit (explicitly specified or due to using
-fsanitize=cfi/-fwhole-program-vtables), the emitted LLVM IR contains a module
flag metadata `"EnableSplitLTOUnit"`. If a module contains both type metadata
and `"EnableSplitLTOUnit"`, `ThinLTOBitcodeWriter.cpp` will write two modules
into the bitcode file. Compiling the bitcode (not ThinLTO backend compilation)
will lead to an error due to `parseIR` requiring a single module.

```
% clang -flto=thin a.cc -c -o a.bc
% clang -c a.bc
% clang -fsplit-lto-unit -flto=thin a.cc -c -o a.bc
% clang -c a.bc
error: Expected a single module
1 error generated.
```

There are multiple ways to have just one module in a bitcode file
output: `-Xclang -fno-lto-unit`, not using features like `-fsanitize=cfi`,
using `-fsanitize=cfi` with `-fno-split-lto-unit`. I think whether a
bitcode input file contains 2 modules (internal implementation strategy)
should not be a criterion to require an additional driver option when
the user seek for a non-LTO compile action.

Let's place the extra module (if present) into CodeGenOptions::LinkBitcodeFiles
(originally for -cc1 -mlink-bitcode-file). Linker::linkModules will link the two
modules together. This patch makes the following commands work:

```
clang -S -emit-llvm a.bc
clang -S a.bc
clang -c a.bc
```

Reviewed By: ormris

Differential Revision: https://reviews.llvm.org/D154923

Added: 
    clang/test/CodeGen/split-lto-unit-input.cpp

Modified: 
    clang/lib/CodeGen/CodeGenAction.cpp

Removed: 
    


################################################################################
diff  --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index 4879bcd6a42a52..a3b72381d73fc5 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -1112,21 +1112,21 @@ CodeGenAction::loadModule(MemoryBufferRef MBRef) {
   CompilerInstance &CI = getCompilerInstance();
   SourceManager &SM = CI.getSourceManager();
 
+  auto DiagErrors = [&](Error E) -> std::unique_ptr<llvm::Module> {
+    unsigned DiagID =
+        CI.getDiagnostics().getCustomDiagID(DiagnosticsEngine::Error, "%0");
+    handleAllErrors(std::move(E), [&](ErrorInfoBase &EIB) {
+      CI.getDiagnostics().Report(DiagID) << EIB.message();
+    });
+    return {};
+  };
+
   // For ThinLTO backend invocations, ensure that the context
   // merges types based on ODR identifiers. We also need to read
   // the correct module out of a multi-module bitcode file.
   if (!CI.getCodeGenOpts().ThinLTOIndexFile.empty()) {
     VMContext->enableDebugTypeODRUniquing();
 
-    auto DiagErrors = [&](Error E) -> std::unique_ptr<llvm::Module> {
-      unsigned DiagID =
-          CI.getDiagnostics().getCustomDiagID(DiagnosticsEngine::Error, "%0");
-      handleAllErrors(std::move(E), [&](ErrorInfoBase &EIB) {
-        CI.getDiagnostics().Report(DiagID) << EIB.message();
-      });
-      return {};
-    };
-
     Expected<std::vector<BitcodeModule>> BMsOrErr = getBitcodeModuleList(MBRef);
     if (!BMsOrErr)
       return DiagErrors(BMsOrErr.takeError());
@@ -1151,10 +1151,35 @@ CodeGenAction::loadModule(MemoryBufferRef MBRef) {
   if (loadLinkModules(CI))
     return nullptr;
 
+  // Handle textual IR and bitcode file with one single module.
   llvm::SMDiagnostic Err;
   if (std::unique_ptr<llvm::Module> M = parseIR(MBRef, Err, *VMContext))
     return M;
 
+  // If MBRef is a bitcode with multiple modules (e.g., -fsplit-lto-unit
+  // output), place the extra modules (actually only one, a regular LTO module)
+  // into LinkModules as if we are using -mlink-bitcode-file.
+  Expected<std::vector<BitcodeModule>> BMsOrErr = getBitcodeModuleList(MBRef);
+  if (BMsOrErr && BMsOrErr->size()) {
+    std::unique_ptr<llvm::Module> FirstM;
+    for (auto &BM : *BMsOrErr) {
+      Expected<std::unique_ptr<llvm::Module>> MOrErr =
+          BM.parseModule(*VMContext);
+      if (!MOrErr)
+        return DiagErrors(MOrErr.takeError());
+      if (FirstM)
+        LinkModules.push_back({std::move(*MOrErr), /*PropagateAttrs=*/false,
+                               /*Internalize=*/false, /*LinkFlags=*/{}});
+      else
+        FirstM = std::move(*MOrErr);
+    }
+    if (FirstM)
+      return FirstM;
+  }
+  // If BMsOrErr fails, consume the error and use the error message from
+  // parseIR.
+  consumeError(BMsOrErr.takeError());
+
   // Translate from the diagnostic info to the SourceManager location if
   // available.
   // TODO: Unify this with ConvertBackendLocation()

diff  --git a/clang/test/CodeGen/split-lto-unit-input.cpp b/clang/test/CodeGen/split-lto-unit-input.cpp
new file mode 100644
index 00000000000000..adfc9ac3e0f446
--- /dev/null
+++ b/clang/test/CodeGen/split-lto-unit-input.cpp
@@ -0,0 +1,32 @@
+// REQUIRES: x86-registered-target
+/// When the input is a -fsplit-lto-unit bitcode file, link the regular LTO file like -mlink-bitcode-file.
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm-bc -flto=thin -flto-unit -fsplit-lto-unit %s -o %t.bc
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-obj %t.bc -o %t.o
+// RUN: llvm-nm %t.o | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %t.bc -o - | FileCheck %s --check-prefix=CHECK-IR
+
+// CHECK:      V _ZTI1A
+// CHECK-NEXT: V _ZTI1B
+// CHECK-NEXT: V _ZTS1A
+// CHECK-NEXT: V _ZTS1B
+// CHECK-NEXT: V _ZTV1A
+// CHECK-NEXT: V _ZTV1B
+
+// CHECK-IR-DAG: _ZTS1B = linkonce_odr constant
+// CHECK-IR-DAG: _ZTS1A = linkonce_odr constant
+// CHECK-IR-DAG: _ZTV1B = linkonce_odr unnamed_addr constant
+// CHECK-IR-DAG: _ZTI1A = linkonce_odr constant
+// CHECK-IR-DAG: _ZTI1B = linkonce_odr constant
+// CHECK-IR-DAG: _ZTV1A = linkonce_odr unnamed_addr constant
+
+struct A {
+  virtual int c(int i) = 0;
+};
+
+struct B : A {
+  virtual int c(int i) { return i; }
+};
+
+int use() {
+  return (new B)->c(0);
+}


        


More information about the cfe-commits mailing list