[lld] [llvm] [DTLTO][LLD][ELF] Support bitcode members of thin archives (PR #149425)

via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 23 06:29:51 PDT 2025


https://github.com/bd1976bris updated https://github.com/llvm/llvm-project/pull/149425

From 5ea21c52f5fd77bb50f8245740a42361e14a7073 Mon Sep 17 00:00:00 2001
From: Dunbobbin <Ben.Dunbobbin at sony.com>
Date: Fri, 18 Jul 2025 00:00:58 +0100
Subject: [PATCH 1/5] [DTLTO][LLD][ELF] Support bitcode members of thin
 archives

This patch adds support for bitcode members of thin archives to
DTLTO (https://llvm.org/docs/DTLTO.html) in ELF LLD.

For DTLTO, bitcode identifiers must be valid paths to bitcode files
on disk, because Clang does not support archive inputs for ThinLTO
backend compilations. This patch adjusts the identifier of each
bitcode member of a thin archive in DTLTO links to be the path to
the member file on disk, so that such members can be used in DTLTO.
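
As a rough illustration of the path adjustment described above, the
following standalone C++17 sketch resolves a member path against the
directory containing the archive and then removes "." and ".."
components lexically. The helper name resolveThinMemberPath and the use
of std::filesystem are assumptions made for this example only; the
patch itself uses llvm::sys::path.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not part of LLD): turn the member path recorded in a
// thin archive into a path that is usable from any working directory.
std::string resolveThinMemberPath(const fs::path &archivePath,
                                  const fs::path &memberPath) {
  // Relative member paths are interpreted relative to the archive's directory.
  fs::path resolved = memberPath.is_relative()
                          ? archivePath.parent_path() / memberPath
                          : memberPath;
  // Purely lexical clean-up: symlinks are not followed and the file does not
  // need to exist.
  return resolved.lexically_normal().generic_string();
}
```

On a POSIX system, an archive at /work/lib/bar.a with the member entry
../bar.o resolves to /work/bar.o, and an absolute entry such as
/work/lib/../dog.o resolves to /work/dog.o.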

This patch is sufficient to allow for self-hosting an LLVM build
with DTLTO when thin archives are used.

Note: Bitcode members of non-thin archives remain unsupported. This
will be addressed in a future change.
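
Thin and non-thin archives can be told apart by the magic string at
the start of the file. A minimal sketch, assuming the usual 8-byte
thin archive magic "!<thin>\n" (regular archives start with
"!<arch>\n"); the helper names below are hypothetical and this is not
the code used by LLD:

```cpp
#include <fstream>
#include <string>
#include <string_view>

// Magic string at the start of a thin archive.
constexpr std::string_view kThinArchiveMagic = "!<thin>\n";

// Hypothetical helper: true if `header` begins with the thin archive magic.
bool startsWithThinMagic(std::string_view header) {
  return header.substr(0, kThinArchiveMagic.size()) == kThinArchiveMagic;
}

// Hypothetical helper: read only the first bytes of the file and test them.
// Returns false if the file cannot be opened or is too short.
bool isThinArchiveFile(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  std::string header(kThinArchiveMagic.size(), '\0');
  in.read(header.data(), static_cast<std::streamsize>(header.size()));
  return in.gcount() == static_cast<std::streamsize>(header.size()) &&
         startsWithThinMagic(header);
}
```

Reading only the header keeps the check cheap, since the member
contents of a thin archive live in separate files anyway.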

Testing:

- LLD lit test coverage has been added to check that the identifier
  is adjusted appropriately.
- A cross-project lit test has been added to show that a DTLTO link
  can succeed when linking bitcode members of thin archives.

For the design discussion of the DTLTO feature, see: #126654.
---
 cross-project-tests/CMakeLists.txt            |  5 +-
 .../dtlto/ld-archive-thin.test                | 97 +++++++++++++++++++
 cross-project-tests/lit.cfg.py                |  4 +-
 lld/ELF/InputFiles.cpp                        | 42 +++++++-
 lld/test/ELF/dtlto/archive-thin.test          | 66 +++++++++++++
 5 files changed, 207 insertions(+), 7 deletions(-)
 create mode 100644 cross-project-tests/dtlto/ld-archive-thin.test
 create mode 100644 lld/test/ELF/dtlto/archive-thin.test

diff --git a/cross-project-tests/CMakeLists.txt b/cross-project-tests/CMakeLists.txt
index b4b1f47626073..192db87043177 100644
--- a/cross-project-tests/CMakeLists.txt
+++ b/cross-project-tests/CMakeLists.txt
@@ -19,11 +19,12 @@ set(CROSS_PROJECT_TEST_DEPS
   FileCheck
   check-gdb-llvm-support
   count
-  llvm-dwarfdump
+  llvm-ar
   llvm-config
+  llvm-dwarfdump
   llvm-objdump
-  split-file
   not
+  split-file
   )
 
 if ("clang" IN_LIST LLVM_ENABLE_PROJECTS)
diff --git a/cross-project-tests/dtlto/ld-archive-thin.test b/cross-project-tests/dtlto/ld-archive-thin.test
new file mode 100644
index 0000000000000..979da5423962e
--- /dev/null
+++ b/cross-project-tests/dtlto/ld-archive-thin.test
@@ -0,0 +1,97 @@
+REQUIRES: ld.lld,llvm-ar
+
+## Test that a DTLTO link succeeds and outputs the expected set of files
+## correctly when thin archives are present.
+
+RUN: rm -rf %t && split-file %s %t && cd %t
+
+## Compile bitcode. -O2 is required for cross-module importing.
+RUN: %clang -O2 --target=x86_64-linux-gnu -flto=thin -c \
+RUN:   foo.c bar.c dog.c cat.c start.c
+
+## Generate thin archives.
+RUN: llvm-ar rcs foo.a foo.o --thin
+## Create this bitcode thin archive in a subdirectory to test the expansion of
+## the path to a bitcode file that is referenced using "..", e.g., in this case
+## "../bar.o".
+RUN: mkdir lib
+RUN: llvm-ar rcs lib/bar.a bar.o --thin
+## Create this bitcode thin archive with an absolute path entry containing "..".
+RUN: llvm-ar rcs dog.a %t/lib/../dog.o --thin
+## The bitcode member of cat.a will not be used in the link.
+RUN: llvm-ar rcs cat.a cat.o --thin
+RUN: llvm-ar rcs start.a start.o --thin
+
+## Link from a different directory to ensure that thin archive member paths are
+## resolved correctly relative to the archive locations.
+RUN: mkdir %t/out && cd %t/out
+
+RUN: %clang --target=x86_64-linux-gnu -flto=thin -fuse-ld=lld %t/foo.a %t/lib/bar.a ../start.a %t/cat.a \
+RUN:   -Wl,--whole-archive ../dog.a \
+RUN:   -fthinlto-distributor=%python \
+RUN:   -Xthinlto-distributor=%llvm_src_root/utils/dtlto/local.py \
+RUN:   -Wl,--save-temps -nostdlib -Werror
+
+## Check that the required output files have been created.
+RUN: ls | sort | FileCheck %s
+
+## No files are expected before.
+CHECK-NOT: {{.}}
+
+## JSON jobs description.
+CHECK: {{^}}a.[[PID:[a-zA-Z0-9_]+]].dist-file.json{{$}}
+
+## Native output object files and individual summary index files.
+CHECK: {{^}}bar.3.[[PID]].native.o{{$}}
+CHECK: {{^}}bar.3.[[PID]].native.o.thinlto.bc{{$}}
+CHECK: {{^}}dog.1.[[PID]].native.o{{$}}
+CHECK: {{^}}dog.1.[[PID]].native.o.thinlto.bc{{$}}
+CHECK: {{^}}foo.2.[[PID]].native.o{{$}}
+CHECK: {{^}}foo.2.[[PID]].native.o.thinlto.bc{{$}}
+CHECK: {{^}}start.4.[[PID]].native.o{{$}}
+CHECK: {{^}}start.4.[[PID]].native.o.thinlto.bc{{$}}
+
+## No files are expected after.
+CHECK-NOT: {{.}}
+
+
+## It is important that cross-module inlining occurs for this test to show that Clang can
+## successfully load the bitcode file dependencies recorded in the summary indices.
+## Explicitly check that the expected importing has occurred.
+
+RUN: llvm-dis start.4.*.native.o.thinlto.bc -o - | \
+RUN:   FileCheck %s --check-prefixes=FOO,BAR,START
+
+RUN: llvm-dis dog.1.*.native.o.thinlto.bc -o - | \
+RUN:   FileCheck %s --check-prefixes=FOO,BAR,DOG,START
+
+RUN: llvm-dis foo.2.*.native.o.thinlto.bc -o - | \
+RUN:   FileCheck %s --check-prefixes=FOO,BAR,START
+
+RUN: llvm-dis bar.3.*.native.o.thinlto.bc -o - | \
+RUN:   FileCheck %s --check-prefixes=FOO,BAR,START
+
+FOO-DAG:   foo.o
+BAR-DAG:   bar.o
+DOG-DAG:   dog.o
+START-DAG: start.o
+
+
+#--- foo.c
+extern int bar(int), _start(int);
+__attribute__((retain)) int foo(int x) { return x + bar(x) + _start(x); }
+
+#--- bar.c
+extern int foo(int), _start(int);
+__attribute__((retain)) int bar(int x) { return x + foo(x) + _start(x); }
+
+#--- dog.c
+extern int foo(int), bar(int), _start(int);
+__attribute__((retain)) int dog(int x) { return x + foo(x) + bar(x) + _start(x); }
+
+#--- cat.c
+__attribute__((retain)) void cat(int x) {}
+
+#--- start.c
+extern int foo(int), bar(int);
+__attribute__((retain)) int _start(int x) { return x + foo(x) + bar(x); }
diff --git a/cross-project-tests/lit.cfg.py b/cross-project-tests/lit.cfg.py
index b35c643ac898c..ac27753472646 100644
--- a/cross-project-tests/lit.cfg.py
+++ b/cross-project-tests/lit.cfg.py
@@ -19,7 +19,7 @@
 config.test_format = lit.formats.ShTest(not llvm_config.use_lit_shell)
 
 # suffixes: A list of file extensions to treat as test files.
-config.suffixes = [".c", ".cl", ".cpp", ".m"]
+config.suffixes = [".c", ".cl", ".cpp", ".m", ".test"]
 
 # excludes: A list of directories to exclude from the testsuite. The 'Inputs'
 # subdirectories contain auxiliary inputs for various tests in their parent
@@ -107,6 +107,8 @@ def get_required_attr(config, attr_name):
 if lldb_path is not None:
     config.available_features.add("lldb")
 
+if llvm_config.use_llvm_tool("llvm-ar"):
+    config.available_features.add("llvm-ar")
 
 def configure_dexter_substitutions():
     """Configure substitutions for host platform and return list of dependencies"""
diff --git a/lld/ELF/InputFiles.cpp b/lld/ELF/InputFiles.cpp
index 71e72e7184b9f..dedb79bbd5043 100644
--- a/lld/ELF/InputFiles.cpp
+++ b/lld/ELF/InputFiles.cpp
@@ -20,6 +20,7 @@
 #include "llvm/ADT/CachedHashString.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/LTO/LTO.h"
+#include "llvm/Object/Archive.h"
 #include "llvm/Object/IRObjectFile.h"
 #include "llvm/Support/ARMAttributeParser.h"
 #include "llvm/Support/ARMBuildAttributes.h"
@@ -1753,6 +1754,36 @@ static uint8_t getOsAbi(const Triple &t) {
   }
 }
 
+// For DTLTO, bitcode member names must be valid paths to files on disk.
+// For thin archives, resolve `memberPath` relative to the archive's location.
+// Returns true if adjusted; false otherwise. Non-thin archives are unsupported.
+static bool dtltoAdjustMemberPathIfThinArchive(Ctx &ctx, StringRef archivePath,
+                                               std::string &memberPath) {
+  assert(!archivePath.empty() && !ctx.arg.dtltoDistributor.empty());
+
+  // Read the archive header to determine if it's a thin archive.
+  auto bufferOrErr =
+      MemoryBuffer::getFileSlice(archivePath, sizeof(ThinArchiveMagic) - 1, 0);
+  if (std::error_code ec = bufferOrErr.getError()) {
+    ErrAlways(ctx) << "cannot open " << archivePath << ": " << ec.message();
+    return false;
+  }
+
+  if (!bufferOrErr->get()->getBuffer().starts_with(ThinArchiveMagic))
+    return false;
+
+  SmallString<64> resolvedPath;
+  if (path::is_relative(memberPath)) {
+    resolvedPath = path::parent_path(archivePath);
+    path::append(resolvedPath, memberPath);
+  } else
+    resolvedPath = memberPath;
+
+  path::remove_dots(resolvedPath, /*remove_dot_dot=*/true);
+  memberPath = resolvedPath.str();
+  return true;
+}
+
 BitcodeFile::BitcodeFile(Ctx &ctx, MemoryBufferRef mb, StringRef archiveName,
                          uint64_t offsetInArchive, bool lazy)
     : InputFile(ctx, BitcodeKind, mb) {
@@ -1770,10 +1801,13 @@ BitcodeFile::BitcodeFile(Ctx &ctx, MemoryBufferRef mb, StringRef archiveName,
   // symbols later in the link stage). So we append file offset to make
   // filename unique.
   StringSaver &ss = ctx.saver;
-  StringRef name = archiveName.empty()
-                       ? ss.save(path)
-                       : ss.save(archiveName + "(" + path::filename(path) +
-                                 " at " + utostr(offsetInArchive) + ")");
+  StringRef name =
+      (archiveName.empty() ||
+       (!ctx.arg.dtltoDistributor.empty() &&
+        dtltoAdjustMemberPathIfThinArchive(ctx, archiveName, path)))
+          ? ss.save(path)
+          : ss.save(archiveName + "(" + path::filename(path) + " at " +
+                    utostr(offsetInArchive) + ")");
   MemoryBufferRef mbref(mb.getBuffer(), name);
 
   obj = CHECK2(lto::InputFile::create(mbref), this);
diff --git a/lld/test/ELF/dtlto/archive-thin.test b/lld/test/ELF/dtlto/archive-thin.test
new file mode 100644
index 0000000000000..e25d62429b443
--- /dev/null
+++ b/lld/test/ELF/dtlto/archive-thin.test
@@ -0,0 +1,66 @@
+REQUIRES: x86
+
+## Test that a DTLTO link assigns Module IDs to thin archive members as expected.
+
+RUN: rm -rf %t && split-file %s %t && cd %t
+
+RUN: sed 's/@t1/@t2/g' t1.ll > t2.ll
+RUN: sed 's/@t1/@t3/g' t1.ll > t3.ll
+
+RUN: opt -thinlto-bc t1.ll -o t1.bc
+RUN: opt -thinlto-bc t2.ll -o t2.bc
+RUN: opt -thinlto-bc t3.ll -o t3.bc
+
+RUN: llvm-ar rcs t1.a t1.bc --thin
+## Create this bitcode thin archive in a subdirectory to test the expansion of
+## the path to a bitcode file that is referenced using "..", e.g., in this case
+## "../t2.bc".
+RUN: mkdir lib
+RUN: llvm-ar rcs lib/t2.a t2.bc --thin
+## Create this bitcode thin archive with an absolute path entry containing "..".
+RUN: llvm-ar rcs t3.a %t/lib/../t3.bc --thin
+
+## Link from a different directory to ensure that thin archive member paths are
+## resolved correctly relative to the archive locations.
+RUN: mkdir %t/out && cd %t/out
+
+## Build a response file to share common linking arguments.
+## Note: validate.py does not perform any compilation. Instead, it validates the
+## received JSON, pretty-prints the JSON and the supplied arguments, and then
+## exits with an error. This allows FileCheck directives to verify the
+## distributor inputs.
+RUN: echo "%t/t1.a %t/lib/t2.a ../t3.a \
+RUN:   --thinlto-distributor=\"%python\" \
+RUN:   --thinlto-distributor-arg=\"%llvm_src_root/utils/dtlto/validate.py\" " > rsp
+
+## Link thin archives using -u/--undefined.
+RUN: not ld.lld @rsp -u t1 -u t2 -u t3 2>&1 | FileCheck %s
+
+## Link thin archives using --whole-archive.
+RUN: not ld.lld --whole-archive @rsp 2>&1 | FileCheck %s
+
+## Check the module IDs in the JSON jobs description.
+CHECK: "jobs": [
+CHECK: "inputs": [
+CHECK-NEXT: "{{([a-zA-Z]:)|/}}
+CHECK-SAME: {{/|\\\\}}archive-thin.test.tmp{{/|\\\\}}t1.bc"
+
+CHECK: "inputs": [
+CHECK-NEXT: "{{([a-zA-Z]\:)|/}}
+CHECK-SAME: {{/|\\\\}}archive-thin.test.tmp{{/|\\\\}}t2.bc"
+
+CHECK: "inputs": [
+CHECK-NEXT: "{{([a-zA-Z]:)|/}}
+CHECK-SAME: {{/|\\\\}}archive-thin.test.tmp{{/|\\\\}}t3.bc"
+
+## Ensure backend compilation fails as expected (due to validate.py dummy behavior).
+CHECK: error: DTLTO backend compilation: cannot open native object file:
+
+#--- t1.ll
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @t1() {
+  ret void
+}
+

From b549db36b344af1ecff62287668b67886bb353e4 Mon Sep 17 00:00:00 2001
From: Dunbobbin <Ben.Dunbobbin at sony.com>
Date: Wed, 23 Jul 2025 14:04:17 +0100
Subject: [PATCH 2/5] Simplify complicated logic in BitcodeFile constructor
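
The naming decision made in the BitcodeFile constructor can be
summarised by the standalone C++17 sketch below. The function
moduleId and its bool parameter are hypothetical stand-ins for the
constructor logic and for the result of the thin archive path
adjustment; they are not LLD APIs.

```cpp
#include <cstdint>
#include <filesystem>
#include <string>

// Hypothetical stand-in for the constructor's choice of module name.
// `adjustedForDtlto` models "DTLTO is enabled, the archive is thin, and
// `path` has already been rewritten to the member file on disk".
std::string moduleId(const std::string &archiveName, const std::string &path,
                     std::uint64_t offsetInArchive, bool adjustedForDtlto) {
  // Standalone files and adjusted thin archive members use the path itself.
  if (archiveName.empty() || adjustedForDtlto)
    return path;
  // Otherwise append the member's offset so that members with the same name
  // in different archives still get unique names.
  std::string member = std::filesystem::path(path).filename().string();
  return archiveName + "(" + member + " at " +
         std::to_string(offsetInArchive) + ")";
}
```

For example, a member dir/a.o at offset 42 in lib.a that is not
adjusted is named "lib.a(a.o at 42)".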

---
 lld/ELF/InputFiles.cpp | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/lld/ELF/InputFiles.cpp b/lld/ELF/InputFiles.cpp
index dedb79bbd5043..e581973d568c5 100644
--- a/lld/ELF/InputFiles.cpp
+++ b/lld/ELF/InputFiles.cpp
@@ -1759,7 +1759,10 @@ static uint8_t getOsAbi(const Triple &t) {
 // Returns true if adjusted; false otherwise. Non-thin archives are unsupported.
 static bool dtltoAdjustMemberPathIfThinArchive(Ctx &ctx, StringRef archivePath,
                                                std::string &memberPath) {
-  assert(!archivePath.empty() && !ctx.arg.dtltoDistributor.empty());
+  assert(!archivePath.empty());
+
+  if (ctx.arg.dtltoDistributor.empty())
+    return false;
 
   // Read the archive header to determine if it's a thin archive.
   auto bufferOrErr =
@@ -1794,20 +1797,22 @@ BitcodeFile::BitcodeFile(Ctx &ctx, MemoryBufferRef mb, StringRef archiveName,
   if (ctx.arg.thinLTOIndexOnly)
     path = replaceThinLTOSuffix(ctx, mb.getBufferIdentifier());
 
-  // ThinLTO assumes that all MemoryBufferRefs given to it have a unique
-  // name. If two archives define two members with the same name, this
-  // causes a collision which result in only one of the objects being taken
-  // into consideration at LTO time (which very likely causes undefined
-  // symbols later in the link stage). So we append file offset to make
-  // filename unique.
   StringSaver &ss = ctx.saver;
-  StringRef name =
-      (archiveName.empty() ||
-       (!ctx.arg.dtltoDistributor.empty() &&
-        dtltoAdjustMemberPathIfThinArchive(ctx, archiveName, path)))
-          ? ss.save(path)
-          : ss.save(archiveName + "(" + path::filename(path) + " at " +
-                    utostr(offsetInArchive) + ")");
+  StringRef name;
+  if (archiveName.empty() ||
+      dtltoAdjustMemberPathIfThinArchive(ctx, archiveName, path)) {
+    name = ss.save(path);
+  } else {
+    // ThinLTO assumes that all MemoryBufferRefs given to it have a unique
+    // name. If two archives define two members with the same name, this
+    // causes a collision which result in only one of the objects being taken
+    // into consideration at LTO time (which very likely causes undefined
+    // symbols later in the link stage). So we append file offset to make
+    // filename unique.
+    name = ss.save(archiveName + "(" + path::filename(path) + " at " +
+                   utostr(offsetInArchive) + ")");
+  }
+
   MemoryBufferRef mbref(mb.getBuffer(), name);
 
   obj = CHECK2(lto::InputFile::create(mbref), this);

From 4337a5fff59cce6bfd5155cc45a9d66a66e751f7 Mon Sep 17 00:00:00 2001
From: Dunbobbin <Ben.Dunbobbin at sony.com>
Date: Wed, 23 Jul 2025 14:06:06 +0100
Subject: [PATCH 3/5] Use more standard SmallString<128> rather than
 SmallString<64>

---
 lld/ELF/InputFiles.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lld/ELF/InputFiles.cpp b/lld/ELF/InputFiles.cpp
index e581973d568c5..62b54f8217bc3 100644
--- a/lld/ELF/InputFiles.cpp
+++ b/lld/ELF/InputFiles.cpp
@@ -1775,7 +1775,7 @@ static bool dtltoAdjustMemberPathIfThinArchive(Ctx &ctx, StringRef archivePath,
   if (!bufferOrErr->get()->getBuffer().starts_with(ThinArchiveMagic))
     return false;
 
-  SmallString<64> resolvedPath;
+  SmallString<128> resolvedPath;
   if (path::is_relative(memberPath)) {
     resolvedPath = path::parent_path(archivePath);
     path::append(resolvedPath, memberPath);

From 1ee83e23a63a44eb6b44af1e50c55e6ff097f51b Mon Sep 17 00:00:00 2001
From: Dunbobbin <Ben.Dunbobbin at sony.com>
Date: Wed, 23 Jul 2025 14:23:58 +0100
Subject: [PATCH 4/5] Use single quotes so that the inner double quotes do not
 need to be escaped

---
 lld/test/ELF/dtlto/archive-thin.test | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lld/test/ELF/dtlto/archive-thin.test b/lld/test/ELF/dtlto/archive-thin.test
index e25d62429b443..6534da2e4aa5a 100644
--- a/lld/test/ELF/dtlto/archive-thin.test
+++ b/lld/test/ELF/dtlto/archive-thin.test
@@ -29,9 +29,9 @@ RUN: mkdir %t/out && cd %t/out
 ## received JSON, pretty-prints the JSON and the supplied arguments, and then
 ## exits with an error. This allows FileCheck directives to verify the
 ## distributor inputs.
-RUN: echo "%t/t1.a %t/lib/t2.a ../t3.a \
-RUN:   --thinlto-distributor=\"%python\" \
-RUN:   --thinlto-distributor-arg=\"%llvm_src_root/utils/dtlto/validate.py\" " > rsp
+RUN: echo '%t/t1.a %t/lib/t2.a ../t3.a \
+RUN:   --thinlto-distributor="%python" \
+RUN:   --thinlto-distributor-arg="%llvm_src_root/utils/dtlto/validate.py"' > rsp
 
 ## Link thin archives using -u/--undefined.
 RUN: not ld.lld @rsp -u t1 -u t2 -u t3 2>&1 | FileCheck %s

From cf4c7da8fdefbd1607712e3c2e69ea8eb0933241 Mon Sep 17 00:00:00 2001
From: Dunbobbin <Ben.Dunbobbin at sony.com>
Date: Wed, 23 Jul 2025 14:27:56 +0100
Subject: [PATCH 5/5] Remove trailing blank lines from test

---
 lld/test/ELF/dtlto/archive-thin.test | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lld/test/ELF/dtlto/archive-thin.test b/lld/test/ELF/dtlto/archive-thin.test
index 6534da2e4aa5a..bcd5f138459b0 100644
--- a/lld/test/ELF/dtlto/archive-thin.test
+++ b/lld/test/ELF/dtlto/archive-thin.test
@@ -63,4 +63,3 @@ target triple = "x86_64-unknown-linux-gnu"
 define void @t1() {
   ret void
 }
-


