[lld] [llvm] [ThinLTO] Add module names to ThinLTO final objects (PR #74160)

via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 1 15:56:39 PST 2023


https://github.com/xur-llvm created https://github.com/llvm/llvm-project/pull/74160

Emit the module name for final ThinLTO objects under
--lto-output-module-name. This adds the module name to the file names of the native objects or asm files emitted for
-Wl,--lto-emit-asm, -Wl,--lto-obj-path=<path>, and -Wl,--save-temps=prelink.

Also fix a bug where these options do not work when the ThinLTO cache is used.
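A minimal C++17 sketch of the cache fix, based on our reading of the `saveBufferOrFile` lambda in the LTO.cpp hunk. When the ThinLTO cache is enabled, a task's object may arrive as a cached file buffer while the in-memory buffer stays empty. Saving only the in-memory buffer would therefore write empty files. The function name `bytesToSave` is hypothetical, and `std::string_view`/`std::optional` stand in for `StringRef`/`MemoryBuffer`.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for one ThinLTO backend task's output.
// Buf models the in-memory object buffer; File models a buffer handed back
// by the ThinLTO cache (empty optional when no cached buffer exists).
// Prefer the in-memory bytes, but fall back to the cached file contents
// when the in-memory buffer was never filled.
std::string bytesToSave(std::string_view Buf,
                        std::optional<std::string_view> File) {
  if (Buf.empty() && File)
    return std::string(*File);
  return std::string(Buf);
}
```

With the fallback in place, each task still produces a non-empty output file whether or not the cache delivered the object.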

From 62728756fd56e3427376268c4178765950a27636 Mon Sep 17 00:00:00 2001
From: Rong Xu <xur at google.com>
Date: Wed, 18 Oct 2023 15:13:15 -0700
Subject: [PATCH 1/2] [PGO] Sampled instrumentation in PGO to speed up
 instrumentation binary

A PGO-instrumented binary can be much slower than the
non-instrumented binary. It is not uncommon to see a 10x slowdown
for highly threaded programs, due to data races and false sharing
on the counters.

This patch uses sampling in PGO instrumentation to speed up the
instrumented binary. The basic idea is the same as the one in
https://reviews.llvm.org/D63949.

This patch makes some improvements so that only one condition is
needed: the whole sampling duration is fixed at 65536, and the
sampling counter relies on unsigned short wraparound.
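Below is a minimal C++ sketch of the single-condition scheme. It illustrates the idea and is not the pass itself. A single thread-local `uint16_t`, modeled on `__llvm_profile_sampling`, gates each counter increment. Its natural wraparound at 65536 replaces a separate period check. The helper names `sampledIncrement` and `recordedUpdates` are hypothetical. Because the guard compares with `<=`, a duration of 200 records 201 of every 65536 updates, or about 0.31%.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one thread-local 16-bit sampling counter shared by all
// instrumentation sites, mirroring __llvm_profile_sampling in this patch.
// Unsigned 16-bit arithmetic wraps from 65535 back to 0, so no second
// "reset" condition is needed.
thread_local std::uint16_t SamplingCounter = 0;

// Hypothetical helper modeling the guarded update:
//   if (__llvm_profile_sampling <= Duration) ++Counter;
//   __llvm_profile_sampling += 1;
inline void sampledIncrement(std::uint64_t &Counter, std::uint16_t Duration) {
  assert(Duration < UINT16_MAX);
  if (SamplingCounter <= Duration)
    ++Counter;
  // The increment happens in int, and the cast wraps 65536 back to 0.
  SamplingCounter = static_cast<std::uint16_t>(SamplingCounter + 1);
}

// Run N updates through the sampled increment; return how many were recorded.
inline std::uint64_t recordedUpdates(std::uint64_t N, std::uint16_t Duration) {
  SamplingCounter = 0;
  std::uint64_t Counter = 0;
  for (std::uint64_t I = 0; I < N; ++I)
    sampledIncrement(Counter, Duration);
  return Counter;
}
```

For example, `recordedUpdates(65536, 200)` yields 201, and two full periods yield 402. Short bursts that never leave the sampling window are recorded in full.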

With sampled instrumentation, the binary runs much faster:
we measured a 5x speedup using the default duration.
We now see only about a 20% to 30% slowdown (compared to
an 8x to 10x slowdown without sampling).

The profile quality remains good with sampling: the edge
counts usually show >90% overlap.

For apps whose program behavior changes with binary speed,
sampled instrumentation can improve PGO performance.
We have observed up to ~2% improvement from PGO in some apps.

One potential drawback of this patch is the increased binary
size and compilation time.
---
 .../llvm/ProfileData/InstrProfData.inc        |   1 +
 .../include/llvm/Transforms/Instrumentation.h |   6 +
 .../Instrumentation/InstrProfiling.h          |   6 +
 .../Instrumentation/PGOInstrumentation.h      |   6 +-
 llvm/lib/Passes/PassBuilderPipelines.cpp      |  10 +-
 .../Instrumentation/InstrProfiling.cpp        | 131 ++++++++++++++++--
 .../Instrumentation/PGOInstrumentation.cpp    |   2 +
 .../PGOProfile/Inputs/cspgo_bar_sample.ll     |  82 +++++++++++
 .../PGOProfile/counter_promo_sampling.ll      |  78 +++++++++++
 .../Transforms/PGOProfile/cspgo_sample.ll     | 112 +++++++++++++++
 .../Transforms/PGOProfile/instrprof_sample.ll |  47 +++++++
 11 files changed, 463 insertions(+), 18 deletions(-)
 create mode 100644 llvm/test/Transforms/PGOProfile/Inputs/cspgo_bar_sample.ll
 create mode 100644 llvm/test/Transforms/PGOProfile/counter_promo_sampling.ll
 create mode 100644 llvm/test/Transforms/PGOProfile/cspgo_sample.ll
 create mode 100644 llvm/test/Transforms/PGOProfile/instrprof_sample.ll

diff --git a/llvm/include/llvm/ProfileData/InstrProfData.inc b/llvm/include/llvm/ProfileData/InstrProfData.inc
index 13be2753e514efe..6294505ac396856 100644
--- a/llvm/include/llvm/ProfileData/InstrProfData.inc
+++ b/llvm/include/llvm/ProfileData/InstrProfData.inc
@@ -676,6 +676,7 @@ serializeValueProfDataFrom(ValueProfRecordClosure *Closure,
 #define INSTR_PROF_PROFILE_RUNTIME_VAR __llvm_profile_runtime
 #define INSTR_PROF_PROFILE_COUNTER_BIAS_VAR __llvm_profile_counter_bias
 #define INSTR_PROF_PROFILE_SET_TIMESTAMP __llvm_profile_set_timestamp
+#define INSTR_PROF_PROFILE_SAMPLING_VAR __llvm_profile_sampling
 
 /* The variable that holds the name of the profile data
  * specified via command line. */
diff --git a/llvm/include/llvm/Transforms/Instrumentation.h b/llvm/include/llvm/Transforms/Instrumentation.h
index 392983a19844451..76d4e1de75154ff 100644
--- a/llvm/include/llvm/Transforms/Instrumentation.h
+++ b/llvm/include/llvm/Transforms/Instrumentation.h
@@ -116,12 +116,18 @@ struct InstrProfOptions {
   // Use BFI to guide register promotion
   bool UseBFIInPromotion = false;
 
+  // Use sampling to reduce the profile instrumentation runtime overhead.
+  bool Sampling = false;
+
   // Name of the profile file to use as output
   std::string InstrProfileOutput;
 
   InstrProfOptions() = default;
 };
 
+// Create the variable for profile sampling.
+void createProfileSamplingVar(Module &M);
+
 // Options for sanitizer coverage instrumentation.
 struct SanitizerCoverageOptions {
   enum Type {
diff --git a/llvm/include/llvm/Transforms/Instrumentation/InstrProfiling.h b/llvm/include/llvm/Transforms/Instrumentation/InstrProfiling.h
index cb0c055dcb74ae8..d0581ff72a15864 100644
--- a/llvm/include/llvm/Transforms/Instrumentation/InstrProfiling.h
+++ b/llvm/include/llvm/Transforms/Instrumentation/InstrProfiling.h
@@ -86,6 +86,9 @@ class InstrProfiling : public PassInfoMixin<InstrProfiling> {
   /// Returns true if profile counter update register promotion is enabled.
   bool isCounterPromotionEnabled() const;
 
+  /// Return true if profile sampling is enabled.
+  bool isSamplingEnabled() const;
+
   /// Count the number of instrumented value sites for the function.
   void computeNumValueSiteCounts(InstrProfValueProfileInst *Ins);
 
@@ -109,6 +112,9 @@ class InstrProfiling : public PassInfoMixin<InstrProfiling> {
   /// acts on.
   Value *getCounterAddress(InstrProfInstBase *I);
 
+  /// Lower the incremental instructions under profile sampling predicates.
+  void doSampling(Instruction *I);
+
   /// Get the region counters for an increment, creating them if necessary.
   ///
   /// If the counter array doesn't yet exist, the profile data variables
diff --git a/llvm/include/llvm/Transforms/Instrumentation/PGOInstrumentation.h b/llvm/include/llvm/Transforms/Instrumentation/PGOInstrumentation.h
index 5b1977b7de9a2ae..7199f27dbc991a8 100644
--- a/llvm/include/llvm/Transforms/Instrumentation/PGOInstrumentation.h
+++ b/llvm/include/llvm/Transforms/Instrumentation/PGOInstrumentation.h
@@ -43,12 +43,14 @@ class FileSystem;
 class PGOInstrumentationGenCreateVar
     : public PassInfoMixin<PGOInstrumentationGenCreateVar> {
 public:
-  PGOInstrumentationGenCreateVar(std::string CSInstrName = "")
-      : CSInstrName(CSInstrName) {}
+  PGOInstrumentationGenCreateVar(std::string CSInstrName = "",
+                                 bool Sampling = false)
+      : CSInstrName(CSInstrName), ProfileSampling(Sampling) {}
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM);
 
 private:
   std::string CSInstrName;
+  bool ProfileSampling;
 };
 
 /// The instrumentation (profile-instr-gen) pass for IR based PGO.
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 600f8d43caaf216..5595f92e24aa861 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -273,6 +273,9 @@ static cl::opt<AttributorRunOption> AttributorRun(
                clEnumValN(AttributorRunOption::NONE, "none",
                           "disable attributor runs")));
 
+static cl::opt<bool> EnableSampledInstr(
+    "enable-sampled-instr", cl::init(false), cl::Hidden,
+    cl::desc("Enable profile instrumentation sampling (default = off)"));
 static cl::opt<bool> UseLoopVersioningLICM(
     "enable-loop-versioning-licm", cl::init(false), cl::Hidden,
     cl::desc("Enable the experimental Loop Versioning LICM pass"));
@@ -805,6 +808,10 @@ void PassBuilder::addPGOInstrPasses(ModulePassManager &MPM,
   // Do counter promotion at Level greater than O0.
   Options.DoCounterPromotion = true;
   Options.UseBFIInPromotion = IsCS;
+  if (EnableSampledInstr) {
+    Options.Sampling = true;
+    Options.DoCounterPromotion = false;
+  }
   Options.Atomic = AtomicCounterUpdate;
   MPM.addPass(InstrProfiling(Options, IsCS));
 }
@@ -1117,7 +1124,8 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
   }
   if (PGOOpt && Phase != ThinOrFullLTOPhase::ThinLTOPostLink &&
       PGOOpt->CSAction == PGOOptions::CSIRInstr)
-    MPM.addPass(PGOInstrumentationGenCreateVar(PGOOpt->CSProfileGenFile));
+    MPM.addPass(PGOInstrumentationGenCreateVar(PGOOpt->CSProfileGenFile,
+                                               EnableSampledInstr));
 
   if (PGOOpt && Phase != ThinOrFullLTOPhase::ThinLTOPostLink &&
       !PGOOpt->MemoryProfile.empty())
diff --git a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
index 57fcfd53836911b..89e8e152fcee7e4 100644
--- a/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
+++ b/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp
@@ -36,6 +36,7 @@
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/Type.h"
 #include "llvm/InitializePasses.h"
@@ -48,6 +49,7 @@
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/TargetParser/Triple.h"
 #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/ModuleUtils.h"
 #include "llvm/Transforms/Utils/SSAUpdater.h"
 #include <algorithm>
@@ -148,6 +150,16 @@ cl::opt<bool> SkipRetExitBlock(
     "skip-ret-exit-block", cl::init(true),
     cl::desc("Suppress counter promotion if exit blocks contain ret."));
 
+static cl::opt<bool>
+    SampledInstrument("sampled-instr", cl::ZeroOrMore, cl::init(false),
+                      cl::desc("Do PGO instrumentation sampling"));
+
+static cl::opt<unsigned> SampledInstrumentDuration(
+    "sampled-instr-duration",
+    cl::desc("Set the sample rate for profile instrumentation, with a value "
+             "range 0 to 65535. We will record this number of samples for "
+             "every 65536 count updates"),
+    cl::init(200));
 ///
 /// A helper class to promote one counter RMW operation in the loop
 /// into register update.
@@ -412,30 +424,91 @@ PreservedAnalyses InstrProfiling::run(Module &M, ModuleAnalysisManager &AM) {
   return PreservedAnalyses::none();
 }
 
+// Perform instrumentation sampling.
+// We transform:
+//   Increment_Instruction;
+// to:
+//   if (__llvm_profile_sampling <= SampleDuration) {
+//     Increment_Instruction;
+//   }
+//   __llvm_profile_sampling += 1;
+//
+// "__llvm_profile_sampling" is a thread-local global shared by all PGO
+// instrumentation variables (value instrumentation and edge instrumentation).
+// It has an unsigned short type and wraps around on overflow.
+//
+// Note that the code after the transformation can still be
+// counter-promoted, but there is little reason to do so because the
+// counter updates should be sparse. That is why we disable
+// counter promotion by default when sampling is enabled.
+// This can be overridden by the internal option.
+//
+void InstrProfiling::doSampling(Instruction *I) {
+  if (!isSamplingEnabled())
+    return;
+  int SampleDuration = SampledInstrumentDuration.getValue();
+  unsigned WrapToZeroValue = USHRT_MAX + 1;
+  assert(SampleDuration < USHRT_MAX);
+  auto *Int16Ty = Type::getInt16Ty(M->getContext());
+  auto *CountVar =
+      M->getGlobalVariable(INSTR_PROF_QUOTE(INSTR_PROF_PROFILE_SAMPLING_VAR));
+  assert(CountVar && "CountVar not set properly");
+  IRBuilder<> CondBuilder(I);
+  auto *LoadCountVar = CondBuilder.CreateLoad(Int16Ty, CountVar);
+  auto *DurationCond = CondBuilder.CreateICmpULE(
+      LoadCountVar, CondBuilder.getInt16(SampleDuration));
+  MDBuilder MDB(I->getContext());
+  MDNode *BranchWeight =
+      MDB.createBranchWeights(SampleDuration, WrapToZeroValue - SampleDuration);
+  Instruction *ThenTerm = SplitBlockAndInsertIfThen(
+      DurationCond, I, /* Unreachable */ false, BranchWeight);
+  IRBuilder<> IncBuilder(I);
+  auto *NewVal = IncBuilder.CreateAdd(LoadCountVar, IncBuilder.getInt16(1));
+  IncBuilder.CreateStore(NewVal, CountVar);
+  I->moveBefore(ThenTerm);
+}
+
 bool InstrProfiling::lowerIntrinsics(Function *F) {
   bool MadeChange = false;
   PromotionCandidates.clear();
+  SmallVector<InstrProfInstBase *, 8> InstrProfInsts;
+
   for (BasicBlock &BB : *F) {
     for (Instruction &Instr : llvm::make_early_inc_range(BB)) {
-      if (auto *IPIS = dyn_cast<InstrProfIncrementInstStep>(&Instr)) {
-        lowerIncrement(IPIS);
-        MadeChange = true;
-      } else if (auto *IPI = dyn_cast<InstrProfIncrementInst>(&Instr)) {
-        lowerIncrement(IPI);
-        MadeChange = true;
-      } else if (auto *IPC = dyn_cast<InstrProfTimestampInst>(&Instr)) {
-        lowerTimestamp(IPC);
-        MadeChange = true;
-      } else if (auto *IPC = dyn_cast<InstrProfCoverInst>(&Instr)) {
-        lowerCover(IPC);
-        MadeChange = true;
-      } else if (auto *IPVP = dyn_cast<InstrProfValueProfileInst>(&Instr)) {
-        lowerValueProfileInst(IPVP);
-        MadeChange = true;
+      if (auto *IP = dyn_cast<InstrProfInstBase>(&Instr)) {
+        InstrProfInsts.push_back(IP);
       }
     }
   }
 
+  for (auto *IP : InstrProfInsts) {
+    if (auto *IPIS = dyn_cast<InstrProfIncrementInstStep>(IP)) {
+      doSampling(IP);
+      lowerIncrement(IPIS);
+      MadeChange = true;
+    } else if (auto *IPI = dyn_cast<InstrProfIncrementInst>(IP)) {
+      doSampling(IP);
+      lowerIncrement(IPI);
+      MadeChange = true;
+    } else if (auto *IPC = dyn_cast<InstrProfTimestampInst>(IP)) {
+      doSampling(IP);
+      lowerTimestamp(IPC);
+      MadeChange = true;
+    } else if (auto *IPC = dyn_cast<InstrProfCoverInst>(IP)) {
+      doSampling(IP);
+      lowerCover(IPC);
+      MadeChange = true;
+    } else if (auto *IPVP = dyn_cast<InstrProfValueProfileInst>(IP)) {
+      doSampling(IP);
+      lowerValueProfileInst(IPVP);
+      MadeChange = true;
+    } else {
+      LLVM_DEBUG(dbgs() << "Invalid InstrProf intrinsic: " << *IP << "\n");
+      // ?? Seeing "call void @llvm.memcpy.p0.p0.i64..." here ??
+      // llvm_unreachable("Invalid InstrProf intrinsic");
+    }
+  }
+
   if (!MadeChange)
     return false;
 
@@ -455,6 +528,12 @@ bool InstrProfiling::isRuntimeCounterRelocationEnabled() const {
   return TT.isOSFuchsia();
 }
 
+bool InstrProfiling::isSamplingEnabled() const {
+  if (SampledInstrument.getNumOccurrences() > 0)
+    return SampledInstrument;
+  return Options.Sampling;
+}
+
 bool InstrProfiling::isCounterPromotionEnabled() const {
   if (DoCounterPromotion.getNumOccurrences() > 0)
     return DoCounterPromotion;
@@ -535,6 +614,9 @@ bool InstrProfiling::run(
   if (NeedsRuntimeHook)
     MadeChange = emitRuntimeHook();
 
+  if (!IsCS && isSamplingEnabled())
+    createProfileSamplingVar(M);
+
   bool ContainsProfiling = containsProfilingIntrinsics(M);
   GlobalVariable *CoverageNamesVar =
       M.getNamedGlobal(getCoverageUnusedNamesVarName());
@@ -1372,3 +1454,22 @@ void InstrProfiling::emitInitialization() {
 
   appendToGlobalCtors(*M, F, 0);
 }
+
+namespace llvm {
+// Create the variable for profile sampling.
+void createProfileSamplingVar(Module &M) {
+  const StringRef VarName(INSTR_PROF_QUOTE(INSTR_PROF_PROFILE_SAMPLING_VAR));
+  Type *IntTy16 = Type::getInt16Ty(M.getContext());
+  auto SamplingVar = new GlobalVariable(
+      M, IntTy16, false, GlobalValue::WeakAnyLinkage,
+      Constant::getIntegerValue(IntTy16, APInt(16, 0)), VarName);
+  SamplingVar->setVisibility(GlobalValue::DefaultVisibility);
+  SamplingVar->setThreadLocal(true);
+  Triple TT(M.getTargetTriple());
+  if (TT.supportsCOMDAT()) {
+    SamplingVar->setLinkage(GlobalValue::ExternalLinkage);
+    SamplingVar->setComdat(M.getOrInsertComdat(VarName));
+  }
+  appendToCompilerUsed(M, SamplingVar);
+}
+} // namespace llvm
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index 7ad1c9bc54f3780..0ea6398fbdedc1f 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -1820,6 +1820,8 @@ PGOInstrumentationGenCreateVar::run(Module &M, ModuleAnalysisManager &MAM) {
   // The variable in a comdat may be discarded by LTO. Ensure the declaration
   // will be retained.
   appendToCompilerUsed(M, createIRLevelProfileFlagVar(M, /*IsCS=*/true));
+  if (ProfileSampling)
+    createProfileSamplingVar(M);
   PreservedAnalyses PA;
   PA.preserve<FunctionAnalysisManagerModuleProxy>();
   PA.preserveSet<AllAnalysesOn<Function>>();
diff --git a/llvm/test/Transforms/PGOProfile/Inputs/cspgo_bar_sample.ll b/llvm/test/Transforms/PGOProfile/Inputs/cspgo_bar_sample.ll
new file mode 100644
index 000000000000000..1c8be82715f2531
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/Inputs/cspgo_bar_sample.ll
@@ -0,0 +1,82 @@
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+$__llvm_profile_filename = comdat any
+$__llvm_profile_raw_version = comdat any
+$__llvm_profile_sampling = comdat any
+
+@odd = common dso_local local_unnamed_addr global i32 0, align 4
+@even = common dso_local local_unnamed_addr global i32 0, align 4
+@__llvm_profile_filename = local_unnamed_addr constant [25 x i8] c"pass2/default_%m.profraw\00", comdat
+@__llvm_profile_raw_version = local_unnamed_addr constant i64 216172782113783812, comdat
+@__llvm_profile_sampling = thread_local global i16 0, comdat
+@llvm.used = appending global [1 x i8*] [i8* bitcast (i16* @__llvm_profile_sampling to i8*)], section "llvm.metadata"
+
+define dso_local void @bar(i32 %n) !prof !30 {
+entry:
+  %call = tail call fastcc i32 @cond(i32 %n)
+  %tobool = icmp eq i32 %call, 0
+  br i1 %tobool, label %if.else, label %if.then, !prof !31
+
+if.then:
+  %0 = load i32, i32* @odd, align 4, !tbaa !32
+  %inc = add i32 %0, 1
+  store i32 %inc, i32* @odd, align 4, !tbaa !32
+  br label %if.end
+
+if.else:
+  %1 = load i32, i32* @even, align 4, !tbaa !32
+  %inc1 = add i32 %1, 1
+  store i32 %inc1, i32* @even, align 4, !tbaa !32
+  br label %if.end
+
+if.end:
+  ret void
+}
+
+define internal fastcc i32 @cond(i32 %i) #1 !prof !30 !PGOFuncName !36 {
+entry:
+  %rem = srem i32 %i, 2
+  ret i32 %rem
+}
+
+attributes #1 = { inlinehint noinline }
+
+!llvm.module.flags = !{!0, !1, !2}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{i32 1, !"EnableSplitLTOUnit", i32 0}
+!2 = !{i32 1, !"ProfileSummary", !3}
+!3 = !{!4, !5, !6, !7, !8, !9, !10, !11}
+!4 = !{!"ProfileFormat", !"InstrProf"}
+!5 = !{!"TotalCount", i64 500002}
+!6 = !{!"MaxCount", i64 200000}
+!7 = !{!"MaxInternalCount", i64 100000}
+!8 = !{!"MaxFunctionCount", i64 200000}
+!9 = !{!"NumCounts", i64 6}
+!10 = !{!"NumFunctions", i64 4}
+!11 = !{!"DetailedSummary", !12}
+!12 = !{!13, !14, !15, !16, !17, !18, !19, !20, !21, !22, !23, !24, !25, !26, !27, !28}
+!13 = !{i32 10000, i64 200000, i32 1}
+!14 = !{i32 100000, i64 200000, i32 1}
+!15 = !{i32 200000, i64 200000, i32 1}
+!16 = !{i32 300000, i64 200000, i32 1}
+!17 = !{i32 400000, i64 200000, i32 1}
+!18 = !{i32 500000, i64 100000, i32 4}
+!19 = !{i32 600000, i64 100000, i32 4}
+!20 = !{i32 700000, i64 100000, i32 4}
+!21 = !{i32 800000, i64 100000, i32 4}
+!22 = !{i32 900000, i64 100000, i32 4}
+!23 = !{i32 950000, i64 100000, i32 4}
+!24 = !{i32 990000, i64 100000, i32 4}
+!25 = !{i32 999000, i64 100000, i32 4}
+!26 = !{i32 999900, i64 100000, i32 4}
+!27 = !{i32 999990, i64 100000, i32 4}
+!28 = !{i32 999999, i64 1, i32 6}
+!30 = !{!"function_entry_count", i64 200000}
+!31 = !{!"branch_weights", i32 100000, i32 100000}
+!32 = !{!33, !33, i64 0}
+!33 = !{!"int", !34, i64 0}
+!34 = !{!"omnipotent char", !35, i64 0}
+!35 = !{!"Simple C/C++ TBAA"}
+!36 = !{!"cspgo_bar.c:cond"}
diff --git a/llvm/test/Transforms/PGOProfile/counter_promo_sampling.ll b/llvm/test/Transforms/PGOProfile/counter_promo_sampling.ll
new file mode 100644
index 000000000000000..6f13196a724994e
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/counter_promo_sampling.ll
@@ -0,0 +1,78 @@
+; RUN: opt < %s --passes=pgo-instr-gen,instrprof -do-counter-promotion=true -sampled-instr=true -skip-ret-exit-block=0 -S | FileCheck --check-prefixes=SAMPLING,PROMO %s
+
+; SAMPLING: $__llvm_profile_sampling = comdat any
+; SAMPLING: @__llvm_profile_sampling = thread_local global i16 0, comdat
+
+define void @foo(i32 %n, i32 %N) {
+; SAMPLING-LABEL: @foo
+; SAMPLING:  %[[VV0:[0-9]+]] = load i16, ptr @__llvm_profile_sampling, align 2
+; SAMPLING:  %[[VV1:[0-9]+]] = icmp ule i16 %[[VV0]], 200
+; SAMPLING:  br i1 %[[VV1]], label {{.*}}, label {{.*}}, !prof !0
+; SAMPLING: {{.*}} = load {{.*}} @__profc_foo{{.*}} 3)
+; SAMPLING-NEXT: add
+; SAMPLING-NEXT: store {{.*}}@__profc_foo{{.*}}3)
+bb:
+  %tmp = add nsw i32 %n, 1
+  %tmp1 = add nsw i32 %n, -1
+  br label %bb2
+
+bb2:
+; PROMO: phi {{.*}}
+; PROMO-NEXT: phi {{.*}}
+; PROMO-NEXT: phi {{.*}}
+; PROMO-NEXT: phi {{.*}}
+  %i.0 = phi i32 [ 0, %bb ], [ %tmp10, %bb9 ]
+  %tmp3 = icmp slt i32 %i.0, %tmp
+  br i1 %tmp3, label %bb4, label %bb5
+
+bb4:
+  tail call void @bar(i32 1)
+  br label %bb9
+
+bb5:
+  %tmp6 = icmp slt i32 %i.0, %tmp1
+  br i1 %tmp6, label %bb7, label %bb8
+
+bb7:
+  tail call void @bar(i32 2)
+  br label %bb9
+
+bb8:
+  tail call void @bar(i32 3)
+  br label %bb9
+
+bb9:
+; SAMPLING:       phi {{.*}}
+; SAMPLING-NEXT:  %[[V1:[0-9]+]] = add i16 {{.*}}, 1
+; SAMPLING-NEXT:  store i16 %[[V1]], ptr @__llvm_profile_sampling, align 2
+; SAMPLING:       phi {{.*}}
+; SAMPLING-NEXT:  %[[V2:[0-9]+]] = add i16 {{.*}}, 1
+; SAMPLING-NEXT:  store i16 %[[V2]], ptr @__llvm_profile_sampling, align 2
+; SAMPLING:       phi {{.*}}
+; SAMPLING-NEXT:  %[[V3:[0-9]+]] = add i16 {{.*}}, 1
+; SAMPLING-NEXT:  store i16 %[[V3]], ptr @__llvm_profile_sampling, align 2
+; PROMO: %[[LIVEOUT3:[a-z0-9]+]] = phi {{.*}}
+; PROMO-NEXT: %[[LIVEOUT2:[a-z0-9]+]] = phi {{.*}}
+; PROMO-NEXT: %[[LIVEOUT1:[a-z0-9]+]] = phi {{.*}}
+  %tmp10 = add nsw i32 %i.0, 1
+  %tmp11 = icmp slt i32 %tmp10, %N
+  br i1 %tmp11, label %bb2, label %bb12
+
+bb12:
+  ret void
+; PROMO: %[[CHECK1:[a-z0-9.]+]] = load {{.*}} @__profc_foo{{.*}}
+; PROMO-NEXT: add {{.*}} %[[CHECK1]], %[[LIVEOUT1]]
+; PROMO-NEXT: store {{.*}}@__profc_foo{{.*}}
+; PROMO-NEXT: %[[CHECK2:[a-z0-9.]+]] = load {{.*}} @__profc_foo{{.*}} 1)
+; PROMO-NEXT: add {{.*}} %[[CHECK2]], %[[LIVEOUT2]]
+; PROMO-NEXT: store {{.*}}@__profc_foo{{.*}}1)
+; PROMO-NEXT: %[[CHECK3:[a-z0-9.]+]] = load {{.*}} @__profc_foo{{.*}} 2)
+; PROMO-NEXT: add {{.*}} %[[CHECK3]], %[[LIVEOUT3]]
+; PROMO-NEXT: store {{.*}}@__profc_foo{{.*}}2)
+; PROMO-NOT: @__profc_foo{{.*}})
+
+}
+
+declare void @bar(i32)
+
+; SAMPLING: !0 = !{!"branch_weights", i32 200, i32 65336}
diff --git a/llvm/test/Transforms/PGOProfile/cspgo_sample.ll b/llvm/test/Transforms/PGOProfile/cspgo_sample.ll
new file mode 100644
index 000000000000000..6683cae4e64c10d
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/cspgo_sample.ll
@@ -0,0 +1,112 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; REQUIRES: x86-registered-target
+
+; RUN: opt -module-summary %s -o %t1.bc
+; RUN: opt -module-summary %S/Inputs/cspgo_bar_sample.ll -o %t2.bc
+; RUN: llvm-lto2 run -lto-cspgo-profile-file=alloc -enable-sampled-instr -lto-cspgo-gen -save-temps -o %t %t1.bc %t2.bc \
+; RUN:   -r=%t1.bc,foo,pl \
+; RUN:   -r=%t1.bc,bar,l \
+; RUN:   -r=%t1.bc,main,plx \
+; RUN:   -r=%t1.bc,__llvm_profile_filename,plx \
+; RUN:   -r=%t1.bc,__llvm_profile_raw_version,plx \
+; RUN:   -r=%t1.bc,__llvm_profile_sampling,pl \
+; RUN:   -r=%t2.bc,bar,pl \
+; RUN:   -r=%t2.bc,odd,pl \
+; RUN:   -r=%t2.bc,even,pl \
+; RUN:   -r=%t2.bc,__llvm_profile_filename,x \
+; RUN:   -r=%t2.bc,__llvm_profile_raw_version,x \
+; RUN:   -r=%t2.bc,__llvm_profile_sampling,
+; RUN: llvm-dis %t.1.4.opt.bc -o - | FileCheck %s --check-prefix=CSGEN
+
+; CSGEN: @__llvm_profile_sampling = thread_local global i16 0, comdat
+; CSGEN: @__profc_
+; CSGEN: @__profd_
+
+source_filename = "cspgo.c"
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+$__llvm_profile_filename = comdat any
+$__llvm_profile_raw_version = comdat any
+$__llvm_profile_sampling = comdat any
+@__llvm_profile_filename = local_unnamed_addr constant [25 x i8] c"pass2/default_%m.profraw\00", comdat
+@__llvm_profile_raw_version = local_unnamed_addr constant i64 216172782113783812, comdat
+@__llvm_profile_sampling = thread_local global i16 0, comdat
+@llvm.used = appending global [1 x i8*] [i8* bitcast (i16* @__llvm_profile_sampling to i8*)], section "llvm.metadata"
+
+define dso_local void @foo() #0 !prof !30 {
+entry:
+  br label %for.body
+
+for.body:
+  %i.06 = phi i32 [ 0, %entry ], [ %add1, %for.body ]
+  tail call void @bar(i32 %i.06) #3
+  %add = or i32 %i.06, 1
+  tail call void @bar(i32 %add) #3
+  %add1 = add nuw nsw i32 %i.06, 2
+  %cmp = icmp ult i32 %add1, 200000
+  br i1 %cmp, label %for.body, label %for.end, !prof !31
+
+for.end:
+  ret void
+}
+
+; CSGEN-LABEL: @foo
+; CSGEN:        [[TMP0:%.*]]  = load i16, ptr @__llvm_profile_sampling, align 2
+; CSGEN-NEXT:   [[TMP1:%.*]] = icmp ult i16 [[TMP0]], 201
+; CSGEN-NEXT:   br i1 [[TMP1]], label %{{.*}}, label %{{.*}}, !prof [[PROF:![0-9]+]]
+; CSGEN:        [[TMP2:%.*]] = add i16 {{.*}}, 1
+; CSGEN-NEXT:   store i16 [[TMP2]], ptr @__llvm_profile_sampling, align 2
+
+declare dso_local void @bar(i32)
+
+define dso_local i32 @main() !prof !30 {
+entry:
+  tail call void @foo()
+  ret i32 0
+}
+; CSGEN-LABEL: @main
+; CSGEN:        [[TMP0:%.*]]  = load i16, ptr @__llvm_profile_sampling, align 2
+; CSGEN-NEXT:   [[TMP1:%.*]] = icmp ult i16 [[TMP0]], 201
+; CSGEN-NEXT:   br i1 [[TMP1]], label %{{.*}}, label %{{.*}}, !prof [[PROF:![0-9]+]]
+; CSGEN:        [[TMP2:%.*]] = add i16 {{.*}}, 1
+; CSGEN-NEXT:   store i16 [[TMP2]], ptr @__llvm_profile_sampling, align 2
+
+attributes #0 = { "target-cpu"="x86-64" }
+
+!llvm.module.flags = !{!0, !1, !2}
+
+!0 = !{i32 1, !"wchar_size", i32 4}
+!1 = !{i32 1, !"EnableSplitLTOUnit", i32 0}
+!2 = !{i32 1, !"ProfileSummary", !3}
+!3 = !{!4, !5, !6, !7, !8, !9, !10, !11}
+!4 = !{!"ProfileFormat", !"InstrProf"}
+!5 = !{!"TotalCount", i64 500002}
+!6 = !{!"MaxCount", i64 200000}
+!7 = !{!"MaxInternalCount", i64 100000}
+!8 = !{!"MaxFunctionCount", i64 200000}
+!9 = !{!"NumCounts", i64 6}
+!10 = !{!"NumFunctions", i64 4}
+!11 = !{!"DetailedSummary", !12}
+!12 = !{!13, !14, !15, !16, !17, !18, !19, !20, !21, !22, !23, !24, !25, !26, !27, !28}
+!13 = !{i32 10000, i64 200000, i32 1}
+!14 = !{i32 100000, i64 200000, i32 1}
+!15 = !{i32 200000, i64 200000, i32 1}
+!16 = !{i32 300000, i64 200000, i32 1}
+!17 = !{i32 400000, i64 200000, i32 1}
+!18 = !{i32 500000, i64 100000, i32 4}
+!19 = !{i32 600000, i64 100000, i32 4}
+!20 = !{i32 700000, i64 100000, i32 4}
+!21 = !{i32 800000, i64 100000, i32 4}
+!22 = !{i32 900000, i64 100000, i32 4}
+!23 = !{i32 950000, i64 100000, i32 4}
+!24 = !{i32 990000, i64 100000, i32 4}
+!25 = !{i32 999000, i64 100000, i32 4}
+!26 = !{i32 999900, i64 100000, i32 4}
+!27 = !{i32 999990, i64 100000, i32 4}
+!28 = !{i32 999999, i64 1, i32 6}
+!30 = !{!"function_entry_count", i64 1}
+!31 = !{!"branch_weights", i32 100000, i32 1}
+
+; CSGEN: [[PROF]] = !{!"branch_weights", i32 200, i32 65336}
+
diff --git a/llvm/test/Transforms/PGOProfile/instrprof_sample.ll b/llvm/test/Transforms/PGOProfile/instrprof_sample.ll
new file mode 100644
index 000000000000000..1b897b1bbc12d08
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/instrprof_sample.ll
@@ -0,0 +1,47 @@
+; RUN: opt < %s -passes=instrprof -sampled-instr -S | FileCheck %s --check-prefixes=SAMPLE-VAR,SAMPLE-CODE,SAMPLE-DURATION
+; RRRRUN: opt < %s -passes=instrprof -sampled-instr -sampled-instr-duration=100 -S| FileCheck %s --check-prefixes=SAMPLE-VAR,SAMPLE-CODE,SAMPLE-DURATION-100
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+$__llvm_profile_raw_version = comdat any
+
+; SAMPLE-VAR: $__llvm_profile_sampling = comdat any
+
+@__llvm_profile_raw_version = constant i64 72057594037927940, comdat
+@__profn_f = private constant [1 x i8] c"f"
+
+; SAMPLE-VAR: @__llvm_profile_sampling = thread_local global i16 0, comdat
+; SAMPLE-VAR: @__profc_f = private global [1 x i64] zeroinitializer, section "__llvm_prf_cnts", comdat, align 8
+; SAMPLE-VAR: @__profd_f = private global { i64, i64, i64, ptr, ptr, i32, [2 x i16] } { i64 -3706093650706652785, i64 12884901887, i64 sub (i64 ptrtoint (ptr @__profc_f to i64), i64 ptrtoint (ptr @__profd_f to i64)), ptr @f.local, ptr null, i32 1, [2 x i16] zeroinitializer }, section "__llvm_prf_data", comdat($__profc_f), align 8
+; SAMPLE-VAR: @__llvm_prf_nm = private constant [11 x i8] c"\01\09x\DAK\03\00\00g\00g", section "__llvm_prf_names", align 1
+; SAMPLE-VAR: @llvm.compiler.used = appending global [2 x ptr] [ptr @__llvm_profile_sampling, ptr @__profd_f], section "llvm.metadata"
+; SAMPLE-VAR: @llvm.used = appending global [1 x ptr] [ptr @__llvm_prf_nm], section "llvm.metadata"
+
+
+define void @f() {
+; SAMPLE-CODE-LABEL: @f(
+; SAMPLE-CODE:  entry:
+; SAMPLE-CODE-NEXT:    [[TMP0:%.*]] = load i16, ptr @__llvm_profile_sampling, align 2
+; SAMPLE-DURATION:         [[TMP1:%.*]] = icmp ule i16 [[TMP0]], 200
+; SAMPLE-DURATION-100:     [[TMP1:%.*]] = icmp ule i16 [[TMP0]], 100
+; SAMPLE-CODE:         br i1 [[TMP1]], label %[[TMP2:.*]], label %[[TMP4:.*]], !prof !0
+; SAMPLE-CODE:       [[TMP2]]:
+; SAMPLE-CODE-NEXT:    [[PGOCOUNT:%.*]] = load i64, ptr @__profc_f
+; SAMPLE-CODE-NEXT:    [[TMP3:%.*]] = add i64 [[PGOCOUNT]], 1
+; SAMPLE-CODE-NEXT:    store i64 [[TMP3]], ptr @__profc_f
+; SAMPLE-CODE-NEXT:    br label %[[TMP4]]
+; SAMPLE-CODE:       [[TMP4]]:
+; SAMPLE-CODE-NEXT:    [[TMP5:%.*]] = add i16 [[TMP0]], 1
+; SAMPLE-CODE-NEXT:    store i16 [[TMP5]],  ptr @__llvm_profile_sampling, align 2
+; SAMPLE-CODE-NEXT:    ret void
+;
+entry:
+  call void @llvm.instrprof.increment(i8* getelementptr inbounds ([1 x i8], [1 x i8]* @__profn_f, i32 0, i32 0), i64 12884901887, i32 1, i32 0)
+  ret void
+}
+
+; SAMPLE-DURATION: !0 = !{!"branch_weights", i32 200, i32 65336}
+; SAMPLE-DURATION-100: !0 = !{!"branch_weights", i32 100, i32 65436}
+
+declare void @llvm.instrprof.increment(i8*, i64, i32, i32)

From d4a8145a47e1e48a5f37bfcf39e250dabf16d4d5 Mon Sep 17 00:00:00 2001
From: Rong Xu <xur at google.com>
Date: Fri, 1 Dec 2023 15:44:37 -0800
Subject: [PATCH 2/2] [ThinLTO] Add module names to ThinLTO final objects

Emit the module name for final ThinLTO objects under
--lto-output-module-name. This adds the module name to the file
names of the native objects or asm files emitted for
-Wl,--lto-emit-asm, -Wl,--lto-obj-path=<path>,
and -Wl,--save-temps=prelink.

Also fix a bug where these options do not work when the ThinLTO
cache is used.
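As a reading aid, here is a standalone C++17 sketch of the module-name-to-file-stem mapping performed by the `getFileNameString` lambda in the LTO.cpp hunk. `std::string_view` stands in for `llvm::StringRef`, and the name `moduleNameToFileStem` is hypothetical. The truncated diff does not show how the stem is then joined with the `--lto-obj-path` prefix or the suffix, so that part is not modeled.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Standalone mirror of getFileNameString: drop one leading path separator,
// cut at the last '.', and flatten '/', '\\' and '(' into '_'.
std::string moduleNameToFileStem(std::string_view Str) {
  if (Str.empty())
    return std::string();
  // Cut at the last '.', which usually drops the ".o" extension.
  std::size_t End = Str.find_last_of('.');
  // Skip a leading separator so absolute paths do not start with '_'.
  std::size_t Begin = (Str[0] == '/' || Str[0] == '\\') ? 1 : 0;
  // substr clamps the length, so End == npos keeps the rest of the string.
  std::string Ret(Str.substr(Begin, End - Begin));
  // Replace path separators and the archive-member '(' with '_'.
  for (std::size_t Pos = Ret.find_first_of("/\\("); Pos != std::string::npos;
       Pos = Ret.find_first_of("/\\(", Pos + 1))
    Ret[Pos] = '_';
  return Ret;
}
```

On the three cases listed in the patch comments, this yields `path_test`, `tmp_test-a7a1e4` and `arch_x86_built-in.a_procfs`. Flattening the path keeps every per-module output in a single directory next to the requested output path.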
---
 lld/ELF/Config.h                             |  1 +
 lld/ELF/Driver.cpp                           |  2 +
 lld/ELF/LTO.cpp                              | 68 ++++++++++++++++----
 lld/ELF/Options.td                           |  3 +
 lld/test/ELF/lto/thinlto_finallink_output.ll | 59 +++++++++++++++++
 llvm/include/llvm/LTO/LTO.h                  | 13 +++-
 6 files changed, 132 insertions(+), 14 deletions(-)
 create mode 100644 lld/test/ELF/lto/thinlto_finallink_output.ll

diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index 56229334f9a44ae..afb8a692457d783 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -323,6 +323,7 @@ struct Config {
   bool zText;
   bool zRetpolineplt;
   bool zWxneeded;
+  bool ltoOutputModuleName;
   DiscardPolicy discard;
   GnuStackKind zGnustack;
   ICFLevel icf;
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 5f88389a5840824..460e67eae888d9e 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1317,6 +1317,8 @@ static void readConfigs(opt::InputArgList &args) {
   else
     error("invalid codegen optimization level for LTO: " + Twine(ltoCgo));
   config->ltoObjPath = args.getLastArgValue(OPT_lto_obj_path_eq);
+  config->ltoOutputModuleName = args.hasFlag(
+      OPT_lto_output_module_name, OPT_no_lto_output_module_name, false);
   config->ltoPartitions = args::getInteger(args, OPT_lto_partitions, 1);
   config->ltoSampleProfile = args.getLastArgValue(OPT_lto_sample_profile);
   config->ltoBasicBlockSections =
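`args.hasFlag(OPT_lto_output_module_name, OPT_no_lto_output_module_name, false)` follows the `llvm::opt::ArgList::hasFlag` rule. If both spellings appear, the last one wins. If neither appears, the default (`false` here) applies. The following is a small standalone model of that rule over plain strings. `resolveFlag` is a hypothetical helper and is not the `llvm::opt` API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a positive/negative flag pair such as
// --lto-output-module-name / --no-lto-output-module-name: the last
// occurrence of either spelling wins; with neither, Default is returned.
bool resolveFlag(const std::vector<std::string> &Args, const std::string &Pos,
                 const std::string &Neg, bool Default) {
  bool Result = Default;
  for (const std::string &Arg : Args) {
    if (Arg == Pos)
      Result = true;
    else if (Arg == Neg)
      Result = false;
  }
  return Result;
}
```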
diff --git a/lld/ELF/LTO.cpp b/lld/ELF/LTO.cpp
index 504c12aac6c5696..8dc5d814a04281f 100644
--- a/lld/ELF/LTO.cpp
+++ b/lld/ELF/LTO.cpp
@@ -351,23 +351,65 @@ std::vector<InputFile *> BitcodeCompiler::compile() {
   if (!config->thinLTOCacheDir.empty())
     pruneCache(config->thinLTOCacheDir, config->thinLTOCachePolicy, files);
 
-  if (!config->ltoObjPath.empty()) {
-    saveBuffer(buf[0], config->ltoObjPath);
-    for (unsigned i = 1; i != maxTasks; ++i)
-      saveBuffer(buf[i], config->ltoObjPath + Twine(i));
-  }
+  auto doSaveBuffer = [&](const StringRef Arg, const StringRef Suffix = "") {
+    // Module names come in a few forms:
+    // (1) path/test.o (relative to the current directory)
+    // (2) /tmp/test-a7a1e4.o (in the tmp directory)
+    // (3) if the input object is in an archive, the module name looks like
+    // "arch/x86/built-in.a(procfs.o at 11368)"
+    //
+    // getFileNameString() replaces '/', '\' and '(' with '_' and drops
+    // everything from the last '.' on. It returns, respectively,
+    // (1) path_test
+    // (2) tmp_test-a7a1e4 (the leading '/' is removed)
+    // (3) arch_x86_built-in.a_procfs
+    //
+    auto getFileNameString = [](const StringRef Str) {
+      if (Str.empty())
+        return std::string();
+      size_t End = Str.find_last_of(".");
+      size_t Begin = 0;
+      if (Str[0] == '/' || Str[0] == '\\')
+        Begin = 1;
+      std::string Ret = Str.substr(Begin, End - Begin).str();
+      auto position = std::string::npos;
+      while ((position = Ret.find_first_of("/\\(")) != std::string::npos) {
+        Ret.replace(position, 1, 1, '_');
+      }
+      return Ret;
+    };
+
+    auto saveBufferOrFile = [](StringRef Buf, const MemoryBuffer *File,
+                               const Twine &Path) {
+      if (Buf.empty() && File)
+        return saveBuffer(File->getBuffer(), Path);
+      saveBuffer(Buf, Path);
+    };
 
-  if (config->saveTempsArgs.contains("prelink")) {
     if (!buf[0].empty())
-      saveBuffer(buf[0], config->outputFile + ".lto.o");
-    for (unsigned i = 1; i != maxTasks; ++i)
-      saveBuffer(buf[i], config->outputFile + Twine(i) + ".lto.o");
-  }
+      saveBufferOrFile(buf[0], files[0].get(), Arg + Suffix);
+    for (unsigned i = 1; i != maxTasks; ++i) {
+      if (!config->ltoOutputModuleName) {
+        saveBufferOrFile(buf[i], files[i].get(), Arg + Twine(i) + Suffix);
+      } else {
+        const std::string Name =
+            getFileNameString(ltoObj->getModuleName(i - 1));
+        saveBufferOrFile(buf[i], files[i].get(),
+                         Arg + "_" + Twine(i) + "_" + Name + Suffix);
+      }
+    }
+  };
+
+  if (!config->ltoObjPath.empty())
+    doSaveBuffer(config->ltoObjPath,
+                 config->ltoOutputModuleName ? ".lto.o" : "");
+
+  if (config->saveTempsArgs.contains("prelink"))
+    doSaveBuffer(config->outputFile, ".lto.o");
 
   if (config->ltoEmitAsm) {
-    saveBuffer(buf[0], config->outputFile);
-    for (unsigned i = 1; i != maxTasks; ++i)
-      saveBuffer(buf[i], config->outputFile + Twine(i));
+    doSaveBuffer(config->outputFile,
+                 config->ltoOutputModuleName ? ".lto.s" : "");
     return {};
   }
 
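The following is a standalone re-creation of the `getFileNameString` lambda from the hunk above. It uses `std::string` in place of `llvm::StringRef` so that it compiles without LLVM. The `find_first_of`/`replace` loop is written as a per-character replacement, which should behave the same. The sketch reproduces the three cases listed in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Standalone version of the getFileNameString lambda (std::string instead
// of llvm::StringRef). Maps a ThinLTO module name to a file-name fragment.
std::string getFileNameString(const std::string &Str) {
  if (Str.empty())
    return std::string();
  // Keep only the text before the last '.'. If there is no '.', End is npos,
  // End - Begin is still huge, and substr() keeps the rest of the string.
  std::size_t End = Str.find_last_of('.');
  // Skip one leading separator so absolute paths do not start with '_'.
  std::size_t Begin = (Str[0] == '/' || Str[0] == '\\') ? 1 : 0;
  std::string Ret = Str.substr(Begin, End - Begin);
  // Turn path separators and the archive's '(' into '_'.
  for (char &C : Ret)
    if (C == '/' || C == '\\' || C == '(')
      C = '_';
  return Ret;
}
```

The cut happens at the last '.' of the whole module name, not at the last '.' of the final path component. As a result, an extension-less file inside a dotted directory (for example `out.d/obj`) would come out as `out`. The archive case works only because the ` at <offset>` suffix contains no dot.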
diff --git a/lld/ELF/Options.td b/lld/ELF/Options.td
index 9a23f48350644a0..c2fa9237f9561f8 100644
--- a/lld/ELF/Options.td
+++ b/lld/ELF/Options.td
@@ -610,6 +610,9 @@ defm lto_pgo_warn_mismatch: BB<"lto-pgo-warn-mismatch",
 defm lto_known_safe_vtables : EEq<"lto-known-safe-vtables",
   "When --lto-validate-all-vtables-have-type-infos is enabled, skip validation on these vtables (_ZTV symbols)">;
 def lto_obj_path_eq: JJ<"lto-obj-path=">;
+defm lto_output_module_name: BB<"lto-output-module-name",
+  "In ThinLTO, include module names in the file names of final native objects and asm files",
+  "Do not include module names in the file names of final native objects and asm files (default)">;
 def lto_sample_profile: JJ<"lto-sample-profile=">,
   HelpText<"Sample profile file path">;
 defm lto_validate_all_vtables_have_type_infos: BB<"lto-validate-all-vtables-have-type-infos",
diff --git a/lld/test/ELF/lto/thinlto_finallink_output.ll b/lld/test/ELF/lto/thinlto_finallink_output.ll
new file mode 100644
index 000000000000000..08c9aea7ae92cd0
--- /dev/null
+++ b/lld/test/ELF/lto/thinlto_finallink_output.ll
@@ -0,0 +1,59 @@
+; REQUIRES: x86
+;
+; RUN: cd %T
+; RUN: opt -module-summary %s -o obj1.o
+; RUN: opt -module-summary %p/Inputs/thinlto.ll -o obj2.o
+; RUN: opt -module-summary %s -o %t_obj3.o
+; RUN: opt -module-summary %p/Inputs/thinlto.ll -o %t_obj4.o
+;
+; Objects with a relative path.
+; RUN: rm -f *.lto.o *.s
+; RUN: ld.lld --lto-output-module-name --save-temps=prelink --lto-obj-path=aaa --lto-emit-asm --thinlto-jobs=1 --entry=f obj1.o obj2.o -o bin1
+; RUN: ls -1 *.lto.o *.s | FileCheck %s --check-prefixes=OBJPATHOUT1,PRELINKOUT1
+; With thinlto-jobs=all.
+; RUN: rm -f *.lto.o *.s
+; RUN: ld.lld --lto-output-module-name --save-temps=prelink --lto-obj-path=aaa --lto-emit-asm --thinlto-jobs=all --entry=f obj1.o obj2.o -o bin1
+; RUN: ls -1 *.lto.o *.s | FileCheck %s --check-prefixes=OBJPATHOUT1,PRELINKOUT1
+; Objects with an absolute path.
+; RUN: rm -f *.lto.o *.s
+; RUN: ld.lld --lto-output-module-name --save-temps=prelink --lto-obj-path=aaa --lto-emit-asm --thinlto-jobs=1 --entry=f %t_obj3.o %t_obj4.o -o bin2
+; RUN: ls -1 *.lto.o *.s | FileCheck %s --check-prefixes=OBJPATHOUT2,PRELINKOUT2
+; Objects in an archive.
+; RUN: rm -f *.lto.o *.s
+; RUN: llvm-ar rcS ar.a obj1.o obj2.o
+; RUN: ld.lld --lto-output-module-name --save-temps=prelink --lto-obj-path=aaa --lto-emit-asm --thinlto-jobs=1 --entry=f ar.a -o bin1
+; RUN: ls -1 *.lto.o *.s | FileCheck %s --check-prefixes=OBJPATHOUT3,PRELINKOUT3
+; Use with --thinlto-cache-dir.
+; RUN: rm -f *.lto.o *.s
+; RUN: ld.lld --lto-output-module-name --save-temps=prelink --thinlto-cache-dir=thinlto-cache --lto-emit-asm --lto-obj-path=aaa --thinlto-jobs=1 --entry=f obj1.o obj2.o -o bin1
+; RUN: ls -1 *.lto.o *.s | FileCheck %s --check-prefixes=OBJPATHOUT1,PRELINKOUT1
+;
+; OBJPATHOUT1-DAG: aaa_1_obj1.lto.o
+; OBJPATHOUT1-DAG: aaa_2_obj2.lto.o
+; PRELINKOUT1-DAG: bin1_1_obj1.lto.o
+; PRELINKOUT1-DAG: bin1_2_obj2.lto.o
+; PRELINKOUT1-DAG: bin1_1_obj1.lto.s
+; PRELINKOUT1-DAG: bin1_2_obj2.lto.s
+; OBJPATHOUT2-DAG: aaa_1_{{.*}}_obj3.lto.o
+; OBJPATHOUT2-DAG: aaa_2_{{.*}}obj4.lto.o
+; PRELINKOUT2-DAG: bin2_1_{{.*}}obj3.lto.o
+; PRELINKOUT2-DAG: bin2_2_{{.*}}obj4.lto.o
+; PRELINKOUT2-DAG: bin2_1_{{.*}}obj3.lto.s
+; PRELINKOUT2-DAG: bin2_2_{{.*}}obj4.lto.s
+; OBJPATHOUT3-DAG: aaa_1_ar.a_obj1.lto.o
+; OBJPATHOUT3-DAG: aaa_2_ar.a_obj2.lto.o
+; PRELINKOUT3-DAG: bin1_1_ar.a_obj1.lto.o
+; PRELINKOUT3-DAG: bin1_2_ar.a_obj2.lto.o
+; PRELINKOUT3-DAG: bin1_1_ar.a_obj1.lto.s
+; PRELINKOUT3-DAG: bin1_2_ar.a_obj2.lto.s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+declare void @g(...)
+
+define void @f() {
+entry:
+  call void (...) @g()
+  ret void
+}
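The CHECK lines above follow the naming built by `doSaveBuffer`. Below is a hedged sketch of that rule. It assumes the default single regular-LTO partition, so task 0 is the regular LTO output and ThinLTO task `I` corresponds to `getModuleName(I - 1)`. `taskOutputName` is a hypothetical helper. It expects the module name to be sanitized already, as `getFileNameString` does, and it does not model the check that skips an empty task-0 buffer.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring doSaveBuffer's file names.
//  - Task 0 (regular LTO output): Prefix + Suffix.
//  - ThinLTO task I without module names: Prefix + I + Suffix.
//  - ThinLTO task I with --lto-output-module-name:
//      Prefix + "_" + I + "_" + Module + Suffix.
// Module is assumed to be already sanitized by getFileNameString.
std::string taskOutputName(const std::string &Prefix, unsigned Task,
                           const std::string &Module,
                           const std::string &Suffix, bool UseModuleName) {
  if (Task == 0)
    return Prefix + Suffix;
  if (!UseModuleName)
    return Prefix + std::to_string(Task) + Suffix;
  return Prefix + "_" + std::to_string(Task) + "_" + Module + Suffix;
}
```

Without the flag, `--save-temps=prelink` still produces the old `bin11.lto.o`-style names, and `--lto-obj-path=aaa` produces `aaa1`.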
diff --git a/llvm/include/llvm/LTO/LTO.h b/llvm/include/llvm/LTO/LTO.h
index be85c40983475ff..4cb3a512f88bd23 100644
--- a/llvm/include/llvm/LTO/LTO.h
+++ b/llvm/include/llvm/LTO/LTO.h
@@ -299,7 +299,18 @@ class LTO {
 
   /// Static method that returns a list of libcall symbols that can be generated
   /// by LTO but might not be visible from bitcode symbol table.
-  static ArrayRef<const char*> getRuntimeLibcallSymbols();
+  static ArrayRef<const char *> getRuntimeLibcallSymbols();
+
+  /// Return the name of the N-th ThinLTO module, or "" if N is out of range.
+  StringRef getModuleName(size_t N) const {
+    size_t I = N;
+    if (I >= ThinLTO.ModuleMap.size())
+      return "";
+    auto it = ThinLTO.ModuleMap.begin();
+    while (I--)
+      it++;
+    return (*it).first;
+  }
 
 private:
   Config Conf;
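`getModuleName` advances an iterator one step at a time, and lld calls it once per ThinLTO task, so the total cost grows quadratically with the number of modules. If `ThinLTO.ModuleMap` is an `llvm::MapVector` (an assumption, since its declaration is not part of this hunk), then its iterators are random access and `std::next` reaches the N-th entry in constant time. The self-contained sketch below uses a `std::vector` of pairs as a stand-in for the map's ordered storage. `nthModuleName` and the `int` payload (in place of `BitcodeModule`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the ordered storage of an llvm::MapVector<StringRef,
// BitcodeModule>; the int payload is a placeholder for BitcodeModule.
using OrderedModules = std::vector<std::pair<std::string, int>>;

// Same contract as the proposed getModuleName(): the N-th module's name,
// or "" when N is out of range. std::next on a random-access iterator is a
// constant-time jump rather than N single steps.
std::string nthModuleName(const OrderedModules &Modules, std::size_t N) {
  if (N >= Modules.size())
    return "";
  return std::next(Modules.begin(), static_cast<std::ptrdiff_t>(N))->first;
}
```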


