[llvm] [lld] [compiler-rt] [InstrProf] No linkage prefixes in IRPGO names (PR #76994)

Ellis Hoag via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 4 15:23:35 PST 2024


https://github.com/ellishg updated https://github.com/llvm/llvm-project/pull/76994

>From 07a7ded4b8ee84276d509f32141bbe64e1172256 Mon Sep 17 00:00:00 2001
From: Ellis Hoag <ellis.sparky.hoag at gmail.com>
Date: Wed, 3 Jan 2024 15:38:43 -0800
Subject: [PATCH 1/3] [InstrProf] No linkage prefixes for IRPGO names

Change the format of IRPGO counter names to `[<filepath>;]<mangled-name>` which is computed by `GlobalValue::getGlobalIdentifier()` to fix #74565.

In fe051934cbb0aaf25d960d7d45305135635d650b (https://reviews.llvm.org/D156569) the format of IRPGO counter names was changed to be `[<filepath>;]<linkage-name>` where `<linkage-name>` is `F.getName()` with some prefix, e.g., `_` or `l_` on Mach-O (it is confusing that `<linkage-name>` is computed with `Mangler().getNameWithPrefix()` while `<mangled-name>` is just `F.getName()`). We discovered in #74565 that this causes some missed import issues on some targets and #74008 is a partial fix.

Since `<mangled-name>` may not match the `<linkage-name>` on some targets like Mach-O, we will need to post-process the output of `llvm-profdata order` before passing to the linker via `-order_file`.

Profiles generated after fe051934cbb0aaf25d960d7d45305135635d650b will become stale after this diff, but I think this is acceptable since that patch after the LLVM 18 cut which has not been released yet.
---
 lld/test/MachO/pgo-warn-mismatch.ll          |  2 +-
 llvm/lib/ProfileData/InstrProf.cpp           | 52 +++++++-------------
 llvm/tools/llvm-profdata/llvm-profdata.cpp   |  6 ++-
 llvm/unittests/ProfileData/InstrProfTest.cpp | 18 +------
 4 files changed, 26 insertions(+), 52 deletions(-)

diff --git a/lld/test/MachO/pgo-warn-mismatch.ll b/lld/test/MachO/pgo-warn-mismatch.ll
index 632adffb01fb3e..365eeaccb4b403 100644
--- a/lld/test/MachO/pgo-warn-mismatch.ll
+++ b/lld/test/MachO/pgo-warn-mismatch.ll
@@ -24,7 +24,7 @@ entry:
 
 ;--- cs.proftext
 :csir
-_foo
+foo
 # Func Hash:
 2277602155505015273
 # Num Counters:
diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp
index 134a400e639c4b..08126cff7d06be 100644
--- a/llvm/lib/ProfileData/InstrProf.cpp
+++ b/llvm/lib/ProfileData/InstrProf.cpp
@@ -14,7 +14,6 @@
 #include "llvm/ProfileData/InstrProf.h"
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/SetVector.h"
-#include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
@@ -27,7 +26,6 @@
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/MDBuilder.h"
-#include "llvm/IR/Mangler.h"
 #include "llvm/IR/Metadata.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/Type.h"
@@ -297,31 +295,6 @@ static StringRef getStrippedSourceFileName(const GlobalObject &GO) {
   return FileName;
 }
 
-// The PGO name has the format [<filepath>;]<linkage-name> where <filepath>; is
-// provided if linkage is local and <linkage-name> is the mangled function
-// name. The filepath is used to discriminate possibly identical function names.
-// ; is used because it is unlikely to be found in either <filepath> or
-// <linkage-name>.
-//
-// Older compilers used getPGOFuncName() which has the format
-// [<filepath>:]<function-name>. <filepath> is used to discriminate between
-// possibly identical function names when linkage is local and <function-name>
-// simply comes from F.getName(). This caused trouble for Objective-C functions
-// which commonly have :'s in their names. Also, since <function-name> is not
-// mangled, they cannot be passed to Mach-O linkers via -order_file. We still
-// need to compute this name to lookup functions from profiles built by older
-// compilers.
-static std::string
-getIRPGONameForGlobalObject(const GlobalObject &GO,
-                            GlobalValue::LinkageTypes Linkage,
-                            StringRef FileName) {
-  SmallString<64> Name;
-  // FIXME: Mangler's handling is kept outside of `getGlobalIdentifier` for now.
-  // For more details please check issue #74565.
-  Mangler().getNameWithPrefix(Name, &GO, /*CannotUsePrivateLabel=*/true);
-  return GlobalValue::getGlobalIdentifier(Name, Linkage, FileName);
-}
-
 static std::optional<std::string> lookupPGONameFromMetadata(MDNode *MD) {
   if (MD != nullptr) {
     StringRef S = cast<MDString>(MD->getOperand(0))->getString();
@@ -330,7 +303,17 @@ static std::optional<std::string> lookupPGONameFromMetadata(MDNode *MD) {
   return {};
 }
 
-// Returns the PGO object name. This function has some special handling
+// The PGO name has the format [<filepath>;]<function-name> where <filepath>; is
+// provided if linkage is local and is used to discriminate possibly identical
+// function names. ";" is used because it is unlikely to be found in either
+// <filepath> or <linkage-name>.
+//
+// Older compilers used getPGOFuncName() which has the format
+// [<filepath>:]<function-name>. This caused trouble for Objective-C functions
+// which commonly have :'s in their names. We still need to compute this name to
+// lookup functions from profiles built by older compilers.
+//
+// This function has some special handling
 // when called in LTO optimization. The following only applies when calling in
 // LTO passes (when \c InLTO is true): LTO's internalization privatizes many
 // global linkage symbols. This happens after value profile annotation, but
@@ -347,7 +330,8 @@ static std::string getIRPGOObjectName(const GlobalObject &GO, bool InLTO,
                                       MDNode *PGONameMetadata) {
   if (!InLTO) {
     auto FileName = getStrippedSourceFileName(GO);
-    return getIRPGONameForGlobalObject(GO, GO.getLinkage(), FileName);
+    return GlobalValue::getGlobalIdentifier(GO.getName(), GO.getLinkage(),
+                                            FileName);
   }
 
   // In LTO mode (when InLTO is true), first check if there is a meta data.
@@ -357,7 +341,8 @@ static std::string getIRPGOObjectName(const GlobalObject &GO, bool InLTO,
   // If there is no meta data, the function must be a global before the value
   // profile annotation pass. Its current linkage may be internal if it is
   // internalized in LTO mode.
-  return getIRPGONameForGlobalObject(GO, GlobalValue::ExternalLinkage, "");
+  return GlobalValue::getGlobalIdentifier(GO.getName(),
+                                          GlobalValue::ExternalLinkage, "");
 }
 
 // Returns the IRPGO function name and does special handling when called
@@ -371,8 +356,8 @@ std::string getIRPGOFuncName(const Function &F, bool InLTO) {
 // The implementation is kept for profile matching from older profiles.
 // This is similar to `getIRPGOFuncName` except that this function calls
 // 'getPGOFuncName' to get a name and `getIRPGOFuncName` calls
-// 'getIRPGONameForGlobalObject'. See the difference between two callees in the
-// comments of `getIRPGONameForGlobalObject`.
+// 'getIRPGOObjectName'. See the difference between two callees in the
+// comments of `getIRPGOObjectName`.
 std::string getPGOFuncName(const Function &F, bool InLTO, uint64_t Version) {
   if (!InLTO) {
     auto FileName = getStrippedSourceFileName(F);
@@ -401,8 +386,7 @@ getParsedIRPGOFuncName(StringRef IRPGOFuncName) {
 StringRef getFuncNameWithoutPrefix(StringRef PGOFuncName, StringRef FileName) {
   if (FileName.empty())
     return PGOFuncName;
-  // Drop the file name including ':' or ';'. See getIRPGONameForGlobalObject as
-  // well.
+  // Drop the file name including ':' or ';'. See getIRPGOObjectName as well.
   if (PGOFuncName.starts_with(FileName))
     PGOFuncName = PGOFuncName.drop_front(FileName.size() + 1);
   return PGOFuncName;
diff --git a/llvm/tools/llvm-profdata/llvm-profdata.cpp b/llvm/tools/llvm-profdata/llvm-profdata.cpp
index 12b81d411cfa91..0a611d199026d8 100644
--- a/llvm/tools/llvm-profdata/llvm-profdata.cpp
+++ b/llvm/tools/llvm-profdata/llvm-profdata.cpp
@@ -3157,7 +3157,11 @@ static int order_main(int argc, const char *argv[]) {
   BalancedPartitioning BP(Config);
   BP.run(Nodes);
 
-  WithColor::note() << "# Ordered " << Nodes.size() << " functions\n";
+  OS << "# Ordered " << Nodes.size() << " functions\n";
+  OS << "# Warning: Mach-O prefixes symbols with \"_\" or \"l_\" depending on "
+        "the linkage and this output does not take this into account. Some "
+        "post processing may be required before passing to the linker via "
+        "-order_file.\n";
   for (auto &N : Nodes) {
     auto [Filename, ParsedFuncName] =
         getParsedIRPGOFuncName(Reader->getSymtab().getFuncOrVarName(N.Id));
diff --git a/llvm/unittests/ProfileData/InstrProfTest.cpp b/llvm/unittests/ProfileData/InstrProfTest.cpp
index 6a71a975fbb12d..8ffb68de7a2d20 100644
--- a/llvm/unittests/ProfileData/InstrProfTest.cpp
+++ b/llvm/unittests/ProfileData/InstrProfTest.cpp
@@ -542,22 +542,14 @@ TEST_F(InstrProfTest, test_memprof_merge) {
 TEST_F(InstrProfTest, test_irpgo_function_name) {
   LLVMContext Ctx;
   auto M = std::make_unique<Module>("MyModule.cpp", Ctx);
-  // Use Mach-O mangling so that non-private symbols get a `_` prefix.
-  M->setDataLayout(DataLayout("m:o"));
   auto *FTy = FunctionType::get(Type::getVoidTy(Ctx), /*isVarArg=*/false);
 
   std::vector<std::tuple<StringRef, Function::LinkageTypes, StringRef>> Data;
-  Data.emplace_back("ExternalFoo", Function::ExternalLinkage, "_ExternalFoo");
+  Data.emplace_back("ExternalFoo", Function::ExternalLinkage, "ExternalFoo");
   Data.emplace_back("InternalFoo", Function::InternalLinkage,
-                    "MyModule.cpp;_InternalFoo");
-  Data.emplace_back("PrivateFoo", Function::PrivateLinkage,
-                    "MyModule.cpp;l_PrivateFoo");
-  Data.emplace_back("WeakODRFoo", Function::WeakODRLinkage, "_WeakODRFoo");
-  // Test Objective-C symbols
+                    "MyModule.cpp;InternalFoo");
   Data.emplace_back("\01-[C dynamicFoo:]", Function::ExternalLinkage,
                     "-[C dynamicFoo:]");
-  Data.emplace_back("-<C directFoo:>", Function::ExternalLinkage,
-                    "_-<C directFoo:>");
   Data.emplace_back("\01-[C internalFoo:]", Function::InternalLinkage,
                     "MyModule.cpp;-[C internalFoo:]");
 
@@ -590,10 +582,6 @@ TEST_F(InstrProfTest, test_pgo_function_name) {
   Data.emplace_back("ExternalFoo", Function::ExternalLinkage, "ExternalFoo");
   Data.emplace_back("InternalFoo", Function::InternalLinkage,
                     "MyModule.cpp:InternalFoo");
-  Data.emplace_back("PrivateFoo", Function::PrivateLinkage,
-                    "MyModule.cpp:PrivateFoo");
-  Data.emplace_back("WeakODRFoo", Function::WeakODRLinkage, "WeakODRFoo");
-  // Test Objective-C symbols
   Data.emplace_back("\01-[C externalFoo:]", Function::ExternalLinkage,
                     "-[C externalFoo:]");
   Data.emplace_back("\01-[C internalFoo:]", Function::InternalLinkage,
@@ -611,8 +599,6 @@ TEST_F(InstrProfTest, test_pgo_function_name) {
 TEST_F(InstrProfTest, test_irpgo_read_deprecated_names) {
   LLVMContext Ctx;
   auto M = std::make_unique<Module>("MyModule.cpp", Ctx);
-  // Use Mach-O mangling so that non-private symbols get a `_` prefix.
-  M->setDataLayout(DataLayout("m:o"));
   auto *FTy = FunctionType::get(Type::getVoidTy(Ctx), /*isVarArg=*/false);
   auto *InternalFooF =
       Function::Create(FTy, Function::InternalLinkage, "InternalFoo", M.get());

>From 9667aab8fdadfe1ab453c0952385243ab7d38ad2 Mon Sep 17 00:00:00 2001
From: Ellis Hoag <ellis.sparky.hoag at gmail.com>
Date: Thu, 4 Jan 2024 14:27:33 -0800
Subject: [PATCH 2/3] fixup! [InstrProf] No linkage prefixes for IRPGO names

---
 ...trprof-thinlto-indirect-call-promotion.cpp | 14 -------
 llvm/lib/ProfileData/InstrProf.cpp            | 41 +++++++++++--------
 llvm/tools/llvm-profdata/llvm-profdata.cpp    |  6 +--
 3 files changed, 26 insertions(+), 35 deletions(-)

diff --git a/compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp b/compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp
index 860c054f69e1a6..d361186cf26ac5 100644
--- a/compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp
+++ b/compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp
@@ -28,20 +28,6 @@
 // LTO if default linker is GNU ld or gold anyway.
 // REQUIRES: lld-available
 
-// Test should fail where linkage-name and mangled-name diverges, see issue https://github.com/llvm/llvm-project/issues/74565).
-// Currently, this name divergence happens on Mach-O object file format, or on
-// many (but not all) 32-bit Windows systems.
-//
-// XFAIL: system-darwin
-//
-// Mark 32-bit Windows as UNSUPPORTED for now as opposed to XFAIL. This test
-// should fail on many (but not all) 32-bit Windows systems and succeed on the
-// rest. The flexibility in triple string parsing makes it tricky to capture
-// both sets accurately. i[3-9]86 specifies arch as Triple::ArchType::x86, (win32|windows)
-// specifies OS as Triple::OS::Win32
-//
-// UNSUPPORTED: target={{i.86.*windows.*}}
-
 // RUN: rm -rf %t && split-file %s %t && cd %t
 
 // Do setup work for all below tests.
diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp
index 08126cff7d06be..9c2a8537403cc9 100644
--- a/llvm/lib/ProfileData/InstrProf.cpp
+++ b/llvm/lib/ProfileData/InstrProf.cpp
@@ -295,25 +295,31 @@ static StringRef getStrippedSourceFileName(const GlobalObject &GO) {
   return FileName;
 }
 
-static std::optional<std::string> lookupPGONameFromMetadata(MDNode *MD) {
-  if (MD != nullptr) {
-    StringRef S = cast<MDString>(MD->getOperand(0))->getString();
-    return S.str();
-  }
-  return {};
-}
-
 // The PGO name has the format [<filepath>;]<function-name> where <filepath>; is
 // provided if linkage is local and is used to discriminate possibly identical
 // function names. ";" is used because it is unlikely to be found in either
-// <filepath> or <linkage-name>.
+// <filepath> or <function-name>.
 //
 // Older compilers used getPGOFuncName() which has the format
 // [<filepath>:]<function-name>. This caused trouble for Objective-C functions
 // which commonly have :'s in their names. We still need to compute this name to
 // lookup functions from profiles built by older compilers.
-//
-// This function has some special handling
+static std::string
+getIRPGONameForGlobalObject(const GlobalObject &GO,
+                            GlobalValue::LinkageTypes Linkage,
+                            StringRef FileName) {
+  return GlobalValue::getGlobalIdentifier(GO.getName(), Linkage, FileName);
+}
+
+static std::optional<std::string> lookupPGONameFromMetadata(MDNode *MD) {
+  if (MD != nullptr) {
+    StringRef S = cast<MDString>(MD->getOperand(0))->getString();
+    return S.str();
+  }
+  return {};
+}
+
+// Returns the PGO object name. This function has some special handling
 // when called in LTO optimization. The following only applies when calling in
 // LTO passes (when \c InLTO is true): LTO's internalization privatizes many
 // global linkage symbols. This happens after value profile annotation, but
@@ -330,8 +336,7 @@ static std::string getIRPGOObjectName(const GlobalObject &GO, bool InLTO,
                                       MDNode *PGONameMetadata) {
   if (!InLTO) {
     auto FileName = getStrippedSourceFileName(GO);
-    return GlobalValue::getGlobalIdentifier(GO.getName(), GO.getLinkage(),
-                                            FileName);
+    return getIRPGONameForGlobalObject(GO, GO.getLinkage(), FileName);
   }
 
   // In LTO mode (when InLTO is true), first check if there is a meta data.
@@ -341,8 +346,7 @@ static std::string getIRPGOObjectName(const GlobalObject &GO, bool InLTO,
   // If there is no meta data, the function must be a global before the value
   // profile annotation pass. Its current linkage may be internal if it is
   // internalized in LTO mode.
-  return GlobalValue::getGlobalIdentifier(GO.getName(),
-                                          GlobalValue::ExternalLinkage, "");
+  return getIRPGONameForGlobalObject(GO, GlobalValue::ExternalLinkage, "");
 }
 
 // Returns the IRPGO function name and does special handling when called
@@ -356,8 +360,8 @@ std::string getIRPGOFuncName(const Function &F, bool InLTO) {
 // The implementation is kept for profile matching from older profiles.
 // This is similar to `getIRPGOFuncName` except that this function calls
 // 'getPGOFuncName' to get a name and `getIRPGOFuncName` calls
-// 'getIRPGOObjectName'. See the difference between two callees in the
-// comments of `getIRPGOObjectName`.
+// 'getIRPGONameForGlobalObject'. See the difference between two callees in the
+// comments of `getIRPGONameForGlobalObject`.
 std::string getPGOFuncName(const Function &F, bool InLTO, uint64_t Version) {
   if (!InLTO) {
     auto FileName = getStrippedSourceFileName(F);
@@ -386,7 +390,8 @@ getParsedIRPGOFuncName(StringRef IRPGOFuncName) {
 StringRef getFuncNameWithoutPrefix(StringRef PGOFuncName, StringRef FileName) {
   if (FileName.empty())
     return PGOFuncName;
-  // Drop the file name including ':' or ';'. See getIRPGOObjectName as well.
+  // Drop the file name including ':' or ';'. See getIRPGONameForGlobalObject as
+  // well.
   if (PGOFuncName.starts_with(FileName))
     PGOFuncName = PGOFuncName.drop_front(FileName.size() + 1);
   return PGOFuncName;
diff --git a/llvm/tools/llvm-profdata/llvm-profdata.cpp b/llvm/tools/llvm-profdata/llvm-profdata.cpp
index 0a611d199026d8..87c2e406b6fb34 100644
--- a/llvm/tools/llvm-profdata/llvm-profdata.cpp
+++ b/llvm/tools/llvm-profdata/llvm-profdata.cpp
@@ -3158,9 +3158,9 @@ static int order_main(int argc, const char *argv[]) {
   BP.run(Nodes);
 
   OS << "# Ordered " << Nodes.size() << " functions\n";
-  OS << "# Warning: Mach-O prefixes symbols with \"_\" or \"l_\" depending on "
-        "the linkage and this output does not take this into account. Some "
-        "post processing may be required before passing to the linker via "
+  OS << "# Warning: Mach-O may prefix symbols with \"_\" depending on the "
+        "linkage and this output does not take that into account. Some "
+        "post-processing may be required before passing to the linker via "
         "-order_file.\n";
   for (auto &N : Nodes) {
     auto [Filename, ParsedFuncName] =

>From 474f5996c0c35cab0ab6f8ad309d98c7623a7b51 Mon Sep 17 00:00:00 2001
From: Ellis Hoag <ellis.sparky.hoag at gmail.com>
Date: Thu, 4 Jan 2024 15:23:09 -0800
Subject: [PATCH 3/3] fixup! [InstrProf] No linkage prefixes for IRPGO names

---
 llvm/lib/ProfileData/InstrProf.cpp | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/llvm/lib/ProfileData/InstrProf.cpp b/llvm/lib/ProfileData/InstrProf.cpp
index 9c2a8537403cc9..4264da8ad75143 100644
--- a/llvm/lib/ProfileData/InstrProf.cpp
+++ b/llvm/lib/ProfileData/InstrProf.cpp
@@ -295,13 +295,13 @@ static StringRef getStrippedSourceFileName(const GlobalObject &GO) {
   return FileName;
 }
 
-// The PGO name has the format [<filepath>;]<function-name> where <filepath>; is
+// The PGO name has the format [<filepath>;]<mangled-name> where <filepath>; is
 // provided if linkage is local and is used to discriminate possibly identical
-// function names. ";" is used because it is unlikely to be found in either
-// <filepath> or <function-name>.
+// mangled names. ";" is used because it is unlikely to be found in either
+// <filepath> or <mangled-name>.
 //
 // Older compilers used getPGOFuncName() which has the format
-// [<filepath>:]<function-name>. This caused trouble for Objective-C functions
+// [<filepath>:]<mangled-name>. This caused trouble for Objective-C functions
 // which commonly have :'s in their names. We still need to compute this name to
 // lookup functions from profiles built by older compilers.
 static std::string



More information about the llvm-commits mailing list