[llvm] [clang] [PGO][nfc] Add `-fdiagnostics-show-profile-count` option to show real loop count from instr-profile (PR #75021)

via cfe-commits cfe-commits at lists.llvm.org
Sun Dec 10 19:22:38 PST 2023


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-llvm-ir

Author: Elvis Wang (ElvisWang123)

<details>
<summary>Changes</summary>

The original `-fdiagnostics-show-hotness` option show the relative number of the loop count which is calculate by the `function_entry_count` and `branch_frequency`. We want to know the real loop iteration count in the remark which is collect in the instrument profile, adding a new option to expose the new feature.

- Add a new metadata `MD_prof_count` which contains the runtime loop iterations count. For example:
```
loop.header:
  ...
  br  i1  %0, label %true, label %false, !prof.count !0
...
!0 = !{!"profile_count", !i64 0}
```

- If option `-fdiagnostics-show-profile-count` is set we will append the `MD_prof_count` metadata at the branch instruction at the header of loops.
- Show the profile count like hotness with remark. For example:
```
  remark: the cost-model indicates that interleaving is not beneficial
  (ProfileCount: 20) [-Rpass-analysis=loop-vectorize]
  38 |   for(int i = 0;  i < argc % 20; i++){
     |   ^
```

---

Patch is 25.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/75021.diff


24 Files Affected:

- (modified) clang/docs/UsersManual.rst (+27) 
- (modified) clang/include/clang/Basic/DiagnosticDriverKinds.td (+3) 
- (modified) clang/include/clang/Driver/Options.td (+3) 
- (modified) clang/lib/CodeGen/CGStmt.cpp (+23-4) 
- (modified) clang/lib/CodeGen/CodeGenAction.cpp (+4) 
- (modified) clang/lib/CodeGen/CodeGenFunction.h (+1) 
- (modified) clang/lib/CodeGen/CodeGenPGO.cpp (+12) 
- (modified) clang/lib/Driver/ToolChains/Clang.cpp (+5) 
- (modified) clang/lib/Frontend/CompilerInvocation.cpp (+6) 
- (added) clang/test/Frontend/Inputs/optimization-remark-with-profile-count.proftext (+9) 
- (added) clang/test/Frontend/optimization-remark-with-profile-count-new-pm.c (+41) 
- (added) clang/test/Profile/Inputs/c-profile-count-metadata.proftext (+32) 
- (added) clang/test/Profile/c-profile-count-metadata.c (+73) 
- (modified) llvm/docs/LangRef.rst (+22) 
- (modified) llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h (+7) 
- (modified) llvm/include/llvm/IR/DiagnosticInfo.h (+6) 
- (modified) llvm/include/llvm/IR/FixedMetadataKinds.def (+1) 
- (modified) llvm/include/llvm/IR/MDBuilder.h (+3) 
- (modified) llvm/include/llvm/Remarks/Remark.h (+4) 
- (modified) llvm/lib/Analysis/OptimizationRemarkEmitter.cpp (+22) 
- (modified) llvm/lib/IR/LLVMRemarkStreamer.cpp (+1) 
- (modified) llvm/lib/IR/MDBuilder.cpp (+10) 
- (modified) llvm/lib/Remarks/Remark.cpp (+2) 
- (modified) llvm/lib/Remarks/YAMLRemarkSerializer.cpp (+5-2) 


``````````diff
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index f1b344ef5109b5..59e2bdbd025000 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -425,6 +425,33 @@ output format of the diagnostics that it generates.
    If this option is not used, all the passes are included in the optimization
    record.
 
+.. option:: -fdiagnostics-show-profile-count
+
+   Enable profile loop count information in diagnostic line.
+
+   This option controls whether Clang prints the profile loop count associated
+   with diagnostics in the presence of profile-guided optimization information.
+   This is currently supported with optimization remarks (see
+   :ref:`Options to Emit Optimization Reports <rpass>`). The profile count information
+   allows users to focus on the hot optimization remarks that are likely to be
+   more relevant for run-time performance. The main difference between profile count
+   the hotness is the profile count is the real profile count from the runtime
+   profile and hotness is a relative number calculated by function entry count and
+   weight.
+
+   For example, in this output, the block containing the callsite of `foo` was
+   executed 3000 times according to the profile data:
+
+   ::
+
+         s.c:7:10: remark: foo inlined into bar (ProfileCount: 3000) [-Rpass-analysis=inline]
+           sum += foo(x, x - 2);
+                  ^
+
+   This option is implied when
+   :ref:`-fsave-optimization-record <opt_fsave-optimization-record>` is used.
+   Otherwise, it defaults to off.
+
 .. _opt_fdiagnostics-show-hotness:
 
 .. option:: -f[no-]diagnostics-show-hotness
diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 676f1a62b49dd0..47ad1e058a1d82 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -420,6 +420,9 @@ def warn_drv_empty_joined_argument : Warning<
 def warn_drv_diagnostics_hotness_requires_pgo : Warning<
   "argument '%0' requires profile-guided optimization information">,
   InGroup<UnusedCommandLineArgument>;
+def warn_drv_diagnostics_profile_count_requires_pgo : Warning<
+  "argument '%0' requires profile-guided optimization information">,
+  InGroup<UnusedCommandLineArgument>;
 def warn_drv_diagnostics_misexpect_requires_pgo : Warning<
   "argument '%0' requires profile-guided optimization information">,
   InGroup<UnusedCommandLineArgument>;
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index b959fd20fe413d..78914e88350a5f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1933,6 +1933,9 @@ defm diagnostics_show_hotness : BoolFOption<"diagnostics-show-hotness",
   PosFlag<SetTrue, [], [ClangOption, CC1Option],
           "Enable profile hotness information in diagnostic line">,
   NegFlag<SetFalse>>;
+def fdiagnostics_show_profile_count : Flag<["-"], "fdiagnostics-show-profile-count">,
+    Group<f_clang_Group>,  Visibility<[ClangOption, CC1Option]>,
+    HelpText<"Show the real loop counts from the runtime profile">;
 def fdiagnostics_hotness_threshold_EQ : Joined<["-"], "fdiagnostics-hotness-threshold=">,
     Group<f_Group>, Visibility<[ClangOption, CC1Option]>,
     MetaVarName<"<value>">,
diff --git a/clang/lib/CodeGen/CGStmt.cpp b/clang/lib/CodeGen/CGStmt.cpp
index a5cb80640641bb..7cbb6e10cb1382 100644
--- a/clang/lib/CodeGen/CGStmt.cpp
+++ b/clang/lib/CodeGen/CGStmt.cpp
@@ -32,6 +32,7 @@
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/InlineAsm.h"
 #include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/Support/SaveAndRestore.h"
 #include <optional>
@@ -923,7 +924,12 @@ void CodeGenFunction::EmitWhileStmt(const WhileStmt &S,
     if (!Weights && CGM.getCodeGenOpts().OptimizationLevel)
       BoolCondVal = emitCondLikelihoodViaExpectIntrinsic(
           BoolCondVal, Stmt::getLikelihood(S.getBody()));
-    Builder.CreateCondBr(BoolCondVal, LoopBody, ExitBlock, Weights);
+    auto *Branch =
+        Builder.CreateCondBr(BoolCondVal, LoopBody, ExitBlock, Weights);
+    // Appending the profle count metadata on the Branch instruction for the
+    // profile count
+    Branch->setMetadata(llvm::LLVMContext::MD_prof_count,
+                        createProfileCount(getProfileCount(S.getBody())));
 
     if (ExitBlock != LoopExit.getBlock()) {
       EmitBlock(ExitBlock);
@@ -1014,9 +1020,13 @@ void CodeGenFunction::EmitDoStmt(const DoStmt &S,
   // As long as the condition is true, iterate the loop.
   if (EmitBoolCondBranch) {
     uint64_t BackedgeCount = getProfileCount(S.getBody()) - ParentCount;
-    Builder.CreateCondBr(
+    auto *Branch = Builder.CreateCondBr(
         BoolCondVal, LoopBody, LoopExit.getBlock(),
         createProfileWeightsForLoop(S.getCond(), BackedgeCount));
+    // Appending the profile count metadata on the Branch instruction for the
+    // profile count
+    Branch->setMetadata(llvm::LLVMContext::MD_prof_count,
+                        createProfileCount(getProfileCount(S.getBody())));
   }
 
   LoopStack.pop();
@@ -1104,7 +1114,12 @@ void CodeGenFunction::EmitForStmt(const ForStmt &S,
       BoolCondVal = emitCondLikelihoodViaExpectIntrinsic(
           BoolCondVal, Stmt::getLikelihood(S.getBody()));
 
-    Builder.CreateCondBr(BoolCondVal, ForBody, ExitBlock, Weights);
+    auto *Branch =
+        Builder.CreateCondBr(BoolCondVal, ForBody, ExitBlock, Weights);
+    // Appending the profile count metadata on the Branch instruction for the
+    // profile count
+    Branch->setMetadata(llvm::LLVMContext::MD_prof_count,
+                        createProfileCount(getProfileCount(S.getBody())));
 
     if (ExitBlock != LoopExit.getBlock()) {
       EmitBlock(ExitBlock);
@@ -1188,7 +1203,11 @@ CodeGenFunction::EmitCXXForRangeStmt(const CXXForRangeStmt &S,
   if (!Weights && CGM.getCodeGenOpts().OptimizationLevel)
     BoolCondVal = emitCondLikelihoodViaExpectIntrinsic(
         BoolCondVal, Stmt::getLikelihood(S.getBody()));
-  Builder.CreateCondBr(BoolCondVal, ForBody, ExitBlock, Weights);
+  auto *Branch = Builder.CreateCondBr(BoolCondVal, ForBody, ExitBlock, Weights);
+  // Appending the profile count metadata  on the Branch instruction for the
+  // profile count
+  Branch->setMetadata(llvm::LLVMContext::MD_prof_count,
+                      createProfileCount(getProfileCount(S.getBody())));
 
   if (ExitBlock != LoopExit.getBlock()) {
     EmitBlock(ExitBlock);
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index bb6b1a3bc228cf..fbbd5bf25898e4 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -716,6 +716,10 @@ void BackendConsumer::EmitOptimizationMessage(
   if (D.getHotness())
     MsgStream << " (hotness: " << *D.getHotness() << ")";
 
+  if (D.getProfileCount()) {
+    MsgStream << " (ProfileCount: " << *D.getProfileCount() << ")";
+  }
+
   Diags.Report(Loc, DiagID)
       << AddFlagValue(D.getPassName())
       << MsgStream.str();
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 618e78809db408..12cb0645f2f576 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -1527,6 +1527,7 @@ class CodeGenFunction : public CodeGenTypeCache {
   llvm::MDNode *createProfileWeights(ArrayRef<uint64_t> Weights) const;
   llvm::MDNode *createProfileWeightsForLoop(const Stmt *Cond,
                                             uint64_t LoopCount) const;
+  llvm::MDNode *createProfileCount(uint64_t Count) const;
 
 public:
   /// Increment the profiler's counter for the given statement by \p StepV.
diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp
index 81bf8ea696b164..8752f8e8269b88 100644
--- a/clang/lib/CodeGen/CodeGenPGO.cpp
+++ b/clang/lib/CodeGen/CodeGenPGO.cpp
@@ -23,6 +23,11 @@
 #include "llvm/Support/MD5.h"
 #include <optional>
 
+static llvm::cl::opt<bool> ClEnableProfileCountMetadata(
+    "enable-profile-count-metadata",
+    llvm::cl::desc("Appending real executation count of loops from runtime"),
+    llvm::cl::Hidden, llvm::cl::init(false));
+
 static llvm::cl::opt<bool>
     EnableValueProfiling("enable-value-profiling",
                          llvm::cl::desc("Enable value profiling"),
@@ -1122,3 +1127,10 @@ CodeGenFunction::createProfileWeightsForLoop(const Stmt *Cond,
   return createProfileWeights(LoopCount,
                               std::max(*CondCount, LoopCount) - LoopCount);
 }
+
+llvm::MDNode *CodeGenFunction::createProfileCount(uint64_t Count) const {
+  if (!PGO.haveRegionCounts() || !ClEnableProfileCountMetadata)
+    return nullptr;
+  llvm::MDBuilder MDHelper(CGM.getLLVMContext());
+  return MDHelper.createProfileCount(Count);
+}
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index eb26bfade47b7a..0dfa40cf92f288 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -4065,6 +4065,11 @@ static void RenderDiagnosticsOptions(const Driver &D, const ArgList &Args,
 
   Args.addOptOutFlag(CmdArgs, options::OPT_fspell_checking,
                      options::OPT_fno_spell_checking);
+
+  // Show iteration counts of loops by runtime profile.
+  if (Args.hasArg(options::OPT_fdiagnostics_show_profile_count))
+    CmdArgs.append({"-fdiagnostics-show-profile-count", "-mllvm",
+                    "-enable-profile-count-metadata"});
 }
 
 DwarfFissionKind tools::getDebugFissionKind(const Driver &D,
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index b33bdad2ad81ba..11568eb2bdbd4d 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -2091,6 +2091,12 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args,
   bool UsingProfile =
       UsingSampleProfile || !Opts.ProfileInstrumentUsePath.empty();
 
+  if (Args.hasArg(options::OPT_fdiagnostics_show_profile_count) &&
+      !UsingProfile && IK.getLanguage() != Language::LLVM_IR) {
+    Diags.Report(diag::warn_drv_diagnostics_profile_count_requires_pgo)
+        << "-fdiagnostics-show-profile-count";
+  }
+
   if (Opts.DiagnosticsWithHotness && !UsingProfile &&
       // An IR file will contain PGO as metadata
       IK.getLanguage() != Language::LLVM_IR)
diff --git a/clang/test/Frontend/Inputs/optimization-remark-with-profile-count.proftext b/clang/test/Frontend/Inputs/optimization-remark-with-profile-count.proftext
new file mode 100644
index 00000000000000..1885f75326578a
--- /dev/null
+++ b/clang/test/Frontend/Inputs/optimization-remark-with-profile-count.proftext
@@ -0,0 +1,9 @@
+main
+# Func Hash:
+1160280
+# Num Counters:
+2
+# Counter Values:
+1
+20
+
diff --git a/clang/test/Frontend/optimization-remark-with-profile-count-new-pm.c b/clang/test/Frontend/optimization-remark-with-profile-count-new-pm.c
new file mode 100644
index 00000000000000..0cfe95a1a90d4c
--- /dev/null
+++ b/clang/test/Frontend/optimization-remark-with-profile-count-new-pm.c
@@ -0,0 +1,41 @@
+// Testing the remark output of the `-fdiagnostics-show-profile-count`.
+
+// Generate instrumentation and sampling profile data.
+// RUN: llvm-profdata merge \
+// RUN:     %S/Inputs/optimization-remark-with-profile-count.proftext \
+// RUN:     -o %t.profdata
+//
+// RUN: %clang  -fprofile-instr-use=%t.profdata \
+// RUN:     -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize \
+// RUN:     -Rpass-missed=loop-vecotrize \
+// RUN:     -fdiagnostics-show-profile-count \
+// RUN:     2>&1 %s\
+// RUN:     | FileCheck -check-prefix=SHOW_PROFILE_COUNT %s
+// RUN: %clang  -fprofile-instr-use=%t.profdata \
+// RUN:     -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize \
+// RUN:     -Rpass-missed=loop-vecotrize \
+// RUN:     -fdiagnostics-show-profile-count -fdiagnostics-show-hotness \
+// RUN:     2>&1 %s\
+// RUN:     | FileCheck -check-prefix=SHOW_PROFILE_COUNT_AND_HOTNESS %s
+// RUN: %clang  \
+// RUN:     -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize \
+// RUN:     -Rpass-missed=loop-vecotrize \
+// RUN:     -fdiagnostics-show-profile-count  \
+// RUN:     2>&1 %s\
+// RUN:     | FileCheck -check-prefix=NO_PGO %s
+
+int sum = 0;
+int x[20] = {0, 112, 32, 11, 99, 88, 99, 88,34, 342, 85,99, 43, 75, 71, 871, 84, 65, 37, 98};
+
+// SHOW_PROFILE_COUNT_AND_HOTNESS: hotness: {{[0-9]+}}
+// SHOW_PROFILE_COUNT_AND_HOTNESS: ProfileCount: {{[0-9]+}}
+// SHOW_PROFILE_COUNT: ProfileCount: {{[0-9]+}}
+// NO_PGO: argument '-fdiagnostics-show-profile-count' requires profile-guided optimization information
+int main(int argc, const char *argv[]) {
+#pragma clang loop vectorize(enable)
+  for(int i = 0;  i < argc % 20; i++){
+    sum += x[i];
+    sum += argc;
+  }
+  return sum;
+}
diff --git a/clang/test/Profile/Inputs/c-profile-count-metadata.proftext b/clang/test/Profile/Inputs/c-profile-count-metadata.proftext
new file mode 100644
index 00000000000000..d880663fed32d9
--- /dev/null
+++ b/clang/test/Profile/Inputs/c-profile-count-metadata.proftext
@@ -0,0 +1,32 @@
+never_called
+6820425066224770721
+9
+0
+0
+0
+0
+0
+0
+0
+0
+0
+
+main
+24
+1
+1
+
+dead_code
+5254464978620792806
+10
+1
+0
+0
+0
+0
+0
+0
+0
+0
+0
+
diff --git a/clang/test/Profile/c-profile-count-metadata.c b/clang/test/Profile/c-profile-count-metadata.c
new file mode 100644
index 00000000000000..99089cd5da3600
--- /dev/null
+++ b/clang/test/Profile/c-profile-count-metadata.c
@@ -0,0 +1,73 @@
+// Copy from c-unprofiled-blocks.c but testing the `MD_prof_count`, which will
+// will generate the MD node no matter the code is dead or not.
+
+// RUN: llvm-profdata merge %S/Inputs/c-profile-count-metadata.proftext -o %t.profdata
+// RUN: %clang_cc1 -mllvm -enable-profile-count-metadata -triple x86_64-apple-macosx10.9 \
+// RUN: -main-file-name c-profile-count-metadata.c %s -o - \
+// RUN: -emit-llvm -fprofile-instrument-use-path=%t.profdata | FileCheck -check-prefix=PGOUSE %s
+
+// PGOUSE-LABEL: @never_called(i32 noundef %i)
+int never_called(int i) {
+  // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}{{$}}
+  if (i) {}
+
+  // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}, !prof.count !{{[0-9]+}}{{$}}
+  for (i = 0; i < 100; ++i) {
+  }
+
+  // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}, !prof.count !{{[0-9]+}}{{$}}
+  while (--i) {}
+
+  // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}, !llvm.loop [[LOOP1:!.*]], !prof.count !{{[0-9]+}}
+  do {} while (i++ < 75);
+
+  // PGOUSE: switch {{.*}} [
+  // PGOUSE-NEXT: i32 12
+  // PGOUSE-NEXT: i32 82
+  // PGOUSE-NEXT: ]{{$}}
+  switch (i) {
+  case 12: return 3;
+  case 82: return 0;
+  default: return 89;
+  }
+}
+
+// PGOUSE-LABEL: @dead_code(i32 noundef %i)
+int dead_code(int i) {
+  // PGOUSE: br {{.*}}, !prof !{{[0-9]+}}
+  if (i) {
+    // This branch is never reached.
+
+    // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}{{$}}
+    if (!i) {}
+
+    // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}, !prof.count !{{[0-9]+}}{{$}}
+    for (i = 0; i < 100; ++i) {
+    }
+
+    // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}, !prof.count !{{[0-9]+}}{{$}}
+    while (--i) {}
+
+    // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}, !llvm.loop [[LOOP2:!.*]], !prof.count !{{[0-9]+}}
+    do {} while (i++ < 75);
+
+    // PGOUSE: switch {{.*}} [
+    // PGOUSE-NEXT: i32 12
+    // PGOUSE-NEXT: i32 82
+    // PGOUSE-NEXT: ]{{$}}
+    switch (i) {
+    case 12: return 3;
+    case 82: return 0;
+    default: return 89;
+    }
+  }
+  return 2;
+}
+
+// PGOUSE-LABEL: @main(i32 noundef %argc, ptr noundef %argv)
+int main(int argc, const char *argv[]) {
+  dead_code(0);
+  return 0;
+}
+
+// PGOUSE: !{{[0-9]+}} = !{!"profile_count", i64 0}
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index adda52b33c789b..7bf7f0728f1879 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -7649,6 +7649,28 @@ allocation itself) to the outermost callsite context required for uniquely
 identifying the described profile behavior (note this may not be the top of
 the profiled call stack).
 
+.. _md_prof_count:
+
+'``prof.count``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``prof.count`` metadata is used to record runtime profile data on loop
+executation times. The difference between ``prof`` metadata is this metadata record
+the real execution time in the runtime profile.
+
+Example:
+
+.. code-block:: text
+
+    for.header:
+      ...
+      br i1 %1, label %for.body, label %for.exit, !prof.count !0
+    !0 = !{!"profile_count", !64 100}
+
+Each of the  ``prof.count`` metadata contains two element, the first element is the
+string indicate this metadata is for profile_count, and the second element is the
+executation times current loop runs at the runtime.
+
 .. _md_callsite:
 
 '``callsite``' Metadata
diff --git a/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h b/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h
index 8aaeaf29910293..e7daf72502877d 100644
--- a/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h
+++ b/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h
@@ -118,6 +118,13 @@ class OptimizationRemarkEmitter {
   /// available.
   std::optional<uint64_t> computeHotness(const Value *V);
 
+  /// Compute profile count from IR value (currently assumed to be a block) if
+  /// PGO is available.
+  std::optional<uint64_t> getProfileCount(const Value *V);
+
+  /// Similar but use value from \p OptDiag and update profile count there.
+  void getProfileCount(DiagnosticInfoIROptimization &OptDiag);
+
   /// Similar but use value from \p OptDiag and update hotness there.
   void computeHotness(DiagnosticInfoIROptimization &OptDiag);
 
diff --git a/llvm/include/llvm/IR/DiagnosticInfo.h b/llvm/include/llvm/IR/DiagnosticInfo.h
index 628445fe9fb2cc..125f9302b75edc 100644
--- a/llvm/include/llvm/IR/DiagnosticInfo.h
+++ b/llvm/include/llvm/IR/DiagnosticInfo.h
@@ -481,6 +481,8 @@ class DiagnosticInfoOptimizationBase : public DiagnosticInfoWithLocationBase {
   std::string getMsg() const;
   std::optional<uint64_t> getHotness() const { return Hotness; }
   void setHotness(std::optional<uint64_t> H) { Hotness = H; }
+  std::optional<uint64_t> getProfileCount() const { return ProfileCount; }
+  void setProfileCount(std::optional<uint64_t> Count) { ProfileCount = Count; }
 
   bool isVerbose() const { return IsVerbose; }
 
@@ -523,6 +525,10 @@ class DiagnosticInfoOptimizationBase : public DiagnosticInfoWithLocationBase {
   /// corresponding code was executed in a profile instrumentation run.
   std::optional<uint64_t> Hotness;
 
+  /// If profile information is available, this is the REAL number of times the
+  /// corresponding code was executed in a profile instrumentation run.
+  std::optional<uint64_t> ProfileCount;
+
   /// Arguments collected via the streaming interface.
   SmallVector<Argument, 4> Args;
 
diff --git a/llvm/include/llvm/IR/FixedMetadataKinds.def b/llvm/include/llvm/IR/FixedMetadataKinds.def
index b375d0f0912060..20abd5f19c8e3a 100644
--- a/llvm/include/llvm/IR/FixedMetadataKinds.def
+++ b/llvm/include/llvm/IR/FixedMetadataKinds.def
@@ -51,3 +51,4 @@ LLVM_FIXED_MD_KIND(MD_kcfi_type, "kcfi_type", 36)
 LLVM_FIXED_MD_KIND(MD_pcsections, "pcsections", 37)
 LLVM_FIXED_MD_KIND(MD_DIAssignID, "DIAssignID", 38)
 LLVM_FIXED_MD_KIND(MD_coro_outside_frame, "coro.outside.frame", 39)
+LLVM_FIXED_MD_KIND(MD_prof_count, "prof.count", 40)
diff --git a/llvm/include/llvm/IR/MDBuilder.h b/llvm/include/llvm/IR/MDBuilder.h
index 39165453de16b0..adfc733baa67b1 100644
--- a/llvm/include/llvm/IR/MDBuilder.h
+++ b/llvm/include/llvm/IR/MDBuilder.h
@@ -67,6 +67,9 @@ class MDBuilder {
   /// Return metadata specifying that a branch or switch is unpredictable.
   MDNode *createUnpredictable();
 
+  /// Create the `profile_count` metadata at the branch instrucion
+  MDNode *createProfileCount(uint64_t Count);
+
   /// Return metadata containing the entry \p Count for a function, a boolean
   /// \Synthetic indicating whether the counts were synthetized, and the
   /// GUIDs stored in \p Imports that need to be imported for sample PGO, to
diff --git a/llvm/include/llvm/Remarks/Remark.h b/llvm/include/llvm/Remarks/Remark.h
in...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/75021


More information about the cfe-commits mailing list