[llvm] [clang] [PGO][nfc] Add `-fdiagnostics-show-profile-count` option to show real loop count from instr-profile (PR #75021)

Elvis Wang via cfe-commits cfe-commits at lists.llvm.org
Sun Dec 10 19:22:11 PST 2023


https://github.com/ElvisWang123 created https://github.com/llvm/llvm-project/pull/75021

The existing `-fdiagnostics-show-hotness` option shows a relative loop count, which is calculated from the `function_entry_count` and `branch_frequency`. We want remarks to show the real loop iteration count collected in the instrumentation profile, so this patch adds a new option to expose it.
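
To make the distinction concrete, here is a simplified sketch, not LLVM's actual implementation. It assumes the hotness estimate is obtained by scaling the function entry count by a block's frequency relative to the entry block. `estimateHotness`, `realProfileCount`, and their parameters are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a relative hotness estimate: the function entry
// count scaled by the block's frequency relative to the entry block.
// Assumes entryFreq is non-zero.
std::uint64_t estimateHotness(std::uint64_t functionEntryCount,
                              std::uint64_t blockFreq,
                              std::uint64_t entryFreq) {
  return functionEntryCount * blockFreq / entryFreq;
}

// The value this patch reports is instead the raw counter recorded for the
// loop body during the instrumented run, passed through unchanged.
std::uint64_t realProfileCount(std::uint64_t loopBodyCounter) {
  return loopBodyCounter;
}
```

In this model, an entry count of 1 and a loop block estimated to run 16 times as often as the entry block give `estimateHotness(1, 16, 1) == 16`, whereas a recorded body counter of 20 is reported as 20.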

- Add a new metadata kind `MD_prof_count`, which contains the runtime loop iteration count. For example:
```
loop.header:
  ...
  br  i1  %0, label %true, label %false, !prof.count !0
...
!0 = !{!"profile_count", i64 0}
```

- If the `-fdiagnostics-show-profile-count` option is set, attach the `MD_prof_count` metadata to the branch instruction at the header of each loop.
- Show the profile count in remarks, in the same way as hotness. For example:
```
  remark: the cost-model indicates that interleaving is not beneficial
  (ProfileCount: 20) [-Rpass-analysis=loop-vectorize]
  38 |   for(int i = 0;  i < argc % 20; i++){
     |   ^
```
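
The remark suffix shown above can be assembled roughly as follows. This is a minimal standalone sketch modeled on the `EmitOptimizationMessage` change in this patch; `formatRemark` is a hypothetical helper, not a Clang API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: append the optional hotness and profile count to a
// remark message, using the same suffix format as the patch.
std::string formatRemark(const std::string &msg,
                         std::optional<std::uint64_t> hotness,
                         std::optional<std::uint64_t> profileCount) {
  std::ostringstream out;
  out << msg;
  if (hotness)
    out << " (hotness: " << *hotness << ")";
  if (profileCount)
    out << " (ProfileCount: " << *profileCount << ")";
  return out.str();
}
```

Each suffix is emitted only when its value is present, so a remark without profile data is left unchanged.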

From e61b9c266b8b9fea46340e13fa18937e02b3756e Mon Sep 17 00:00:00 2001
From: Elvis Wang <elvis.wang at sifive.com>
Date: Sun, 10 Dec 2023 18:34:37 -0800
Subject: [PATCH] [PGO][nfc] Add `-fdiagnostics-show-profile-count` option to
 show real loop count from instr-profile

The existing `-fdiagnostics-show-hotness` option shows a relative loop
count, which is calculated from the `function_entry_count` and
`branch_frequency`. We want remarks to show the real loop iteration
count collected in the instrumentation profile, so this patch adds a
new option to expose it.

- Add a new metadata kind `MD_prof_count`, which contains the runtime
loop iteration count. For example:
```
loop.header:
  ...
  br  i1  %0, label %true, label %false, !prof.count !0
...
!0 = !{!"profile_count", i64 0}
```

- If the `-fdiagnostics-show-profile-count` option is set, attach the
`MD_prof_count` metadata to the branch instruction at the header of
each loop.
- Show the profile count in remarks, in the same way as hotness. For example:
```
  remark: the cost-model indicates that interleaving is not beneficial
  (ProfileCount: 20) [-Rpass-analysis=loop-vectorize]
  38 |   for(int i = 0;  i < argc % 20; i++){
     |   ^
```
---
 clang/docs/UsersManual.rst                    | 27 +++++++
 .../clang/Basic/DiagnosticDriverKinds.td      |  3 +
 clang/include/clang/Driver/Options.td         |  3 +
 clang/lib/CodeGen/CGStmt.cpp                  | 27 ++++++-
 clang/lib/CodeGen/CodeGenAction.cpp           |  4 +
 clang/lib/CodeGen/CodeGenFunction.h           |  1 +
 clang/lib/CodeGen/CodeGenPGO.cpp              | 12 +++
 clang/lib/Driver/ToolChains/Clang.cpp         |  5 ++
 clang/lib/Frontend/CompilerInvocation.cpp     |  6 ++
 ...ization-remark-with-profile-count.proftext |  9 +++
 ...ization-remark-with-profile-count-new-pm.c | 41 +++++++++++
 .../Inputs/c-profile-count-metadata.proftext  | 32 ++++++++
 clang/test/Profile/c-profile-count-metadata.c | 73 +++++++++++++++++++
 llvm/docs/LangRef.rst                         | 22 ++++++
 .../llvm/Analysis/OptimizationRemarkEmitter.h |  7 ++
 llvm/include/llvm/IR/DiagnosticInfo.h         |  6 ++
 llvm/include/llvm/IR/FixedMetadataKinds.def   |  1 +
 llvm/include/llvm/IR/MDBuilder.h              |  3 +
 llvm/include/llvm/Remarks/Remark.h            |  4 +
 .../Analysis/OptimizationRemarkEmitter.cpp    | 22 ++++++
 llvm/lib/IR/LLVMRemarkStreamer.cpp            |  1 +
 llvm/lib/IR/MDBuilder.cpp                     | 10 +++
 llvm/lib/Remarks/Remark.cpp                   |  2 +
 llvm/lib/Remarks/YAMLRemarkSerializer.cpp     |  7 +-
 24 files changed, 322 insertions(+), 6 deletions(-)
 create mode 100644 clang/test/Frontend/Inputs/optimization-remark-with-profile-count.proftext
 create mode 100644 clang/test/Frontend/optimization-remark-with-profile-count-new-pm.c
 create mode 100644 clang/test/Profile/Inputs/c-profile-count-metadata.proftext
 create mode 100644 clang/test/Profile/c-profile-count-metadata.c

diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index f1b344ef5109b..59e2bdbd02500 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -425,6 +425,33 @@ output format of the diagnostics that it generates.
    If this option is not used, all the passes are included in the optimization
    record.
 
+.. option:: -fdiagnostics-show-profile-count
+
+   Enable profile loop count information in diagnostic line.
+
+   This option controls whether Clang prints the profile loop count associated
+   with diagnostics in the presence of profile-guided optimization information.
+   This is currently supported with optimization remarks (see
+   :ref:`Options to Emit Optimization Reports <rpass>`). The profile count information
+   allows users to focus on the hot optimization remarks that are likely to be
+   more relevant for run-time performance. The main difference from hotness is
+   that the profile count is the real count recorded in the runtime profile,
+   whereas hotness is a relative number calculated from the function entry
+   count and weights.
+
+   For example, in this output, the block containing the callsite of `foo` was
+   executed 3000 times according to the profile data:
+
+   ::
+
+         s.c:7:10: remark: foo inlined into bar (ProfileCount: 3000) [-Rpass-analysis=inline]
+           sum += foo(x, x - 2);
+                  ^
+
+   This option is implied when
+   :ref:`-fsave-optimization-record <opt_fsave-optimization-record>` is used.
+   Otherwise, it defaults to off.
+
 .. _opt_fdiagnostics-show-hotness:
 
 .. option:: -f[no-]diagnostics-show-hotness
diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 676f1a62b49dd..47ad1e058a1d8 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -420,6 +420,9 @@ def warn_drv_empty_joined_argument : Warning<
 def warn_drv_diagnostics_hotness_requires_pgo : Warning<
   "argument '%0' requires profile-guided optimization information">,
   InGroup<UnusedCommandLineArgument>;
+def warn_drv_diagnostics_profile_count_requires_pgo : Warning<
+  "argument '%0' requires profile-guided optimization information">,
+  InGroup<UnusedCommandLineArgument>;
 def warn_drv_diagnostics_misexpect_requires_pgo : Warning<
   "argument '%0' requires profile-guided optimization information">,
   InGroup<UnusedCommandLineArgument>;
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index b959fd20fe413..78914e88350a5 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1933,6 +1933,9 @@ defm diagnostics_show_hotness : BoolFOption<"diagnostics-show-hotness",
   PosFlag<SetTrue, [], [ClangOption, CC1Option],
           "Enable profile hotness information in diagnostic line">,
   NegFlag<SetFalse>>;
+def fdiagnostics_show_profile_count : Flag<["-"], "fdiagnostics-show-profile-count">,
+    Group<f_clang_Group>,  Visibility<[ClangOption, CC1Option]>,
+    HelpText<"Show the real loop counts from the runtime profile">;
 def fdiagnostics_hotness_threshold_EQ : Joined<["-"], "fdiagnostics-hotness-threshold=">,
     Group<f_Group>, Visibility<[ClangOption, CC1Option]>,
     MetaVarName<"<value>">,
diff --git a/clang/lib/CodeGen/CGStmt.cpp b/clang/lib/CodeGen/CGStmt.cpp
index a5cb80640641b..7cbb6e10cb138 100644
--- a/clang/lib/CodeGen/CGStmt.cpp
+++ b/clang/lib/CodeGen/CGStmt.cpp
@@ -32,6 +32,7 @@
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/InlineAsm.h"
 #include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/Support/SaveAndRestore.h"
 #include <optional>
@@ -923,7 +924,12 @@ void CodeGenFunction::EmitWhileStmt(const WhileStmt &S,
     if (!Weights && CGM.getCodeGenOpts().OptimizationLevel)
       BoolCondVal = emitCondLikelihoodViaExpectIntrinsic(
           BoolCondVal, Stmt::getLikelihood(S.getBody()));
-    Builder.CreateCondBr(BoolCondVal, LoopBody, ExitBlock, Weights);
+    auto *Branch =
+        Builder.CreateCondBr(BoolCondVal, LoopBody, ExitBlock, Weights);
+    // Appending the profile count metadata on the Branch instruction for the
+    // profile count
+    Branch->setMetadata(llvm::LLVMContext::MD_prof_count,
+                        createProfileCount(getProfileCount(S.getBody())));
 
     if (ExitBlock != LoopExit.getBlock()) {
       EmitBlock(ExitBlock);
@@ -1014,9 +1020,13 @@ void CodeGenFunction::EmitDoStmt(const DoStmt &S,
   // As long as the condition is true, iterate the loop.
   if (EmitBoolCondBranch) {
     uint64_t BackedgeCount = getProfileCount(S.getBody()) - ParentCount;
-    Builder.CreateCondBr(
+    auto *Branch = Builder.CreateCondBr(
         BoolCondVal, LoopBody, LoopExit.getBlock(),
         createProfileWeightsForLoop(S.getCond(), BackedgeCount));
+    // Appending the profile count metadata on the Branch instruction for the
+    // profile count
+    Branch->setMetadata(llvm::LLVMContext::MD_prof_count,
+                        createProfileCount(getProfileCount(S.getBody())));
   }
 
   LoopStack.pop();
@@ -1104,7 +1114,12 @@ void CodeGenFunction::EmitForStmt(const ForStmt &S,
       BoolCondVal = emitCondLikelihoodViaExpectIntrinsic(
           BoolCondVal, Stmt::getLikelihood(S.getBody()));
 
-    Builder.CreateCondBr(BoolCondVal, ForBody, ExitBlock, Weights);
+    auto *Branch =
+        Builder.CreateCondBr(BoolCondVal, ForBody, ExitBlock, Weights);
+    // Appending the profile count metadata on the Branch instruction for the
+    // profile count
+    Branch->setMetadata(llvm::LLVMContext::MD_prof_count,
+                        createProfileCount(getProfileCount(S.getBody())));
 
     if (ExitBlock != LoopExit.getBlock()) {
       EmitBlock(ExitBlock);
@@ -1188,7 +1203,11 @@ CodeGenFunction::EmitCXXForRangeStmt(const CXXForRangeStmt &S,
   if (!Weights && CGM.getCodeGenOpts().OptimizationLevel)
     BoolCondVal = emitCondLikelihoodViaExpectIntrinsic(
         BoolCondVal, Stmt::getLikelihood(S.getBody()));
-  Builder.CreateCondBr(BoolCondVal, ForBody, ExitBlock, Weights);
+  auto *Branch = Builder.CreateCondBr(BoolCondVal, ForBody, ExitBlock, Weights);
+  // Appending the profile count metadata on the Branch instruction for the
+  // profile count
+  Branch->setMetadata(llvm::LLVMContext::MD_prof_count,
+                      createProfileCount(getProfileCount(S.getBody())));
 
   if (ExitBlock != LoopExit.getBlock()) {
     EmitBlock(ExitBlock);
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index bb6b1a3bc228c..fbbd5bf25898e 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -716,6 +716,10 @@ void BackendConsumer::EmitOptimizationMessage(
   if (D.getHotness())
     MsgStream << " (hotness: " << *D.getHotness() << ")";
 
+  if (D.getProfileCount()) {
+    MsgStream << " (ProfileCount: " << *D.getProfileCount() << ")";
+  }
+
   Diags.Report(Loc, DiagID)
       << AddFlagValue(D.getPassName())
       << MsgStream.str();
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 618e78809db40..12cb0645f2f57 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -1527,6 +1527,7 @@ class CodeGenFunction : public CodeGenTypeCache {
   llvm::MDNode *createProfileWeights(ArrayRef<uint64_t> Weights) const;
   llvm::MDNode *createProfileWeightsForLoop(const Stmt *Cond,
                                             uint64_t LoopCount) const;
+  llvm::MDNode *createProfileCount(uint64_t Count) const;
 
 public:
   /// Increment the profiler's counter for the given statement by \p StepV.
diff --git a/clang/lib/CodeGen/CodeGenPGO.cpp b/clang/lib/CodeGen/CodeGenPGO.cpp
index 81bf8ea696b16..8752f8e8269b8 100644
--- a/clang/lib/CodeGen/CodeGenPGO.cpp
+++ b/clang/lib/CodeGen/CodeGenPGO.cpp
@@ -23,6 +23,11 @@
 #include "llvm/Support/MD5.h"
 #include <optional>
 
+static llvm::cl::opt<bool> ClEnableProfileCountMetadata(
+    "enable-profile-count-metadata",
+    llvm::cl::desc("Append real execution counts of loops from the profile"),
+    llvm::cl::Hidden, llvm::cl::init(false));
+
 static llvm::cl::opt<bool>
     EnableValueProfiling("enable-value-profiling",
                          llvm::cl::desc("Enable value profiling"),
@@ -1122,3 +1127,10 @@ CodeGenFunction::createProfileWeightsForLoop(const Stmt *Cond,
   return createProfileWeights(LoopCount,
                               std::max(*CondCount, LoopCount) - LoopCount);
 }
+
+llvm::MDNode *CodeGenFunction::createProfileCount(uint64_t Count) const {
+  if (!PGO.haveRegionCounts() || !ClEnableProfileCountMetadata)
+    return nullptr;
+  llvm::MDBuilder MDHelper(CGM.getLLVMContext());
+  return MDHelper.createProfileCount(Count);
+}
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index eb26bfade47b7..0dfa40cf92f28 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -4065,6 +4065,11 @@ static void RenderDiagnosticsOptions(const Driver &D, const ArgList &Args,
 
   Args.addOptOutFlag(CmdArgs, options::OPT_fspell_checking,
                      options::OPT_fno_spell_checking);
+
+  // Show iteration counts of loops by runtime profile.
+  if (Args.hasArg(options::OPT_fdiagnostics_show_profile_count))
+    CmdArgs.append({"-fdiagnostics-show-profile-count", "-mllvm",
+                    "-enable-profile-count-metadata"});
 }
 
 DwarfFissionKind tools::getDebugFissionKind(const Driver &D,
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index b33bdad2ad81b..11568eb2bdbd4 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -2091,6 +2091,12 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args,
   bool UsingProfile =
       UsingSampleProfile || !Opts.ProfileInstrumentUsePath.empty();
 
+  if (Args.hasArg(options::OPT_fdiagnostics_show_profile_count) &&
+      !UsingProfile && IK.getLanguage() != Language::LLVM_IR) {
+    Diags.Report(diag::warn_drv_diagnostics_profile_count_requires_pgo)
+        << "-fdiagnostics-show-profile-count";
+  }
+
   if (Opts.DiagnosticsWithHotness && !UsingProfile &&
       // An IR file will contain PGO as metadata
       IK.getLanguage() != Language::LLVM_IR)
diff --git a/clang/test/Frontend/Inputs/optimization-remark-with-profile-count.proftext b/clang/test/Frontend/Inputs/optimization-remark-with-profile-count.proftext
new file mode 100644
index 0000000000000..1885f75326578
--- /dev/null
+++ b/clang/test/Frontend/Inputs/optimization-remark-with-profile-count.proftext
@@ -0,0 +1,9 @@
+main
+# Func Hash:
+1160280
+# Num Counters:
+2
+# Counter Values:
+1
+20
+
diff --git a/clang/test/Frontend/optimization-remark-with-profile-count-new-pm.c b/clang/test/Frontend/optimization-remark-with-profile-count-new-pm.c
new file mode 100644
index 0000000000000..0cfe95a1a90d4
--- /dev/null
+++ b/clang/test/Frontend/optimization-remark-with-profile-count-new-pm.c
@@ -0,0 +1,41 @@
+// Testing the remark output of the `-fdiagnostics-show-profile-count`.
+
+// Generate instrumentation and sampling profile data.
+// RUN: llvm-profdata merge \
+// RUN:     %S/Inputs/optimization-remark-with-profile-count.proftext \
+// RUN:     -o %t.profdata
+//
+// RUN: %clang  -fprofile-instr-use=%t.profdata \
+// RUN:     -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize \
+// RUN:     -Rpass-missed=loop-vectorize \
+// RUN:     -fdiagnostics-show-profile-count \
+// RUN:     2>&1 %s\
+// RUN:     | FileCheck -check-prefix=SHOW_PROFILE_COUNT %s
+// RUN: %clang  -fprofile-instr-use=%t.profdata \
+// RUN:     -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize \
+// RUN:     -Rpass-missed=loop-vectorize \
+// RUN:     -fdiagnostics-show-profile-count -fdiagnostics-show-hotness \
+// RUN:     2>&1 %s\
+// RUN:     | FileCheck -check-prefix=SHOW_PROFILE_COUNT_AND_HOTNESS %s
+// RUN: %clang  \
+// RUN:     -O2 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize \
+// RUN:     -Rpass-missed=loop-vectorize \
+// RUN:     -fdiagnostics-show-profile-count  \
+// RUN:     2>&1 %s\
+// RUN:     | FileCheck -check-prefix=NO_PGO %s
+
+int sum = 0;
+int x[20] = {0, 112, 32, 11, 99, 88, 99, 88,34, 342, 85,99, 43, 75, 71, 871, 84, 65, 37, 98};
+
+// SHOW_PROFILE_COUNT_AND_HOTNESS: hotness: {{[0-9]+}}
+// SHOW_PROFILE_COUNT_AND_HOTNESS: ProfileCount: {{[0-9]+}}
+// SHOW_PROFILE_COUNT: ProfileCount: {{[0-9]+}}
+// NO_PGO: argument '-fdiagnostics-show-profile-count' requires profile-guided optimization information
+int main(int argc, const char *argv[]) {
+#pragma clang loop vectorize(enable)
+  for(int i = 0;  i < argc % 20; i++){
+    sum += x[i];
+    sum += argc;
+  }
+  return sum;
+}
diff --git a/clang/test/Profile/Inputs/c-profile-count-metadata.proftext b/clang/test/Profile/Inputs/c-profile-count-metadata.proftext
new file mode 100644
index 0000000000000..d880663fed32d
--- /dev/null
+++ b/clang/test/Profile/Inputs/c-profile-count-metadata.proftext
@@ -0,0 +1,32 @@
+never_called
+6820425066224770721
+9
+0
+0
+0
+0
+0
+0
+0
+0
+0
+
+main
+24
+1
+1
+
+dead_code
+5254464978620792806
+10
+1
+0
+0
+0
+0
+0
+0
+0
+0
+0
+
diff --git a/clang/test/Profile/c-profile-count-metadata.c b/clang/test/Profile/c-profile-count-metadata.c
new file mode 100644
index 0000000000000..99089cd5da360
--- /dev/null
+++ b/clang/test/Profile/c-profile-count-metadata.c
@@ -0,0 +1,73 @@
+// Copied from c-unprofiled-blocks.c, but tests `MD_prof_count`, which is
+// generated for every loop regardless of whether the code is dead.
+
+// RUN: llvm-profdata merge %S/Inputs/c-profile-count-metadata.proftext -o %t.profdata
+// RUN: %clang_cc1 -mllvm -enable-profile-count-metadata -triple x86_64-apple-macosx10.9 \
+// RUN: -main-file-name c-profile-count-metadata.c %s -o - \
+// RUN: -emit-llvm -fprofile-instrument-use-path=%t.profdata | FileCheck -check-prefix=PGOUSE %s
+
+// PGOUSE-LABEL: @never_called(i32 noundef %i)
+int never_called(int i) {
+  // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}{{$}}
+  if (i) {}
+
+  // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}, !prof.count !{{[0-9]+}}{{$}}
+  for (i = 0; i < 100; ++i) {
+  }
+
+  // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}, !prof.count !{{[0-9]+}}{{$}}
+  while (--i) {}
+
+  // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}, !llvm.loop [[LOOP1:!.*]], !prof.count !{{[0-9]+}}
+  do {} while (i++ < 75);
+
+  // PGOUSE: switch {{.*}} [
+  // PGOUSE-NEXT: i32 12
+  // PGOUSE-NEXT: i32 82
+  // PGOUSE-NEXT: ]{{$}}
+  switch (i) {
+  case 12: return 3;
+  case 82: return 0;
+  default: return 89;
+  }
+}
+
+// PGOUSE-LABEL: @dead_code(i32 noundef %i)
+int dead_code(int i) {
+  // PGOUSE: br {{.*}}, !prof !{{[0-9]+}}
+  if (i) {
+    // This branch is never reached.
+
+    // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}{{$}}
+    if (!i) {}
+
+    // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}, !prof.count !{{[0-9]+}}{{$}}
+    for (i = 0; i < 100; ++i) {
+    }
+
+    // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}, !prof.count !{{[0-9]+}}{{$}}
+    while (--i) {}
+
+    // PGOUSE: br i1 %{{[^,]*}}, label %{{[^,]*}}, label %{{[^,]*}}, !llvm.loop [[LOOP2:!.*]], !prof.count !{{[0-9]+}}
+    do {} while (i++ < 75);
+
+    // PGOUSE: switch {{.*}} [
+    // PGOUSE-NEXT: i32 12
+    // PGOUSE-NEXT: i32 82
+    // PGOUSE-NEXT: ]{{$}}
+    switch (i) {
+    case 12: return 3;
+    case 82: return 0;
+    default: return 89;
+    }
+  }
+  return 2;
+}
+
+// PGOUSE-LABEL: @main(i32 noundef %argc, ptr noundef %argv)
+int main(int argc, const char *argv[]) {
+  dead_code(0);
+  return 0;
+}
+
+// PGOUSE: !{{[0-9]+}} = !{!"profile_count", i64 0}
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index adda52b33c789..7bf7f0728f187 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -7649,6 +7649,28 @@ allocation itself) to the outermost callsite context required for uniquely
 identifying the described profile behavior (note this may not be the top of
 the profiled call stack).
 
+.. _md_prof_count:
+
+'``prof.count``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``prof.count`` metadata is used to record runtime profile data on loop
+execution counts. Unlike the ``prof`` metadata, this metadata records the real
+execution count from the runtime profile.
+
+Example:
+
+.. code-block:: text
+
+    for.header:
+      ...
+      br i1 %1, label %for.body, label %for.exit, !prof.count !0
+    !0 = !{!"profile_count", i64 100}
+
+Each ``prof.count`` metadata node contains two elements: the first is a string
+indicating that the metadata is a ``profile_count``, and the second is the
+number of times the loop executed at runtime.
+
 .. _md_callsite:
 
 '``callsite``' Metadata
diff --git a/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h b/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h
index 8aaeaf2991029..e7daf72502877 100644
--- a/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h
+++ b/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h
@@ -118,6 +118,13 @@ class OptimizationRemarkEmitter {
   /// available.
   std::optional<uint64_t> computeHotness(const Value *V);
 
+  /// Compute profile count from IR value (currently assumed to be a block) if
+  /// PGO is available.
+  std::optional<uint64_t> getProfileCount(const Value *V);
+
+  /// Similar but use value from \p OptDiag and update profile count there.
+  void getProfileCount(DiagnosticInfoIROptimization &OptDiag);
+
   /// Similar but use value from \p OptDiag and update hotness there.
   void computeHotness(DiagnosticInfoIROptimization &OptDiag);
 
diff --git a/llvm/include/llvm/IR/DiagnosticInfo.h b/llvm/include/llvm/IR/DiagnosticInfo.h
index 628445fe9fb2c..125f9302b75ed 100644
--- a/llvm/include/llvm/IR/DiagnosticInfo.h
+++ b/llvm/include/llvm/IR/DiagnosticInfo.h
@@ -481,6 +481,8 @@ class DiagnosticInfoOptimizationBase : public DiagnosticInfoWithLocationBase {
   std::string getMsg() const;
   std::optional<uint64_t> getHotness() const { return Hotness; }
   void setHotness(std::optional<uint64_t> H) { Hotness = H; }
+  std::optional<uint64_t> getProfileCount() const { return ProfileCount; }
+  void setProfileCount(std::optional<uint64_t> Count) { ProfileCount = Count; }
 
   bool isVerbose() const { return IsVerbose; }
 
@@ -523,6 +525,10 @@ class DiagnosticInfoOptimizationBase : public DiagnosticInfoWithLocationBase {
   /// corresponding code was executed in a profile instrumentation run.
   std::optional<uint64_t> Hotness;
 
+  /// If profile information is available, this is the REAL number of times the
+  /// corresponding code was executed in a profile instrumentation run.
+  std::optional<uint64_t> ProfileCount;
+
   /// Arguments collected via the streaming interface.
   SmallVector<Argument, 4> Args;
 
diff --git a/llvm/include/llvm/IR/FixedMetadataKinds.def b/llvm/include/llvm/IR/FixedMetadataKinds.def
index b375d0f091206..20abd5f19c8e3 100644
--- a/llvm/include/llvm/IR/FixedMetadataKinds.def
+++ b/llvm/include/llvm/IR/FixedMetadataKinds.def
@@ -51,3 +51,4 @@ LLVM_FIXED_MD_KIND(MD_kcfi_type, "kcfi_type", 36)
 LLVM_FIXED_MD_KIND(MD_pcsections, "pcsections", 37)
 LLVM_FIXED_MD_KIND(MD_DIAssignID, "DIAssignID", 38)
 LLVM_FIXED_MD_KIND(MD_coro_outside_frame, "coro.outside.frame", 39)
+LLVM_FIXED_MD_KIND(MD_prof_count, "prof.count", 40)
diff --git a/llvm/include/llvm/IR/MDBuilder.h b/llvm/include/llvm/IR/MDBuilder.h
index 39165453de16b..adfc733baa67b 100644
--- a/llvm/include/llvm/IR/MDBuilder.h
+++ b/llvm/include/llvm/IR/MDBuilder.h
@@ -67,6 +67,9 @@ class MDBuilder {
   /// Return metadata specifying that a branch or switch is unpredictable.
   MDNode *createUnpredictable();
 
+  /// Create the `profile_count` metadata at the branch instruction
+  MDNode *createProfileCount(uint64_t Count);
+
   /// Return metadata containing the entry \p Count for a function, a boolean
   /// \Synthetic indicating whether the counts were synthetized, and the
   /// GUIDs stored in \p Imports that need to be imported for sample PGO, to
diff --git a/llvm/include/llvm/Remarks/Remark.h b/llvm/include/llvm/Remarks/Remark.h
index de81c5a992805..7bfbcb64b847f 100644
--- a/llvm/include/llvm/Remarks/Remark.h
+++ b/llvm/include/llvm/Remarks/Remark.h
@@ -116,6 +116,10 @@ struct Remark {
   /// corresponding code was executed in a profile instrumentation run.
   std::optional<uint64_t> Hotness;
 
+  /// If profile information is available, this is the real number of times the
+  /// corresponding code was executed in a profile instrumentation run.
+  std::optional<uint64_t> ProfileCount;
+
   /// Arguments collected via the streaming interface.
   SmallVector<Argument, 5> Args;
 
diff --git a/llvm/lib/Analysis/OptimizationRemarkEmitter.cpp b/llvm/lib/Analysis/OptimizationRemarkEmitter.cpp
index 95d55ea44dbfd..73afa43b53b63 100644
--- a/llvm/lib/Analysis/OptimizationRemarkEmitter.cpp
+++ b/llvm/lib/Analysis/OptimizationRemarkEmitter.cpp
@@ -16,6 +16,8 @@
 #include "llvm/Analysis/LazyBlockFrequencyInfo.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/ProfileSummaryInfo.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/Constants.h"
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/LLVMContext.h"
@@ -76,10 +78,30 @@ void OptimizationRemarkEmitter::computeHotness(
     OptDiag.setHotness(computeHotness(V));
 }
 
+void OptimizationRemarkEmitter::getProfileCount(
+    DiagnosticInfoIROptimization &OptDiag) {
+  const Value *V = OptDiag.getCodeRegion();
+  if (V)
+    OptDiag.setProfileCount(getProfileCount(V));
+}
+
+std::optional<uint64_t>
+OptimizationRemarkEmitter::getProfileCount(const Value *V) {
+  auto *Term = cast<BasicBlock>(V)->getTerminator();
+  auto *ProfileData = Term->getMetadata(LLVMContext::MD_prof_count);
+  // TODO: Add check if the metadata match 'profile_count'
+  if (ProfileData == nullptr)
+    return std::nullopt;
+  ValueAsMetadata *VM =
+      static_cast<llvm::ValueAsMetadata *>(ProfileData->getOperand(1).get());
+  return static_cast<llvm::ConstantInt *>(VM->getValue())->getZExtValue();
+}
+
 void OptimizationRemarkEmitter::emit(
     DiagnosticInfoOptimizationBase &OptDiagBase) {
   auto &OptDiag = cast<DiagnosticInfoIROptimization>(OptDiagBase);
   computeHotness(OptDiag);
+  getProfileCount(OptDiag);
 
   // Only emit it if its hotness meets the threshold.
   if (OptDiag.getHotness().value_or(0) <
diff --git a/llvm/lib/IR/LLVMRemarkStreamer.cpp b/llvm/lib/IR/LLVMRemarkStreamer.cpp
index 71f8d4a4b1c7c..d19e46bb12f86 100644
--- a/llvm/lib/IR/LLVMRemarkStreamer.cpp
+++ b/llvm/lib/IR/LLVMRemarkStreamer.cpp
@@ -67,6 +67,7 @@ LLVMRemarkStreamer::toRemark(const DiagnosticInfoOptimizationBase &Diag) const {
       GlobalValue::dropLLVMManglingEscape(Diag.getFunction().getName());
   R.Loc = toRemarkLocation(Diag.getLocation());
   R.Hotness = Diag.getHotness();
+  R.ProfileCount = Diag.getProfileCount();
 
   for (const DiagnosticInfoOptimizationBase::Argument &Arg : Diag.getArgs()) {
     R.Args.emplace_back();
diff --git a/llvm/lib/IR/MDBuilder.cpp b/llvm/lib/IR/MDBuilder.cpp
index 2490b3012bdc2..a31ea8159d44e 100644
--- a/llvm/lib/IR/MDBuilder.cpp
+++ b/llvm/lib/IR/MDBuilder.cpp
@@ -52,6 +52,16 @@ MDNode *MDBuilder::createBranchWeights(ArrayRef<uint32_t> Weights) {
   return MDNode::get(Context, Vals);
 }
 
+MDNode *MDBuilder::createProfileCount(uint64_t Count) {
+  SmallVector<Metadata *, 2> Vals(2);
+  Vals[0] = createString("profile_count");
+
+  Type *Int64Ty = Type::getInt64Ty(Context);
+  Vals[1] = createConstant(ConstantInt::get(Int64Ty, Count));
+
+  return MDNode::get(Context, Vals);
+}
+
 MDNode *MDBuilder::createUnpredictable() {
   return MDNode::get(Context, std::nullopt);
 }
diff --git a/llvm/lib/Remarks/Remark.cpp b/llvm/lib/Remarks/Remark.cpp
index ef42271a3c8da..7a5e47b4c5a1d 100644
--- a/llvm/lib/Remarks/Remark.cpp
+++ b/llvm/lib/Remarks/Remark.cpp
@@ -56,6 +56,8 @@ void Remark::print(raw_ostream &OS) const {
     OS << "Loc: " << Loc.value();
   if (Hotness)
     OS << "Hotness: " << Hotness;
+  if (ProfileCount)
+    OS << "ProfileCount: " << ProfileCount;
   if (!Args.empty()) {
     OS << "Args:\n";
     for (auto Arg : Args)
diff --git a/llvm/lib/Remarks/YAMLRemarkSerializer.cpp b/llvm/lib/Remarks/YAMLRemarkSerializer.cpp
index 68285c3dde1bf..f5cf73173bf92 100644
--- a/llvm/lib/Remarks/YAMLRemarkSerializer.cpp
+++ b/llvm/lib/Remarks/YAMLRemarkSerializer.cpp
@@ -25,12 +25,14 @@ template <typename T>
 static void mapRemarkHeader(yaml::IO &io, T PassName, T RemarkName,
                             std::optional<RemarkLocation> RL, T FunctionName,
                             std::optional<uint64_t> Hotness,
+                            std::optional<uint64_t> ProfileCount,
                             ArrayRef<Argument> Args) {
   io.mapRequired("Pass", PassName);
   io.mapRequired("Name", RemarkName);
   io.mapOptional("DebugLoc", RL);
   io.mapRequired("Function", FunctionName);
   io.mapOptional("Hotness", Hotness);
+  io.mapOptional("ProfileCount", ProfileCount);
   io.mapOptional("Args", Args);
 }
 
@@ -66,10 +68,11 @@ template <> struct MappingTraits<remarks::Remark *> {
       unsigned NameID = StrTab.add(Remark->RemarkName).first;
       unsigned FunctionID = StrTab.add(Remark->FunctionName).first;
       mapRemarkHeader(io, PassID, NameID, Remark->Loc, FunctionID,
-                      Remark->Hotness, Remark->Args);
+                      Remark->Hotness, Remark->ProfileCount, Remark->Args);
     } else {
       mapRemarkHeader(io, Remark->PassName, Remark->RemarkName, Remark->Loc,
-                      Remark->FunctionName, Remark->Hotness, Remark->Args);
+                      Remark->FunctionName, Remark->Hotness,
+                      Remark->ProfileCount, Remark->Args);
     }
   }
 };


