[clang] Improve documented sampling profiler steps to best known methods (PR #88438)

via cfe-commits cfe-commits at lists.llvm.org
Thu Apr 11 13:57:22 PDT 2024


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-clang

Author: Tim Creech (tcreech-intel)

<details>
<summary>Changes</summary>

1. Add `-fdebug-info-for-profiling -funique-internal-linkage-names`, which improve the usefulness of debug info for profiling.

2. Recommend the use of `br_inst_retired.near_taken:uppp`, which provides the most precise results on supporting hardware.  Mention `branches:u` as a more portable backup.

   Both should portray execution counts better than the default event (`cycles`) and have a better chance of working as an unprivileged user due to the `:u` modifier.

---
Full diff: https://github.com/llvm/llvm-project/pull/88438.diff


1 Files Affected:

- (modified) clang/docs/UsersManual.rst (+10-6) 


``````````diff
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index c464bc3a69adc5..818841285cfae5 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2443,13 +2443,15 @@ usual build cycle when using sample profilers for optimization:
    usual build flags that you always build your application with. The only
    requirement is that DWARF debug info including source line information is
    generated. This DWARF information is important for the profiler to be able
-   to map instructions back to source line locations.
+   to map instructions back to source line locations. The usefulness of this
+   DWARF information can be improved with the ``-fdebug-info-for-profiling``
+   and ``-funique-internal-linkage-names`` options.
 
-   On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:
+   On Linux:
 
    .. code-block:: console
 
-     $ clang++ -O2 -gline-tables-only code.cc -o code
+     $ clang++ -O2 -gline-tables-only -fdebug-info-for-profiling -funique-internal-linkage-names code.cc -o code
 
    While MSVC-style targets default to CodeView debug information, DWARF debug
    information is required to generate source-level LLVM profiles. Use
@@ -2457,13 +2459,13 @@ usual build cycle when using sample profilers for optimization:
 
    .. code-block:: console
 
-     $ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
+     $ clang-cl -O2 -gdwarf -gline-tables-only /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names code.cc -o code -fuse-ld=lld -link -debug:dwarf
 
 2. Run the executable under a sampling profiler. The specific profiler
    you use does not really matter, as long as its output can be converted
    into the format that the LLVM optimizer understands.
 
-   Two such profilers are the the Linux Perf profiler
+   Two such profilers are the Linux Perf profiler
    (https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
    available as part of `Intel VTune
    <https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html>`_.
@@ -2477,7 +2479,9 @@ usual build cycle when using sample profilers for optimization:
 
    .. code-block:: console
 
-     $ perf record -b ./code
+     $ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code
+
+   If the event above is unavailable, ``branches:u`` is probably next-best.
 
    Note the use of the ``-b`` flag. This tells Perf to use the Last Branch
    Record (LBR) to record call chains. While this is not strictly required,

``````````

</details>


https://github.com/llvm/llvm-project/pull/88438


More information about the cfe-commits mailing list