[clang] Improve documented sampling profiler steps to best known methods (PR #88438)
Tim Creech via cfe-commits
cfe-commits at lists.llvm.org
Sat Apr 27 14:12:11 PDT 2024
https://github.com/tcreech-intel updated https://github.com/llvm/llvm-project/pull/88438
From fe3404cbdf78b434f16f8351dc242175b4543112 Mon Sep 17 00:00:00 2001
From: Tim Creech <timothy.m.creech at intel.com>
Date: Thu, 11 Apr 2024 16:03:52 -0400
Subject: [PATCH 1/4] Improve documented sampling profiler steps to best known
methods
1. Add `-fdebug-info-for-profiling -funique-internal-linkage-names`,
which improve the usefulness of debug info for profiling.
2. Recommend the use of `br_inst_retired.near_taken:uppp`, which
provides the most precise results on supporting hardware. Mention
`branches:u` as a more portable backup.
Both should reflect execution counts more directly than the default event
(`cycles`). Thanks to the `:u` modifier, both are also more likely to work
for an unprivileged user.
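The event choice in item 2 (prefer the precise event, fall back to `branches:u`) can be automated. The following is a minimal POSIX sh sketch, not part of the patch. The helper names `probe_event` and `pick_event` are hypothetical. It assumes that a failed trial `perf record -b` of `true` means the event, LBR, or the needed perf permissions are unavailable on the current machine.

```shell
#!/bin/sh
# Hypothetical helpers for choosing the sampling event described above.

# probe_event EVENT: succeed if `perf record -b` can sample EVENT here.
# Assumption: a failed throwaway recording means EVENT (or LBR) is unusable.
probe_event() {
  _tmp=$(mktemp) || return 1
  perf record -q -b -e "$1" -o "$_tmp" -- true >/dev/null 2>&1
  _rc=$?
  rm -f "$_tmp"
  return "$_rc"
}

# pick_event: print the first usable event, most precise first.
# Returns nonzero if neither event can be sampled.
pick_event() {
  for _ev in BR_INST_RETIRED.NEAR_TAKEN:uppp branches:u; do
    if probe_event "$_ev"; then
      printf '%s\n' "$_ev"
      return 0
    fi
  done
  return 1
}
```

For example, `ev=$(pick_event) && perf record -b -e "$ev" ./code`. Probing with a throwaway recording, rather than parsing `perf list`, also exercises the `-b` (LBR) requirement and the current user's perf permissions.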
---
clang/docs/UsersManual.rst | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index c464bc3a69adc5..818841285cfae5 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2443,13 +2443,15 @@ usual build cycle when using sample profilers for optimization:
usual build flags that you always build your application with. The only
requirement is that DWARF debug info including source line information is
generated. This DWARF information is important for the profiler to be able
- to map instructions back to source line locations.
+ to map instructions back to source line locations. The usefulness of this
+ DWARF information can be improved with the ``-fdebug-info-for-profiling``
+ and ``-funique-internal-linkage-names`` options.
- On Linux, ``-g`` or just ``-gline-tables-only`` is sufficient:
+ On Linux:
.. code-block:: console
- $ clang++ -O2 -gline-tables-only code.cc -o code
+ $ clang++ -O2 -gline-tables-only -fdebug-info-for-profiling -funique-internal-linkage-names code.cc -o code
While MSVC-style targets default to CodeView debug information, DWARF debug
information is required to generate source-level LLVM profiles. Use
@@ -2457,13 +2459,13 @@ usual build cycle when using sample profilers for optimization:
.. code-block:: console
- $ clang-cl -O2 -gdwarf -gline-tables-only coff-profile.cpp -fuse-ld=lld -link -debug:dwarf
+ $ clang-cl -O2 -gdwarf -gline-tables-only /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names code.cc -o code -fuse-ld=lld -link -debug:dwarf
2. Run the executable under a sampling profiler. The specific profiler
you use does not really matter, as long as its output can be converted
into the format that the LLVM optimizer understands.
- Two such profilers are the the Linux Perf profiler
+ Two such profilers are the Linux Perf profiler
(https://perf.wiki.kernel.org/) and Intel's Sampling Enabling Product (SEP),
available as part of `Intel VTune
<https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html>`_.
@@ -2477,7 +2479,9 @@ usual build cycle when using sample profilers for optimization:
.. code-block:: console
- $ perf record -b ./code
+ $ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code
+
+ If the event above is unavailable, ``branches:u`` is probably next-best.
Note the use of the ``-b`` flag. This tells Perf to use the Last Branch
Record (LBR) to record call chains. While this is not strictly required,
From add91ec329f60eef6ecf79d6d5c9a548a8d6bcfe Mon Sep 17 00:00:00 2001
From: Tim Creech <timothy.m.creech at intel.com>
Date: Mon, 22 Apr 2024 11:11:36 -0400
Subject: [PATCH 2/4] fixup: add uniqueing note, match debug flags
---
clang/docs/UsersManual.rst | 27 ++++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 818841285cfae5..b87fc7f2aaa4dd 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2314,6 +2314,8 @@ are listed below.
on ELF targets when using the integrated assembler. This flag currently
only has an effect on ELF targets.
+.. _funique_internal_linkage_names:
+
.. option:: -f[no]-unique-internal-linkage-names
Controls whether Clang emits a unique (best-effort) symbol name for internal
@@ -2451,15 +2453,27 @@ usual build cycle when using sample profilers for optimization:
.. code-block:: console
- $ clang++ -O2 -gline-tables-only -fdebug-info-for-profiling -funique-internal-linkage-names code.cc -o code
+ $ clang++ -O2 -gline-tables-only \
+ -fdebug-info-for-profiling -funique-internal-linkage-names \
+ code.cc -o code
While MSVC-style targets default to CodeView debug information, DWARF debug
information is required to generate source-level LLVM profiles. Use
``-gdwarf`` to include DWARF debug information:
- .. code-block:: console
+ .. code-block:: winbatch
+
+ $ clang-cl -O2 -gdwarf -gline-tables-only ^
+ /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
+ code.cc -o code -fuse-ld=lld -link -debug:dwarf
+
+.. note::
- $ clang-cl -O2 -gdwarf -gline-tables-only /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names code.cc -o code -fuse-ld=lld -link -debug:dwarf
+ :ref:`-funique-internal-linkage-names <funique_internal_linkage_names>`
+ generates unique names based on given command-line source file paths. If
+ your build system uses absolute source paths and these paths may change
+ between steps 1 and 4, then the uniqued function names may change and result
+ in unused profile data. Consider omitting this option in such cases.
2. Run the executable under a sampling profiler. The specific profiler
you use does not really matter, as long as its output can be converted
@@ -2531,11 +2545,13 @@ usual build cycle when using sample profilers for optimization:
that executes faster than the original one. Note that you are not
required to build the code with the exact same arguments that you
used in the first step. The only requirement is that you build the code
- with ``-gline-tables-only`` and ``-fprofile-sample-use``.
+ with the same debug info options and ``-fprofile-sample-use``.
.. code-block:: console
- $ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof code.cc -o code
+ $ clang++ -O2 -gline-tables-only \
+ -fdebug-info-for-profiling -funique-internal-linkage-names \
+ -fprofile-sample-use=code.prof code.cc -o code
[OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
edge counters. The profile inference algorithm (profi) can be used to infer
@@ -2545,6 +2561,7 @@ usual build cycle when using sample profilers for optimization:
.. code-block:: console
$ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof \
+ -fdebug-info-for-profiling -funique-internal-linkage-names \
-fsample-profile-use-profi code.cc -o code
Sample Profile Formats
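Patch 2 requires the step 4 build to use the same debug info options as step 1. One way to keep the two builds in sync is to define those options once. This POSIX sh sketch is illustrative only: `PGO_DEBUG_FLAGS`, `build_profiled`, `build_optimized` and the `CXX`/`PROFI` variables are hypothetical names, not part of Clang or the patch.

```shell
#!/bin/sh
# Debug info options that step 1 and step 4 must agree on.
PGO_DEBUG_FLAGS="-gline-tables-only -fdebug-info-for-profiling -funique-internal-linkage-names"

# build_profiled SRC OUT: step 1, the binary that will be sampled.
# $PGO_DEBUG_FLAGS is unquoted on purpose so it splits into separate options.
build_profiled() {
  ${CXX:-clang++} -O2 $PGO_DEBUG_FLAGS "$1" -o "$2"
}

# build_optimized SRC OUT PROFILE: step 4, same options plus the sample profile.
# Set PROFI=1 to also pass -fsample-profile-use-profi.
build_optimized() {
  _profi=
  if [ "${PROFI:-0}" = 1 ]; then
    _profi=-fsample-profile-use-profi
  fi
  ${CXX:-clang++} -O2 $PGO_DEBUG_FLAGS $_profi \
    -fprofile-sample-use="$3" "$1" -o "$2"
}
```

Calling both functions from the same working directory with the same relative source path (for example `build_profiled code.cc code`, then `build_optimized code.cc code code.prof`) should also keep the names produced by `-funique-internal-linkage-names` stable, per the note in patch 2.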
From 2da2b487cfd3edf6e59ad0b765ed34364e9adf71 Mon Sep 17 00:00:00 2001
From: Tim Creech <timothy.m.creech at intel.com>
Date: Thu, 25 Apr 2024 22:27:34 -0400
Subject: [PATCH 3/4] fixup: always include both clang and clang-cl command
examples
---
clang/docs/UsersManual.rst | 36 +++++++++++++++++++++++++++---------
1 file changed, 27 insertions(+), 9 deletions(-)
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index b87fc7f2aaa4dd..c57d6a95a9f728 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2463,7 +2463,7 @@ usual build cycle when using sample profilers for optimization:
.. code-block:: winbatch
- $ clang-cl -O2 -gdwarf -gline-tables-only ^
+ > clang-cl -O2 -gdwarf -gline-tables-only ^
/clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
code.cc -o code -fuse-ld=lld -link -debug:dwarf
@@ -2547,22 +2547,40 @@ usual build cycle when using sample profilers for optimization:
used in the first step. The only requirement is that you build the code
with the same debug info options and ``-fprofile-sample-use``.
+ On Linux:
+
.. code-block:: console
$ clang++ -O2 -gline-tables-only \
-fdebug-info-for-profiling -funique-internal-linkage-names \
-fprofile-sample-use=code.prof code.cc -o code
- [OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
- edge counters. The profile inference algorithm (profi) can be used to infer
- missing blocks and edge counts, and improve the quality of profile data.
- Enable it with ``-fsample-profile-use-profi``.
+ On Windows:
- .. code-block:: console
+ .. code-block:: winbatch
+
+ > clang-cl -O2 -gdwarf -gline-tables-only ^
+ /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
+ -fprofile-sample-use=code.prof code.cc -o code -fuse-ld=lld -link -debug:dwarf
+
+ [OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
+ edge counters. The profile inference algorithm (profi) can be used to infer
+ missing blocks and edge counts, and improve the quality of profile data.
+ Enable it with ``-fsample-profile-use-profi``. For example, on Linux:
+
+ .. code-block:: console
+
+ $ clang++ -fsample-profile-use-profi -O2 -gline-tables-only \
+ -fdebug-info-for-profiling -funique-internal-linkage-names \
+ -fprofile-sample-use=code.prof code.cc -o code
- $ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof \
- -fdebug-info-for-profiling -funique-internal-linkage-names \
- -fsample-profile-use-profi code.cc -o code
+ On Windows:
+
+ .. code-block:: winbatch
+
+ > clang-cl /clang:-fsample-profile-use-profi -O2 -gdwarf -gline-tables-only ^
+ /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
+ -fprofile-sample-use=code.prof code.cc -o code -fuse-ld=lld -link -debug:dwarf
Sample Profile Formats
""""""""""""""""""""""
From 260f474f8eeee66122c5a220a652993413d444bf Mon Sep 17 00:00:00 2001
From: Tim Creech <timothy.m.creech at intel.com>
Date: Sat, 27 Apr 2024 17:09:23 -0400
Subject: [PATCH 4/4] fixup: use forward-slash options with clang-cl when
possible
---
clang/docs/UsersManual.rst | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index c57d6a95a9f728..5359804c7af8b5 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2463,9 +2463,9 @@ usual build cycle when using sample profilers for optimization:
.. code-block:: winbatch
- > clang-cl -O2 -gdwarf -gline-tables-only ^
+ > clang-cl /O2 -gdwarf -gline-tables-only ^
/clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
- code.cc -o code -fuse-ld=lld -link -debug:dwarf
+ code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
.. note::
@@ -2559,9 +2559,9 @@ usual build cycle when using sample profilers for optimization:
.. code-block:: winbatch
- > clang-cl -O2 -gdwarf -gline-tables-only ^
+ > clang-cl /O2 -gdwarf -gline-tables-only ^
/clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
- -fprofile-sample-use=code.prof code.cc -o code -fuse-ld=lld -link -debug:dwarf
+ /fprofile-sample-use=code.prof code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
[OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/
edge counters. The profile inference algorithm (profi) can be used to infer
@@ -2578,9 +2578,9 @@ usual build cycle when using sample profilers for optimization:
.. code-block:: winbatch
- > clang-cl /clang:-fsample-profile-use-profi -O2 -gdwarf -gline-tables-only ^
+ > clang-cl /clang:-fsample-profile-use-profi /O2 -gdwarf -gline-tables-only ^
/clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
- -fprofile-sample-use=code.prof code.cc -o code -fuse-ld=lld -link -debug:dwarf
+ /fprofile-sample-use=code.prof code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
Sample Profile Formats
""""""""""""""""""""""
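Steps 2 and 3 (sampling, then converting the profile) can be scripted in the same way. The conversion step lies outside the hunks shown in this series, so this POSIX sh sketch assumes AutoFDO's `create_llvm_prof`, invoked as `create_llvm_prof --binary=./code --out=code.prof` and reading perf's default `perf.data`. The names `run`, `collect_profile` and `DRY_RUN` are hypothetical.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# collect_profile BINARY PROFILE [EVENT]
# Step 2: sample BINARY with LBR (-b) into ./perf.data, using EVENT
#         (default: the precise branch event recommended in patch 1).
# Step 3: convert ./perf.data into PROFILE (assumes create_llvm_prof).
collect_profile() {
  run perf record -b -e "${3:-BR_INST_RETIRED.NEAR_TAKEN:uppp}" "$1" &&
    run create_llvm_prof --binary="$1" --out="$2"
}
```

For example, `DRY_RUN=1 collect_profile ./code code.prof branches:u` prints both commands, using the portable fallback event, without running them.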