[llvm] [BOLT] Skip the perf2bolt step on AArch64 (PR #112070)

Wenlong Mu via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 11 20:13:34 PDT 2024


https://github.com/onroadmuwl created https://github.com/llvm/llvm-project/pull/112070

To reduce BOLT's optimization time on AArch64, I tried passing the `-p perf.data -nl` options to `llvm-bolt` directly instead of running perf2bolt first. However, the output shows that BOLT does not optimize the target binary on AArch64; every counter is zero, as seen below:

>                    0 : executed forward branches
>                    0 : taken forward branches
>                    0 : executed backward branches
>                    0 : taken backward branches
>                    0 : executed unconditional branches
>                    0 : all function calls
>                    0 : indirect calls
>                    0 : PLT calls
>                    0 : executed instructions
>                    0 : executed load instructions
>                    0 : executed store instructions
>                    0 : taken jump table branches
>                    0 : taken unknown indirect branches
>                    0 : total branches
>                    0 : taken branches
>                    0 : non-taken conditional branches
>                    0 : taken conditional branches
>                    0 : all conditional branches
>                    0 : linker-inserted veneer calls

After investigating the cause, I resolved the issue by associating each `BinaryFunction` with its `SampleData`. Additionally, to prevent samples from being mapped to the wrong basic blocks, the samples are now sorted before they are processed.
With these changes, the output matches what the `-b perf.fdata` option produces, and the binary is optimized successfully:

>             77756297 : executed forward branches
>             45234387 : taken forward branches
>             21542072 : executed backward branches
>              9313428 : taken backward branches
>              9655158 : executed unconditional branches
>             24234134 : all function calls
>              7322769 : indirect calls
>              1489067 : PLT calls
>            668165930 : executed instructions
>            157103847 : executed load instructions
>                    0 : executed store instructions
>                    0 : taken jump table branches
>                    0 : taken unknown indirect branches
>            108953527 : total branches
>             64202973 : taken branches
>             44750554 : non-taken conditional branches
>             54547815 : taken conditional branches
>             99298369 : all conditional branches
>                    0 : linker-inserted veneer calls
> 
>             90228252 : executed forward branches (+16.0%)
>              3531918 : taken forward branches (-92.2%)
>              9070117 : executed backward branches (-57.9%)
>              3417336 : taken backward branches (-63.3%)
>              2902049 : executed unconditional branches (-69.9%)
>             24234134 : all function calls (=)
>              7322769 : indirect calls (=)
>              1489067 : PLT calls (=)
>            662605907 : executed instructions (-0.8%)
>            157103847 : executed load instructions (=)
>                    0 : executed store instructions (=)
>                    0 : taken jump table branches (=)
>                    0 : taken unknown indirect branches (=)
>            102200418 : total branches (-6.2%)
>              9851303 : taken branches (-84.7%)
>             92349115 : non-taken conditional branches (+106.4%)
>              6949254 : taken conditional branches (-87.3%)
>             99298369 : all conditional branches (=)
>                    0 : linker-inserted veneer calls (=)

Thank you for considering this PR. I look forward to your feedback.
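
To illustrate why the sort matters, here is a minimal, self-contained sketch of an order-dependent lookup. It is not BOLT's implementation: `Sample`, `Block`, `countPerBlock`, and `sortedByOffset` are hypothetical names invented for this example, and the single forward pass merely stands in for any lookup that assumes samples arrive in ascending offset order.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; not BOLT's actual types.
struct Sample {
  uint64_t Offset; // offset of the sampled PC from the function start
  uint64_t Hits;   // number of samples recorded at that offset
};

struct Block {
  uint64_t Start; // first offset covered by the block (inclusive)
  uint64_t End;   // one past the last offset covered (exclusive)
};

// Attribute samples to basic blocks in a single forward pass.
// Blocks must be sorted and non-overlapping. The block cursor never moves
// backward, so the result is only correct when Samples is sorted by Offset.
std::vector<uint64_t> countPerBlock(const std::vector<Block> &Blocks,
                                    const std::vector<Sample> &Samples) {
  std::vector<uint64_t> Counts(Blocks.size(), 0);
  std::size_t BI = 0;
  for (const Sample &S : Samples) {
    // Skip blocks that end at or before this sample's offset.
    while (BI < Blocks.size() && S.Offset >= Blocks[BI].End)
      ++BI;
    if (BI == Blocks.size())
      break; // every remaining sample lies past the last block
    if (S.Offset >= Blocks[BI].Start)
      Counts[BI] += S.Hits;
    // Otherwise the sample lies before the cursor and is silently dropped.
  }
  return Counts;
}

// Return a copy of the samples ordered by offset, keeping the relative
// order of samples that share an offset.
std::vector<Sample> sortedByOffset(std::vector<Sample> Samples) {
  std::stable_sort(Samples.begin(), Samples.end(),
                   [](const Sample &A, const Sample &B) {
                     return A.Offset < B.Offset;
                   });
  return Samples;
}
```

With blocks `[0, 8)`, `[8, 16)`, `[16, 24)` and samples arriving as offsets 20, 4, 12, the unsorted pass credits only the last block and drops the other two samples, while the sorted pass credits all three blocks.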


From 79b82acea34fc4112ebad8d274b4f7153b42a83f Mon Sep 17 00:00:00 2001
From: Wenlong Mu <muwl182 at 163.com>
Date: Sat, 12 Oct 2024 11:02:11 +0800
Subject: [PATCH 1/2] [BOLT] Sort FuncSamples in processProfile

---
 bolt/lib/Profile/DataAggregator.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp
index 0a63148379d900..0426808b1c2dce 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -645,6 +645,9 @@ void DataAggregator::processProfile(BinaryContext &BC) {
   for (auto &FuncBranches : NamesToBranches)
     llvm::stable_sort(FuncBranches.second.Data);
 
+  for (auto &FuncSamples : NamesToSamples)
+    llvm::stable_sort(FuncSamples.second.Data);
+
   for (auto &MemEvents : NamesToMemEvents)
     llvm::stable_sort(MemEvents.second.Data);
 

From b976d6228a72f444ca1195b5babae32b9037bb7e Mon Sep 17 00:00:00 2001
From: Wenlong Mu <muwl182 at 163.com>
Date: Sat, 12 Oct 2024 11:05:27 +0800
Subject: [PATCH 2/2] [BOLT] Associate BinaryFunction with SampleData

---
 bolt/lib/Profile/DataAggregator.cpp | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp
index 0426808b1c2dce..3bfa65c824c8a5 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -599,6 +599,11 @@ Error DataAggregator::readProfile(BinaryContext &BC) {
     convertBranchData(Function);
   }
 
+  for (auto &BFI : BC.getBinaryFunctions()) {
+    BinaryFunction &BF = BFI.second;
+    readSampleData(BF);
+  }
+
   if (opts::AggregateOnly) {
     if (opts::ProfileFormat == opts::ProfileFormatKind::PF_Fdata)
       if (std::error_code EC = writeAggregatedFile(opts::OutputFilename))


