<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/56274>56274</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            BOLT gives lower improvement on clang-bootstrap than before
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            BOLT
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          rlavaee
      </td>
    </tr>
</table>

<pre>
    With BOLT built from LLVM trunk, I am seeing a smaller improvement on clang than I previously saw with the incubator repo (https://github.com/facebookincubator/BOLT).

Here are the logs for perf2bolt and llvm-bolt:

```
> perf2bolt -o pgo-labels.fdata -w pgo-labels-compiler.yaml -p pgo-labels.perfdata clang-15
BOLT-INFO: shared object or position-independent executable detected
PERF2BOLT: Starting data aggregation job for pgo-labels.perfdata
PERF2BOLT: spawning perf job to read branch events
PERF2BOLT: spawning perf job to read mem events
PERF2BOLT: spawning perf job to read process events
PERF2BOLT: spawning perf job to read task events
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 3f028c02ba6a24b7230fd5907a2b7ba076664a8b
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x5400000, offset 0x5400000
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling strict relocation mode for aggregation purposes
BOLT-WARNING: Failed to analyze 2529 relocations
BOLT-INFO: pre-processing profile using perf data aggregator
BOLT-WARNING: build-id will not be checked because we could not read one from input binary
PERF2BOLT: waiting for perf mmap events collection to finish...
PERF2BOLT: parsing perf-script mmap events output
PERF2BOLT: waiting for perf task events collection to finish...
PERF2BOLT: parsing perf-script task events output
PERF2BOLT: input binary is associated with 100 PID(s)
PERF2BOLT: waiting for perf events collection to finish...
PERF2BOLT: parse branch events...
PERF2BOLT: read 492075 samples and 15682980 LBR entries
PERF2BOLT: 216 samples (0.0%) were ignored
PERF2BOLT: traces mismatching disassembled function contents: 5324 (0.0%)
PERF2BOLT: out of range traces involving unknown regions: 1618631 (10.7%)
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function _ZN4llvm10BasicBlock28replaceSuccessorsPhiUsesWithEPS0_S1_
BOLT-WARNING: 4 collisions detected while hashing binary objects. Use -v=1 to see the list.
PERF2BOLT: processing branch events..
```

```
> llvm-bolt clang-15 -o pgo-relocs/build/bin/clang-15-bolt -b pgo-relocs-compiler.yaml -reorder-blocks=ext-tsp -reorder-functions=hfsort+ -split-functions -split-all-cold -dyno-stats -icf=1 -use-gnu-stack -inline-small-functions -simplify-rodata-loads -plt=hot

BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: BOLT version: 3f028c02ba6a24b7230fd5907a2b7ba076664a8b
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: enabling relocation mode
BOLT-INFO: enabling lite mode
BOLT-WARNING: Failed to analyze 2529 relocations
BOLT-INFO: pre-processing profile using YAML profile reader
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function _ZN4llvm10BasicBlock28replaceSuccessorsPhiUsesWithEPS0_S1_
BOLT-INFO: 6042 out of 136908 functions in the binary (4.4%) have non-empty execution profile
BOLT-INFO: 347 functions with profile could not be optimized
BOLT-INFO: the input contains 4354 (dynamic count : 268784) opportunities for macro-fusion optimization. Will fix instances on a hot path.
BOLT-INFO: 371417 instructions were shortened
BOLT-INFO: removed 344 empty blocks
BOLT-INFO: ICF folded 413 out of 137214 functions in 3 passes. 0 functions had jump tables.
BOLT-INFO: Removing all identical functions will save 59.75 KB of code space. Folded functions were called 113460 times based on profile.
BOLT-INFO: simplified 102 out of 3594 loads from a statically computed address.
BOLT-INFO: dynamic loads simplified: 4317
BOLT-INFO: dynamic loads found: 61577
BOLT-INFO: inlined 1227 calls at 18 call sites in 2 iteration(s). Change in binary size: 4 bytes.
BOLT-INFO: 4879 PLT calls in the binary were optimized.
BOLT-INFO: basic block reordering modified layout of 3729 (2.73%) functions
BOLT-INFO: UCE removed 1 blocks and 7 bytes of code.
BOLT-INFO: splitting separates 3226174 hot bytes from 7737417 cold bytes (29.43% of split functions is hot).
BOLT-INFO: 106 Functions were reordered by LoopInversionPass
BOLT-INFO: hfsort+ reduced the number of chains from 5975 to 650
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:

            17279782 : executed forward branches
             1942886 : taken forward branches
             2900344 : executed backward branches
             1779625 : taken backward branches
              855760 : executed unconditional branches
             1686232 : all function calls
              571541 : indirect calls
              243850 : PLT calls
           163314338 : executed instructions
            38492046 : executed load instructions
            20762991 : executed store instructions
              224132 : taken jump table branches
                   0 : taken unknown indirect branches
            21035886 : total branches
             4578271 : taken branches
            16457615 : non-taken conditional branches
             3722511 : taken conditional branches
            20180126 : all conditional branches

            16810312 : executed forward branches (-2.7%)
              824937 : taken forward branches (-57.5%)
             3369814 : executed backward branches (+16.2%)
             1647148 : taken backward branches (-7.4%)
              599903 : executed unconditional branches (-29.9%)
             1441570 : all function calls (-14.5%)
              571541 : indirect calls (=)
                   0 : PLT calls (-100.0%)
           162404688 : executed instructions (-0.6%)
            38488076 : executed load instructions (-0.0%)
            20762991 : executed store instructions (=)
              224132 : taken jump table branches (=)
                   0 : taken unknown indirect branches (=)
            20780029 : total branches (-1.2%)
             3071988 : taken branches (-32.9%)
            17708041 : non-taken conditional branches (+7.6%)
             2472085 : taken conditional branches (-33.6%)
            20180126 : all conditional branches (=)

BOLT-INFO: SCTC: patched 8 tail calls (8 forward) tail calls (0 backward) from a total of 8 while removing 0 double jumps and removing 8 basic blocks totalling 40 bytes of code. CTCs total execution count is 1207 and the number of times CTCs are taken is 1164.
BOLT-INFO: setting __hot_start to 0x5400000
BOLT-INFO: setting __hot_end to 0x59d53e5
```
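
For readers comparing the two dyno-stats tables: the percentages in the second table appear to be relative changes against the counts in the first table. A minimal Python sketch that reproduces three of them from the numbers in the log (the `percent_change` helper and the dictionary names are illustrative, not part of BOLT):

```python
# Counts copied from the dyno-stats output above (before and after optimization).
before = {
    "taken forward branches": 1942886,
    "taken branches": 4578271,
    "PLT calls": 243850,
}
after = {
    "taken forward branches": 824937,
    "taken branches": 3071988,
    "PLT calls": 0,
}


def percent_change(old, new):
    """Relative change from old to new, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


for name in before:
    change = percent_change(before[name], after[name])
    print(f"{name}: {change:+.1f}%")
```

The printed values should line up with the `(-57.5%)`, `(-32.9%)` and `(-100.0%)` annotations in the log.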

I am measuring a 5.5% improvement over the PGO binary (compared to the roughly 9-10% I was seeing before):
```
pgo-labels-bolt-compiler -> average(507.406)
pgo-labels-compiler -> average(537.33)
Metric: time
Group 1 mean = 537.330005 ± 1.036598
Group 2 mean = 507.406000 ± 3.630159
P value      = 2.01e-05
Diff mean (95% CI)  = -29.9240 ± 3.5663
Percent   (95% CI) = -5.5690% (± 0.6637%)
```
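
As a sanity check on the arithmetic, the percentage can be recomputed from the two group means reported above. This is only a sketch: the variable names are illustrative, and it assumes the percentage is taken relative to the PGO baseline (Group 1), which is what the reported numbers suggest.

```python
# Means and confidence interval copied from the benchmark output above.
baseline_mean = 537.330005   # Group 1: pgo-labels-compiler
bolt_mean = 507.406000       # Group 2: pgo-labels-bolt-compiler
diff_ci = 3.5663             # half-width of the 95% CI on the difference

# Assumption: the change is expressed relative to the PGO baseline.
diff = bolt_mean - baseline_mean
percent = diff / baseline_mean * 100.0
percent_ci = diff_ci / baseline_mean * 100.0

print(f"Diff mean = {diff:.4f}")
print(f"Percent   = {percent:.4f}% (+/- {percent_ci:.4f}%)")
```

The results should agree with the `-29.9240` and `-5.5690% (± 0.6637%)` lines in the output.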
</pre>