[llvm] [polly] [SimpleLoopUnswitch] Don't use BlockFrequencyInfo to skip cold loops (PR #159522)

Luke Lau via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 23 10:11:31 PDT 2025


lukel97 wrote:

@fhahn @teresajohnson I did some more experimenting and I think the usage of lossy BFI is a bigger problem than I had initially expected.

I changed SimpleLoopUnswitch to always invalidate BlockFrequencyAnalysis and BranchProbabilityAnalysis beforehand (and no other analyses), so that the BFI data is always fresh. On arm64-apple-darwin at -O3 there's a large difference in the number of "cold" loops skipped; specifically, many loops that were previously considered cold no longer are:

```
lhs = lossy BFI
rhs = correct BFI

Program                                       simple-loop-unswitch.NumColdSkipped                
                                              lhs                                 rhs     diff
MultiSourc...ch/consumer-lame/consumer-lame    283.00                              282.00   -0.4%
MultiSourc...sumer-typeset/consumer-typeset    571.00                              564.00   -1.2%
MultiSource/Applications/siod/siod             176.00                              173.00   -1.7%
MultiSourc...e/Applications/SIBsim4/SIBsim4     56.00                               55.00   -1.8%
MultiSourc.../Applications/JM/lencod/lencod   1247.00                             1220.00   -2.2%
MultiSourc...arks/DOE-ProxyApps-C/CoMD/CoMD     75.00                               73.00   -2.7%
MultiSourc...xyApps-C/Pathfinder/PathFinder     65.00                               63.00   -3.1%
MultiSourc...OE-ProxyApps-C/XSBench/XSBench     20.00                               19.00   -5.0%
MultiSource/Applications/d/make_dparser        153.00                              143.00   -6.5%
MultiSource/Benchmarks/nbench/nbench            82.00                               76.00   -7.3%
MultiSource/Applications/oggenc/oggenc         472.00                              434.00   -8.1%
MultiSourc...hmarks/MallocBench/cfrac/cfrac     23.00                               21.00   -8.7%
MultiSourc...Benchmarks/SciMark2-C/scimark2     32.00                               29.00   -9.4%
MultiSourc...hmarks/ASC_Sequoia/AMGmk/AMGmk     89.00                               80.00  -10.1%
MultiSourc...rsaBench/beamformer/beamformer     19.00                               17.00  -10.5%
MultiSourc...e/Applications/obsequi/Obsequi     70.00                               62.00  -11.4%
MultiSource/Benchmarks/McCat/18-imp/imp         30.00                               26.00  -13.3%
MultiSourc...e-bitcount/automotive-bitcount      7.00                                6.00  -14.3%
MultiSource/Benchmarks/sim/sim                   7.00                                6.00  -14.3%
MultiSourc...SC_Sequoia/CrystalMk/CrystalMk     12.00                               10.00  -16.7%
MultiSource/Applications/sgefa/sgefa            48.00                               40.00  -16.7%
MultiSourc...OE-ProxyApps-C/RSBench/rsbench      6.00                                5.00  -16.7%
MultiSourc...Benchmarks/Ptrdist/yacr2/yacr2     35.00                               28.00  -20.0%
MultiSourc...marks/Prolangs-C/bison/mybison     75.00                               60.00  -20.0%
MultiSource/Benchmarks/Olden/mst/mst            15.00                               11.00  -26.7%
MultiSourc.../Rodinia/pathfinder/pathfinder     11.00                                8.00  -27.3%
MultiSource/Benchmarks/Olden/em3d/em3d          20.00                               14.00  -30.0%
MultiSourc.../Benchmarks/VersaBench/bmm/bmm     26.00                               18.00  -30.8%
MultiSourc...hmarks/FreeBench/neural/neural     28.00                               19.00  -32.1%
MultiSourc...enchmarks/VersaBench/dbms/dbms      6.00                                4.00  -33.3%
MultiSourc.../telecomm-CRC32/telecomm-CRC32      3.00                                2.00  -33.3%
MultiSourc...chmarks/VersaBench/8b10b/8b10b      5.00                                3.00  -40.0%
MultiSource/Benchmarks/McCat/09-vor/vor          2.00                                1.00  -50.0%
MultiSourc...chmarks/McCat/04-bisect/bisect      6.00                                3.00  -50.0%
MultiSource/Benchmarks/Ptrdist/ks/ks             8.00                                4.00  -50.0%
MultiSourc...Benchmarks/Olden/health/health      2.00                                1.00  -50.0%
MultiSourc...marks/Trimaran/enc-rc4/enc-rc4     16.00                                8.00  -50.0%
MultiSourc...rks/Trimaran/enc-3des/enc-3des     12.00                                3.00  -75.0%
MultiSourc...rks/BitBench/uuencode/uuencode      5.00                                1.00  -80.0%
MultiSourc...arks/FreeBench/distray/distray      2.00                                0.00 -100.0%
MultiSourc...reeBench/pcompress2/pcompress2      1.00                                0.00 -100.0%
```

This translates to a significant number of loops that we are erroneously skipping because we mistakenly think they are cold: with fresh BFI, the number of branches unswitched goes up by **4.3% geomean overall**.

```
Metric: simple-loop-unswitch.NumBranches

Program                                       simple-loop-unswitch.NumBranches              
                                              lhs                              rhs    diff  
Applications/aha/aha                            1.00                             2.00 100.0%
Benchmarks/McCat/09-vor/vor                     1.00                             2.00 100.0%
Benchmarks/VersaBench/bmm/bmm                   6.00                            11.00  83.3%
Benchmarks/ASC_Sequoia/IRSmk/IRSmk              2.00                             3.00  50.0%
Benchmarks/nbench/nbench                        4.00                             6.00  50.0%
Benchmarks/ASC_Sequoia/AMGmk/AMGmk             11.00                            16.00  45.5%
Benchmarks...sumer-typeset/consumer-typeset    10.00                            14.00  40.0%
Benchmarks...iabench/g721/g721encode/encode     3.00                             4.00  33.3%
Benchmarks.../Rodinia/pathfinder/pathfinder     3.00                             4.00  33.3%
Benchmarks/DOE-ProxyApps-C/CoMD/CoMD           13.00                            17.00  30.8%
Benchmarks/McCat/18-imp/imp                    13.00                            17.00  30.8%
Benchmarks/FreeBench/neural/neural             10.00                            13.00  30.0%
Benchmarks...nch/mpeg2/mpeg2dec/mpeg2decode     4.00                             5.00  25.0%
Benchmarks/Trimaran/enc-3des/enc-3des           4.00                             5.00  25.0%
Benchmarks.../DOE-ProxyApps-C++/HPCCG/HPCCG     4.00                             5.00  25.0%
                           Geomean difference                                           4.3%
```
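For readers checking the numbers, here is a minimal sketch of how the `diff` column and a geomean difference can be computed. It assumes `diff` is `rhs / lhs - 1` and that the geomean is taken over per-program `rhs / lhs` ratios; both are assumptions about the comparison tool, not something stated in this comment. The helper names are hypothetical. The rows are a small subset copied from the table above, so the result will not match the 4.3% figure, which presumably also covers programs not listed.

```python
import math

# (lhs, rhs) NumBranches pairs copied from a few rows of the table above.
rows = {
    "Applications/aha/aha": (1, 2),
    "Benchmarks/VersaBench/bmm/bmm": (6, 11),
    "Benchmarks/nbench/nbench": (4, 6),
    "Benchmarks/Trimaran/enc-3des/enc-3des": (4, 5),
}


def pct_diff(lhs, rhs):
    """Relative change from lhs to rhs, in percent (assumed 'diff' column)."""
    return (rhs / lhs - 1.0) * 100.0


def geomean_diff(pairs):
    """Geometric mean of the rhs/lhs ratios, expressed as a percent change."""
    ratios = [rhs / lhs for lhs, rhs in pairs]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


for name, (lhs, rhs) in rows.items():
    print(f"{name:45s} {pct_diff(lhs, rhs):6.1f}%")
print(f"geomean over these rows: {geomean_diff(rows.values()):.1f}%")
```

Note that a ratio-based geomean is undefined when `lhs` or `rhs` is zero, so rows such as the `0.00` entries in the NumColdSkipped table would need separate handling.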

For this reason alone I think we should remove the use of BlockFrequencyInfo, as it is very likely impeding performance. In the above results, the increase in loops unswitched comes entirely from hot loops.

https://github.com/llvm/llvm-project/pull/159522