[llvm] [AArch64] New pass for code layout optimizations. (PR #184434)

Sun Apr 5 12:53:49 PDT 2026

ayasin-a wrote:

Okay, here is dedicated data demonstrating that those instruction pairs are indeed behind the achieved performance stabilization. It was measured on 5 internal workloads on a recent Apple M-series processor.

| Workload | perf-drift | Baseline: FCMP-FCSEL Hits | Baseline: FCMP-FCSEL Straddle | Baseline: CMP-CSEL Hits | Baseline: CMP-CSEL Straddle | Baseline: Noise | Baseline: CV | code-layout-opt: FCMP-FCSEL Hits | code-layout-opt: FCMP-FCSEL Straddle | code-layout-opt: CMP-CSEL Hits | code-layout-opt: CMP-CSEL Straddle | code-layout-opt: Noise | code-layout-opt: CV |
|----------|-------:|-------:|-------:|-------:|-------:|-------:|-------:|-------:|-------:|-------:|-------:|-------:|-------:|
| Workload A | 0.3% | 1.204% | 0.080% | 0.547% | 0.007% | 2.43% | 0.81% | 1.204% | 0.026% | 0.547% | 0.179% | 1.08% | 0.26% |
| Workload B | 0.2% | 0.000% | 0.000% | 0.087% | 0.005% | 0.46% | 0.10% | 0.000% | 0.000% | 0.087% | 0.003% | 0.59% | 0.10% |
| Workload C | -0.1% | 0.654% | 0.000% | 0.099% | 0.000% | 1.93% | 0.46% | 0.653% | 0.025% | 0.099% | 0.000% | 0.56% | 0.14% |
| Workload D | 0.1% | 0.005% | 0.001% | 0.127% | 0.016% | 2.02% | 0.34% | 0.005% | 0.000% | 0.127% | 0.001% | 0.57% | 0.12% |
| Workload E | -0.1% | 0.000% | 0.000% | 0.113% | 0.000% | 0.78% | 0.14% | 0.000% | 0.000% | 0.113% | 0.040% | 0.77% | 0.17% |

The Hits and Straddle metrics give each pair's share of the dynamic instruction count (workload only). Straddle % varies because it is highly sensitive to the overall generated code, the particular compiler commit this PR is merged onto, the system and libraries the binary was built with, and anything else that can affect the final binary layout. Hence Hits, which is inherent to the workload itself, is also included.

The other metrics (including perf-drift) are averages obtained with the many-variants methodology I mentioned in https://github.com/llvm/llvm-project/pull/184434#issuecomment-4129433914. In particular, 30 variants of the binary were used, and each variant was run multiple times to filter out temporal system noise. The `perf-drift` and `CV` (Coefficient of Variation) metrics are averages of averages in this experiment, while the `Noise` metric is `max / min - 1.0`, expressed as a percentage, taken across the repeat averages of those variants. `Noise` and `CV` improve for most workloads (most clearly for A, C, and D), which suggests better performance stability.
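For readers who want to reproduce the stability metrics, here is a minimal Python sketch of the `Noise` and `CV` calculations described above. It is not the script used for this experiment. The function names and the sample run times are hypothetical, and the `CV` reading (population standard deviation over mean per variant, then averaged across variants) is an assumption about what "averages of averages" means here.

```python
from statistics import mean, pstdev


def repeat_averages(runs_by_variant):
    """Average the repeated measurements of each binary variant."""
    return [mean(runs) for runs in runs_by_variant]


def noise_percent(runs_by_variant):
    """Noise: (max / min - 1.0) across the per-variant repeat averages, in percent."""
    averages = repeat_averages(runs_by_variant)
    return (max(averages) / min(averages) - 1.0) * 100.0


def mean_cv_percent(runs_by_variant):
    """Assumed CV reading: stdev / mean per variant, averaged over variants, in percent."""
    return mean(pstdev(runs) / mean(runs) for runs in runs_by_variant) * 100.0


# Hypothetical run times (seconds) for 3 variants x 3 repeats; not data from the PR.
runs = [
    [10.0, 10.2, 9.8],
    [10.1, 10.3, 9.9],
    [10.4, 10.6, 10.2],
]
print(f"Noise: {noise_percent(runs):.2f}%")
print(f"CV:    {mean_cv_percent(runs):.2f}%")
```

Averaging the repeats first means `Noise` reflects the spread between binary layouts rather than run-to-run jitter, which is the point of the many-variants approach.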

https://github.com/llvm/llvm-project/pull/184434