[llvm] [VPlan] Add the cost of spills when considering register pressure (PR #179646)

Wed Feb 4 04:31:26 PST 2026

john-brawn-arm wrote:

The motivation for doing this is that I'm looking at enabling shouldConsiderVectorizationRegPressure on Arm Cortex-M CPUs with MVE, and the current behaviour makes things significantly worse in some cases because it prevents vectorization even when vectorizing would be beneficial. I've been specifically looking at the code we generate for https://github.com/ARM-software/CMSIS-DSP on Cortex-M55. If I enable shouldConsiderVectorizationRegPressure, then with the current behaviour the change in throughput is:

| Function | Change |
| -------- | ------ |
| Filtering/arm_conv_partial_q31 | 3.64% |
| Filtering/arm_conv_q31 | 1.37% |
| Filtering/arm_correlate_q31 | -1.15% |
| Filtering/arm_fir_decimate_f32 | -0.92% |
| Filtering/arm_fir_decimate_q31 | -56.92% |
| Filtering/arm_fir_f32_16taps | 187.17% |
| Filtering/arm_fir_f32_4taps | 34.33% |
| Filtering/arm_fir_f32_8taps | 350.95% |
| Filtering/arm_fir_q31_16taps | -4.88% |
| Filtering/arm_fir_q31_4taps | 0.54% |
| Filtering/arm_fir_q31_8taps | -9.04% |
| Matrix/arm_mat_vec_mult_f16 | -52.49% |
| Matrix/arm_mat_vec_mult_f32 | -48.69% |
| Quaternion/arm_quaternion2rotation_f32 | -4.66% |
| Transform/arm_cfft_f16 | -7.94% |
| Transform/arm_rfft_fast_f16 | -0.39% |
| Transform/arm_rfft_fast_f32 | -8.86% |

With this PR the change in throughput is:

| Function | Change |
| -------- | ------ |
| Filtering/arm_fir_f32_16taps | 187.17% |
| Filtering/arm_fir_f32_4taps | 34.33% |
| Filtering/arm_fir_f32_8taps | 350.95% |
| Filtering/arm_fir_q31_16taps | -4.88% |
| Filtering/arm_fir_q31_4taps | 0.54% |
| Filtering/arm_fir_q31_8taps | -9.04% |
| Quaternion/arm_quaternion2rotation_f32 | -4.66% |

The remaining regressions are due to the relative costs of interleave vs gather/scatter vs scalarize being wrong in some cases, which I'll be looking at next.

I've also checked llvm-test-suite on Neoverse-V2 (AWS Graviton 4), where useMaxBandwidth is enabled for scalable vectors and so register pressure calculation is used, and there's zero change in code generation.
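
To illustrate the difference between the two behaviours compared above, here is a minimal standalone sketch. It is not the code in this PR and does not use any LLVM API: `Candidate`, `SpillCostPerReg`, `pickByRejecting` and `pickByCosting` are hypothetical names, and the spill cost of 2 per excess register is an arbitrary assumption. It contrasts discarding a vectorization factor (VF) outright when its register pressure is too high with keeping it and charging an estimated spill cost.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical summary of one candidate vectorization factor (VF).
struct Candidate {
  unsigned VF;          // lanes per vector iteration (1 = scalar loop), >= 1
  uint64_t CostPerIter; // estimated cost of one iteration, ignoring spills
  unsigned MaxLiveRegs; // peak number of simultaneously live registers
};

// Assumed cost of spilling and reloading one register (illustrative only).
constexpr uint64_t SpillCostPerReg = 2;

// Number of live registers beyond what the target provides.
unsigned excessRegs(const Candidate &C, unsigned NumRegs) {
  return C.MaxLiveRegs > NumRegs ? C.MaxLiveRegs - NumRegs : 0;
}

// Policy A: discard any VF whose pressure exceeds the register file, then
// return the surviving VF with the lowest cost per lane (1 if none survive).
unsigned pickByRejecting(const std::vector<Candidate> &Cs, unsigned NumRegs) {
  unsigned BestVF = 1;
  double BestCost = -1.0; // negative means "nothing chosen yet"
  for (const Candidate &C : Cs) {
    if (excessRegs(C, NumRegs) > 0)
      continue; // rejected outright, however cheap it would have been
    double PerLane = static_cast<double>(C.CostPerIter) / C.VF;
    if (BestCost < 0 || PerLane < BestCost) {
      BestCost = PerLane;
      BestVF = C.VF;
    }
  }
  return BestVF;
}

// Policy B: keep every VF, but add an estimated spill cost for each excess
// register before comparing cost per lane.
unsigned pickByCosting(const std::vector<Candidate> &Cs, unsigned NumRegs) {
  unsigned BestVF = 1;
  double BestCost = -1.0;
  for (const Candidate &C : Cs) {
    uint64_t Total = C.CostPerIter + SpillCostPerReg * excessRegs(C, NumRegs);
    double PerLane = static_cast<double>(Total) / C.VF;
    if (BestCost < 0 || PerLane < BestCost) {
      BestCost = PerLane;
      BestVF = C.VF;
    }
  }
  return BestVF;
}
```

With 8 registers, a scalar candidate `{1, 10, 4}` and a vector candidate `{4, 16, 10}`, policy A rejects VF 4 and falls back to scalar, while policy B charges 4 for the two excess registers, gets a per-lane cost of 5 against 10, and picks VF 4. If the vector candidate instead needed 30 live registers, policy B would also pick the scalar loop, because the spill cost would outweigh the gain.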

https://github.com/llvm/llvm-project/pull/179646