[llvm] [AArch64] Enable cmp + csel fusion for Neoverse V2 (PR #94309)
Mingming Liu via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 24 23:06:53 PDT 2024
minglotus-6 wrote:
Overdue update:
1. Perf testing for this change shows small (<0.5%) gains and regressions across multiple benchmarks, which looks mostly like noise since run-to-run variance is high (with or without [random interleaving](https://github.com/google/benchmark/blob/main/docs/random_interleaving.md) of the benchmark instances). Repeating the same test flow for the `cmp + bcc` pattern (enabled in https://github.com/llvm/llvm-project/pull/90608) does give ~1-2% improvements for [compression](https://github.com/llvm/llvm-project/pull/90608) and pretty significant (up to ~17%) wins for a libc workload.
2. I didn't have the bandwidth to track down how often compiler fusion kicked in. Luckily, another colleague studied op fusion patterns using DynamoRIO memtraces. The traces come from real jobs or SPEC, and the study shows that the `cmp + bcc` pattern is more common than the `cmp + csel` pattern.
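For readers less familiar with the two patterns, here is a minimal C sketch of source shapes that commonly produce them on AArch64. The function names are hypothetical and not taken from the PR or the benchmarks, and the actual instruction selection depends on the compiler version and optimization level, so treat the comments as typical outcomes rather than guarantees.

```c
#include <stddef.h>

/* Hypothetical example: a side-effect-free select. At -O2, AArch64
   compilers typically lower this to a cmp followed by a csel. */
int select_max(int a, int b) {
    return a > b ? a : b;
}

/* Hypothetical example: a loop bound check plus an early exit. These
   typically lower to a compare followed by a conditional branch
   (b.cond), i.e. the cmp + bcc pattern. */
size_t find_first(const int *data, size_t n, int key) {
    for (size_t i = 0; i < n; ++i) {
        if (data[i] == key)
            return i; /* early exit through a conditional branch */
    }
    return n; /* key not found */
}
```

Loop tests and early exits appear in almost any code, while selects only show up where the compiler can if-convert, which is at least consistent with the trace study finding `cmp + bcc` more often.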
I'll close this PR for now.
https://github.com/llvm/llvm-project/pull/94309