[libc-commits] [libc] [libc] Fix buggy AVX2 `memcmp` (PR #77081)
Guillaume Chatelet via libc-commits
libc-commits at lists.llvm.org
Tue Jan 9 07:59:37 PST 2024
gchatelet wrote:
I've run a couple of benchmarks. All numbers are throughput in GB/s : higher is better. They are run on a sapphirerapids machine.
- `pre-nafi base` is revision aa28875a745d4d70f491b59e1b50f592994c923b, just before the buggy version. It represents the previous implementation.
- `buggy` is 124efcaa973306ce42633cea07ed3cf55d63afde.
- `GPR` is this patch at commit [2ca4747](https://github.com/llvm/llvm-project/pull/77081/commits/2ca47472277a349f3e28834cc3918565fc00ccea).
- `vector` is this patch at commit [2ca4747](https://github.com/llvm/llvm-project/pull/77081/commits/2ca47472277a349f3e28834cc3918565fc00ccea) with the `avx512vbmi` version disabled, that is [these lines removed](https://github.com/llvm/llvm-project/pull/77081/files#diff-115856bf76e3fda2c70579070453144469415cad393afe916b3e8c41f03a5db3R241-R255).
| | AVX2 | | | AVX512 | | | |
|---------------|:-------------:|:-------:|:-------:|:-------------:|:-------:|:-------:|:-------:|
| | pre-nafi base | buggy | vector | pre-nafi base | buggy | GPR | vector |
| BM_Memcmp/0/0 | 2.91849 | 3.11338 | 3.1697 | 2.96577 | 3.57009 | 3.11467 | 3.5289 |
| BM_Memcmp/1/0 | 8.54724 | 8.72347 | 8.91954 | 8.26899 | 9.91166 | 9.12617 | 10.4736 |
| BM_Memcmp/2/0 | 5.57227 | 5.88153 | 5.86237 | 5.63377 | 6.52597 | 5.84524 | 6.46245 |
| BM_Memcmp/3/0 | 4.65409 | 5.31556 | 5.24833 | 4.93053 | 5.87956 | 5.23889 | 5.92499 |
| BM_Memcmp/4/0 | 2.41787 | 2.52415 | 2.50199 | 2.34756 | 2.69541 | 2.62342 | 2.61858 |
| BM_Memcmp/5/0 | 4.95331 | 5.26778 | 5.52597 | 4.73212 | 5.80927 | 5.84406 | 5.89062 |
| BM_Memcmp/6/0 | 8.26034 | 9.09897 | 8.7936 | 8.34453 | 9.72203 | 9.48509 | 9.68149 |
| BM_Memcmp/7/0 | 5.76333 | 5.97761 | 5.91712 | 5.50937 | 6.40736 | 6.36177 | 6.57352 |
| BM_Memcmp/8/0 | 3.12097 | 3.301 | 3.24115 | 3.1782 | 3.6211 | 3.358 | 3.49582 |
| BM_Memcmp/9/0 | 40.1616 | 39.7599 | 39.6047 | 62.2187 | 64.8115 | 62.472 | 63.1284 |
Same table as percentage of base
| | AVX2 | | | AVX512 | | | |
|---------------|:-------------:|:-----:|:------:|:-------------:|:-----:|:----:|:------:|
| | pre-nafi base | buggy | vector | pre-nafi base | buggy | GPR | vector |
| BM_Memcmp/0/0 | - | 107% | 109% | - | 120% | 105% | 119% |
| BM_Memcmp/1/0 | - | 102% | 104% | - | 120% | 110% | 127% |
| BM_Memcmp/2/0 | - | 106% | 105% | - | 116% | 104% | 115% |
| BM_Memcmp/3/0 | - | 114% | 113% | - | 119% | 106% | 120% |
| BM_Memcmp/4/0 | - | 104% | 103% | - | 115% | 112% | 112% |
| BM_Memcmp/5/0 | - | 106% | 112% | - | 123% | 123% | 124% |
| BM_Memcmp/6/0 | - | 110% | 106% | - | 117% | 114% | 116% |
| BM_Memcmp/7/0 | - | 104% | 103% | - | 116% | 115% | 119% |
| BM_Memcmp/8/0 | - | 106% | 104% | - | 114% | 106% | 110% |
| BM_Memcmp/9/0 | - | 99% | 99% | - | 104% | 100% | 101% |
This is a clear indication that the `avx512vbmi` version using GPR is not good. I'll disable it for now and we can re-enable it when clang bumps to a version [with better codegen](https://github.com/llvm/llvm-project/issues/77459#issuecomment-1883251957).
https://github.com/llvm/llvm-project/pull/77081
More information about the libc-commits
mailing list