[libc-commits] [libc] [libc] Fix buggy AVX2 `memcmp` (PR #77081)

Guillaume Chatelet via libc-commits libc-commits at lists.llvm.org
Tue Jan 9 07:59:37 PST 2024


gchatelet wrote:

I've run a couple of benchmarks. All numbers are throughput in GB/s : higher is better. They are run on a sapphirerapids machine.

 - `pre-nafi base` is revision aa28875a745d4d70f491b59e1b50f592994c923b, just before the buggy version. It represents the previous implementation.
 - `buggy` is 124efcaa973306ce42633cea07ed3cf55d63afde.
 - `GPR` is this patch at commit [2ca4747](https://github.com/llvm/llvm-project/pull/77081/commits/2ca47472277a349f3e28834cc3918565fc00ccea).
 -  `vector` is this patch at commit [2ca4747](https://github.com/llvm/llvm-project/pull/77081/commits/2ca47472277a349f3e28834cc3918565fc00ccea) with the `avx512vbmi` version disabled, that is [these lines removed](https://github.com/llvm/llvm-project/pull/77081/files#diff-115856bf76e3fda2c70579070453144469415cad393afe916b3e8c41f03a5db3R241-R255).



|               |      AVX2     |         |         |     AVX512    |         |         |         |
|---------------|:-------------:|:-------:|:-------:|:-------------:|:-------:|:-------:|:-------:|
|               | pre-nafi base | buggy   | vector  | pre-nafi base | buggy   | GPR     | vector  |
| BM_Memcmp/0/0 |       2.91849 | 3.11338 |  3.1697 |       2.96577 | 3.57009 | 3.11467 |  3.5289 |
| BM_Memcmp/1/0 |       8.54724 | 8.72347 | 8.91954 |       8.26899 | 9.91166 | 9.12617 | 10.4736 |
| BM_Memcmp/2/0 |       5.57227 | 5.88153 | 5.86237 |       5.63377 | 6.52597 | 5.84524 | 6.46245 |
| BM_Memcmp/3/0 |       4.65409 | 5.31556 | 5.24833 |       4.93053 | 5.87956 | 5.23889 | 5.92499 |
| BM_Memcmp/4/0 |       2.41787 | 2.52415 | 2.50199 |       2.34756 | 2.69541 | 2.62342 | 2.61858 |
| BM_Memcmp/5/0 |       4.95331 | 5.26778 | 5.52597 |       4.73212 | 5.80927 | 5.84406 | 5.89062 |
| BM_Memcmp/6/0 |       8.26034 | 9.09897 |  8.7936 |       8.34453 | 9.72203 | 9.48509 | 9.68149 |
| BM_Memcmp/7/0 |       5.76333 | 5.97761 | 5.91712 |       5.50937 | 6.40736 | 6.36177 | 6.57352 |
| BM_Memcmp/8/0 |       3.12097 |   3.301 | 3.24115 |        3.1782 |  3.6211 |   3.358 | 3.49582 |
| BM_Memcmp/9/0 |       40.1616 | 39.7599 | 39.6047 |       62.2187 | 64.8115 |  62.472 | 63.1284 |

Same table as percentage of base
|               |      AVX2     |       |        |     AVX512    |       |      |        |
|---------------|:-------------:|:-----:|:------:|:-------------:|:-----:|:----:|:------:|
|               | pre-nafi base | buggy | vector | pre-nafi base | buggy | GPR  | vector |
| BM_Memcmp/0/0 | -             |  107% |   109% | -             |  120% | 105% |   119% |
| BM_Memcmp/1/0 | -             |  102% |   104% | -             |  120% | 110% |   127% |
| BM_Memcmp/2/0 | -             |  106% |   105% | -             |  116% | 104% |   115% |
| BM_Memcmp/3/0 | -             |  114% |   113% | -             |  119% | 106% |   120% |
| BM_Memcmp/4/0 | -             |  104% |   103% | -             |  115% | 112% |   112% |
| BM_Memcmp/5/0 | -             |  106% |   112% | -             |  123% | 123% |   124% |
| BM_Memcmp/6/0 | -             |  110% |   106% | -             |  117% | 114% |   116% |
| BM_Memcmp/7/0 | -             |  104% |   103% | -             |  116% | 115% |   119% |
| BM_Memcmp/8/0 | -             |  106% |   104% | -             |  114% | 106% |   110% |
| BM_Memcmp/9/0 | -             |   99% |    99% | -             |  104% | 100% |   101% |

This is a clear indication that the `avx512vbmi` version using GPR is not good. I'll disable it for now and we can re-enable it when clang bumps to a version [with better codegen](https://github.com/llvm/llvm-project/issues/77459#issuecomment-1883251957).

https://github.com/llvm/llvm-project/pull/77081


More information about the libc-commits mailing list