vchuravy wrote: On a Genoa machine (AMD EPYC 9384X), a benchmark of mine takes 14.13s to execute with seq_cst defaulting to `mfence` and 9.99s with `lock or`. This is single-threaded... https://github.com/llvm/llvm-project/pull/106555
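
For context, here is a minimal sketch of the kind of code the comparison is about. It is not vchuravy's benchmark, which is not shown here. The helper names `run_fenced_loop` and `time_fenced_loop` are hypothetical, and the loop simply issues a sequentially consistent fence on every iteration. On x86-64 a compiler may lower that fence either to `mfence` or to a `lock`-prefixed `or` on a stack slot, which is the choice the quoted timings compare. Absolute numbers will differ by machine and compiler.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical helper: performs `iters` relaxed stores, each followed by a
// seq_cst fence, and returns the last value stored.
std::uint64_t run_fenced_loop(std::uint64_t iters) {
    std::atomic<std::uint64_t> counter{0};
    for (std::uint64_t i = 0; i < iters; ++i) {
        counter.store(i + 1, std::memory_order_relaxed);
        // The fence under discussion. Depending on the compiler and its
        // version, x86-64 codegen for this is `mfence` or a `lock or`.
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }
    return counter.load(std::memory_order_relaxed);
}

// Hypothetical helper: wall-clock seconds spent in run_fenced_loop(iters).
double time_fenced_loop(std::uint64_t iters) {
    auto start = std::chrono::steady_clock::now();
    volatile std::uint64_t sink = run_fenced_loop(iters);  // keep the result live
    (void)sink;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_fenced_loop` with a large iteration count from a single thread, once per compiler build, gives a rough single-threaded comparison. Inspecting the generated assembly (for example with `-S`) shows which instruction the fence became.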