[llvm] [RFC] [msan] make MSan up to 20x faster on AMD CPUs (PR #171993)
Kunqiu Chen via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 12 08:29:24 PST 2025
Camsyn wrote:
FYI.
I think the root cause might be the [`Linear address utag/way-predictor`](https://lsferreira.net/public/knowledge-base/x86/upos/amd_zen+.PDF) mechanism.
---
Actually, under `-O3` with MSan instrumentation, the code effectively reduces to frequent alternating writes to two addresses:
```c++
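volatile u64 result;
// MSan derives the shadow address by XOR-ing the application address with a constant.
u64 *shadow = (u64 *)((uptr)&result ^ 0x500000000000ULL);
for (u64 i = 0; i < 100000000; i++) {
  *shadow = 0;                   // clear the shadow of `result`
  result = 2432902008176640000;  // 20!
}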
```
**The L1D Cache Geometry**
On Zen, the 32 KiB L1D cache is 8-way set-associative with 64-byte cache lines, which gives 64 sets. The XOR mask `0x500000000000` flips only bits 44 and 46, so `&result` and `shadow` share the same `set_index` (bits 6-11) but have different `tags`. Normally, the 8-way associativity would accommodate both lines without conflict.
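A minimal sketch of the set-index arithmetic above, assuming the geometry stated here (64-byte lines, 64 sets) and the XOR mask from the snippet. The names `SetIndex`, `ShadowOf`, and `kExampleAddr` are hypothetical, and the example address is an arbitrary stack-like value, not one taken from a real run.

```c++
#include <cstdint>

// Assumed L1D geometry from the text: 64-byte lines and 64 sets.
constexpr unsigned kLineBits = 6;  // log2(64-byte line), address bits 0-5
constexpr unsigned kSetBits = 6;   // log2(64 sets), address bits 6-11

// XOR constant taken from the snippet above.
constexpr std::uint64_t kShadowXorMask = 0x500000000000ULL;

// Set index: address bits 6-11.
constexpr std::uint64_t SetIndex(std::uint64_t addr) {
  return (addr >> kLineBits) & ((1ULL << kSetBits) - 1);
}

// Shadow address as modeled in the snippet: application address XOR mask.
constexpr std::uint64_t ShadowOf(std::uint64_t addr) {
  return addr ^ kShadowXorMask;
}

// Hypothetical application address, chosen only for illustration.
constexpr std::uint64_t kExampleAddr = 0x7ffd12345678ULL;

// The mask only touches bits 44 and 46, so the set index is unchanged.
static_assert(SetIndex(kExampleAddr) == SetIndex(ShadowOf(kExampleAddr)),
              "application address and shadow map to the same L1D set");
static_assert(kExampleAddr != ShadowOf(kExampleAddr),
              "the two addresses are nevertheless distinct");
```

The checks run at compile time, so the file builds only if both addresses land in the same set under these assumptions.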
**The Root Cause: Way Predictor Aliasing**
The issue arises from AMD Zen's [`Linear address utag/way-predictor`](https://lsferreira.net/public/knowledge-base/x86/upos/amd_zen+.PDF). To save power, the CPU does not check all 8 tags in parallel; instead, it uses a *µtag* to **predict** the way.
> Figure 3 of this [paper](https://inria.hal.science/hal-02866777/document) shows that the *µtag* is derived from `addr[12:27]`. Unfortunately, `shadow` and `&result` are identical in bits `[0:43]`, so they produce the same *µtag*.
> <img width="711" height="240" alt="image" src="https://github.com/user-attachments/assets/959bfd16-05a9-4042-94b4-488a2279643a" />
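
The aliasing argument can be checked without knowing the exact hash: if the µtag depends only on `addr[12:27]`, as the cited figure indicates, then two addresses that agree on those bits must get the same µtag. The sketch below only extracts the input bits and does not reproduce the real hash function; `UtagInputBits`, `LowestSetBit`, and `kExampleAddr` are hypothetical names, and the example address is arbitrary.

```c++
#include <cstdint>

// XOR constant taken from the snippet above.
constexpr std::uint64_t kShadowXorMask = 0x500000000000ULL;

// Bits 12-27 of the linear address: the inputs to the utag hash according to
// the cited figure. The hash itself is intentionally not modeled here.
constexpr std::uint64_t UtagInputBits(std::uint64_t addr) {
  return (addr >> 12) & 0xFFFFULL;
}

// Position of the lowest set bit. Precondition: x != 0.
constexpr unsigned LowestSetBit(std::uint64_t x) {
  unsigned n = 0;
  while ((x & 1) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Hypothetical application address, chosen only for illustration.
constexpr std::uint64_t kExampleAddr = 0x7ffd12345678ULL;

// The mask leaves bits 0-43 untouched...
static_assert(LowestSetBit(kShadowXorMask) == 44,
              "lowest bit flipped by the mask is bit 44");
// ...so every utag input bit is the same for an address and its shadow.
static_assert(UtagInputBits(kExampleAddr) ==
                  UtagInputBits(kExampleAddr ^ kShadowXorMask),
              "identical hash inputs imply an identical utag");
```

Any deterministic function of those 16 bits gives the same result for both addresses, which is all the argument needs.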
**Conclusion**
Because `shadow` and `&result` map to the same set and yield the same *µtag*, the way predictor steers both of them to the same way.
This effectively degrades the 8-way cache to a **direct-mapped** cache for these specific addresses, causing severe L1D cache thrashing.
https://github.com/llvm/llvm-project/pull/171993