[llvm] [RFC] [msan] make MSan up to 20x faster on AMD CPUs (PR #171993)

Fri Dec 12 08:29:24 PST 2025

Camsyn wrote:

FYI. 

I think the root cause might be the [`Linear address utag/way-predictor`](https://lsferreira.net/public/knowledge-base/x86/upos/amd_zen+.PDF) mechanism.

---

Actually, under `-O3` with MSan instrumentation, the code effectively reduces to frequent alternating writes to two addresses, `result` and its shadow:

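```c++
#include <cstdint>
using u64 = std::uint64_t;
using uptr = std::uintptr_t;

int main() {
    volatile u64 result;
    // MSan's x86_64 Linux shadow mapping: shadow = addr ^ 0x500000000000.
    // (Only mapped in an MSan process; this models the instrumented loop.)
    auto *shadow = reinterpret_cast<volatile u64 *>(
        reinterpret_cast<uptr>(&result) ^ 0x500000000000ULL);
    for (u64 i = 0; i < 100000000; i++) {
        *shadow = 0;                  // set shadow: mark `result` as initialized
        result = 2432902008176640000; // comes from 20!
    }
}
```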

**The L1D Cache Geometry**
On Zen, the 32 KiB L1D cache is 8-way associative with 64-byte cache lines, giving 64 sets. `&result` and `shadow` differ only in bits 44 and 46 (the `0x500000000000` XOR), so they share the same `set_index` (bits 6-11) but have different tags. Normally, 8-way associativity would accommodate both lines without conflict.

**The Root Cause: Way Predictor Aliasing**
The issue arises from AMD Zen's [`Linear address utag/way-predictor`](https://lsferreira.net/public/knowledge-base/x86/upos/amd_zen+.PDF). To save power, the CPU does not check all 8 tags in parallel; instead, it uses a *µtag* to **predict** the way.

> Figure 3 of this [paper](https://inria.hal.science/hal-02866777/document) shows that the *µtag* is derived from `addr[12:27]`. Unfortunately, `shadow` and `&result` have identical bits in the whole `[0:43]` range, which contains `[12:27]`, so they share the same *µtag*.
> <img width="711" height="240" alt="image" src="https://github.com/user-attachments/assets/959bfd16-05a9-4042-94b4-488a2279643a" />

**Conclusion**
Because `shadow` and `&result` map to the same set and yield the same *µtag*, the way predictor forces them into the same slot (way) of that set.

This effectively degrades the 8-way cache into a **Direct-Mapped** cache for these specific addresses, causing severe L1D cache thrashing: each store evicts the line that the next store needs.

https://github.com/llvm/llvm-project/pull/171993