[clang] [compiler-rt] [llvm] [RFC] [msan] make MSan up to 20x faster on AMD CPUs (PR #171993)

Sun Dec 14 10:40:51 PST 2025

================
@@ -443,8 +443,8 @@ static const MemoryMapParams Linux_I386_MemoryMapParams = {
 static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
     0,              // AndMask (not used)
     0x500000000000, // XorMask
-    0,              // ShadowBase (not used)
-    0x100000000000, // OriginBase
+    0x200000,       // ShadowBase (== kShadowOffset)
----------------
thurstond wrote:

@Camsyn Thanks, good point!

@azat The machine code is longer if a non-zero immediate is needed:

```
48 c7 80 00 00 20 00 00 00 00 00 mov    QWORD PTR [rax+0x200000],0x0
vs.
48 c7 00 00 00 00 00             mov    QWORD PTR [rax],0x0

c6 80 00 00 20 00 00             mov    BYTE PTR [rax+0x200000],0x0
vs.
c6 00 00                         mov    BYTE PTR [rax],0x0
```

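Even if the CPU could execute both forms at the same speed, on the same execution ports, the non-zero offset needs a 4-byte displacement on every such store (11 vs. 7 bytes and 7 vs. 3 bytes above), which increases code size, icache pressure, etc.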
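To make the size difference concrete, here is a minimal sketch that copies the four encodings from the disassembly above and checks the per-store overhead at compile time. The array names and the `ExtraCodeBytes` helper are made up for illustration; the estimate assumes every counted store is a shadow store of this form, which real codegen will not match exactly.

```cpp
#include <cstddef>
#include <cstdint>

// Encodings copied byte-for-byte from the disassembly in this comment.
constexpr std::uint8_t kStoreQwordWithDisp[] = {0x48, 0xc7, 0x80, 0x00, 0x00, 0x20,
                                                0x00, 0x00, 0x00, 0x00, 0x00};
constexpr std::uint8_t kStoreQwordNoDisp[] = {0x48, 0xc7, 0x00, 0x00,
                                              0x00, 0x00, 0x00};
constexpr std::uint8_t kStoreByteWithDisp[] = {0xc6, 0x80, 0x00, 0x00,
                                               0x20, 0x00, 0x00};
constexpr std::uint8_t kStoreByteNoDisp[] = {0xc6, 0x00, 0x00};

// Extra bytes per store caused by the 32-bit displacement (0x200000).
constexpr std::size_t kQwordOverhead =
    sizeof(kStoreQwordWithDisp) - sizeof(kStoreQwordNoDisp);
constexpr std::size_t kByteOverhead =
    sizeof(kStoreByteWithDisp) - sizeof(kStoreByteNoDisp);

static_assert(kQwordOverhead == 4, "disp32 adds four bytes to the qword store");
static_assert(kByteOverhead == 4, "disp32 adds four bytes to the byte store");

// Hypothetical helper: rough upper bound on added code bytes, assuming
// every counted store needs the displacement.
constexpr std::size_t ExtraCodeBytes(std::size_t numShadowStores) {
  return numShadowStores * kQwordOverhead;
}
```

Under that assumption, a binary with 1000 such shadow stores would grow by roughly 4000 bytes of code.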

Although it's probably not a huge impact, it would still be hard to justify making codegen worse for all x86 targets when the upside is only for a subset of Zen processors.

I do want MSan to work well for Zen as well, so how about a compile-time macro (when compiling MSan itself, not the target app) that enables the 2MB offset?
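A rough sketch of what such a compile-time switch could look like, not the actual patch: the macro name `MSAN_SKETCH_ZEN_SHADOW_OFFSET`, the `MemoryMapParamsSketch` struct and the `AppToShadowSketch` helper are all hypothetical. `OriginBase` is left out because its new value is not visible in the hunk above, and the mapping assumed here is the usual `((addr & ~AndMask) ^ XorMask) + ShadowBase`.

```cpp
#include <cstdint>

// Hypothetical build-time switch, set when building MSan itself
// (not the target app). Defaults to the existing zero-offset layout.
#ifndef MSAN_SKETCH_ZEN_SHADOW_OFFSET
#define MSAN_SKETCH_ZEN_SHADOW_OFFSET 0
#endif

// Simplified stand-in for MemoryMapParams; OriginBase is omitted on purpose.
struct MemoryMapParamsSketch {
  std::uint64_t AndMask;
  std::uint64_t XorMask;
  std::uint64_t ShadowBase;
};

constexpr MemoryMapParamsSketch kLinuxX86_64Sketch = {
    0,                // AndMask (not used)
    0x500000000000ULL, // XorMask
#if MSAN_SKETCH_ZEN_SHADOW_OFFSET
    0x200000ULL,      // ShadowBase: 2MB offset, opt-in
#else
    0,                // ShadowBase (not used): shorter store encodings
#endif
};

// Assumed application-to-shadow mapping for this sketch.
constexpr std::uint64_t AppToShadowSketch(std::uint64_t app,
                                          const MemoryMapParamsSketch &p) {
  return ((app & ~p.AndMask) ^ p.XorMask) + p.ShadowBase;
}
```

With the switch off, `AppToShadowSketch(0x700000001000, kLinuxX86_64Sketch)` gives `0x200000001000`, so default builds keep the zero-displacement stores; with it on, the same address maps to `0x200000201000`.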

https://github.com/llvm/llvm-project/pull/171993