[PATCH] D23354: [compiler-rt] Support dynamic shadow address instrumentation
Etienne Bergeron via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 13 07:41:51 PDT 2016
etienneb added a comment.
We benchmarked static and dynamic instrumentation on 64-bit Windows and 64-bit Linux to compare their impact on speed and code size.
The benchmarks used for the analysis are:
- Chromium (release build),
- Clang (debug and release builds),
- some executables from the CPU2006 suite.
On 64-bit Windows, instrumented executables are about 3% smaller with dynamic instrumentation. This is because the shadow address is loaded into a register once, in each function prologue. The subsequent shadow memory checks are then smaller, because they no longer encode an 8-byte constant.
The same gains are observed with dynamic instrumentation on 64-bit Linux, even though the static shadow offset fits in 4 bytes there.
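To make the difference concrete, here is a minimal C sketch of the two address computations. It assumes the standard ASan mapping, shadow = (addr >> 3) + offset, with the Linux x86-64 offset 0x7fff8000 that appears as an immediate in the static assembly listing. `dynamic_shadow_offset` is a hypothetical stand-in for `__asan_shadow_memory_dynamic_address`, and `shadow_dynamic_many` is a hypothetical model of an instrumented function: it reads the offset once, as the prologue does, and reuses it for every check. This is an illustration, not the compiler-rt implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed ASan parameters: 8 application bytes per shadow byte and the
 * Linux x86-64 offset that appears as an immediate in the static listing. */
#define SHADOW_SCALE 3
#define STATIC_SHADOW_OFFSET 0x7fff8000ULL

/* Hypothetical stand-in for __asan_shadow_memory_dynamic_address: with
 * dynamic instrumentation the offset is read from memory at run time. */
uint64_t dynamic_shadow_offset = STATIC_SHADOW_OFFSET;

/* Static scheme: the offset is a compile-time constant encoded in
 * every check. */
uint64_t shadow_static(uint64_t addr) {
    return (addr >> SHADOW_SCALE) + STATIC_SHADOW_OFFSET;
}

/* Dynamic scheme: the offset is a value already held in a register
 * (modeled here as a parameter). */
uint64_t shadow_dynamic(uint64_t addr, uint64_t offset) {
    return (addr >> SHADOW_SCALE) + offset;
}

/* Models an instrumented function that checks n consecutive 8-byte
 * accesses: the offset is loaded once (the "prologue"), then reused by
 * every check, as in the dynamic listing. Results are written to out[]. */
void shadow_dynamic_many(uint64_t start, size_t n, uint64_t *out) {
    const uint64_t offset = dynamic_shadow_offset; /* prologue load */
    for (size_t i = 0; i < n; i++)
        out[i] = shadow_dynamic(start + 8 * i, offset);
}
```

With the default offset, both schemes map address 0x1000 to shadow byte 0x7fff8200; the dynamic one simply takes the offset from a register instead of an immediate.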
Across optimization levels (O0, O1 and O2) on 64-bit Linux, Clang debug builds show a code size increase of about 20%, while release builds do not. CPU2006 executables show no such regression. Inspecting a few functions by hand, we found that functions with many local variables create high register pressure: the shadow address is not kept in a register but spilled to the stack, which increases function size.
In terms of speed, we observe a slowdown of about 3% on the Windows CPU2006 benchmarks. On the other benchmarks, the speed difference is within the noise.
Assembly code of static instrumentation:

    shr    $0x3,%rax
    mov    %rax,0x70(%rbx)
    cmpb   $0x0,0x7fff8000(%rax)   # bytes: 80 b8 00 80 ff 7f 00 (7 bytes)
    jne    591a90 <main+0x1140>
Assembly code of dynamic instrumentation:

  prologue:
    lea    0x301d40(%rip),%rax     # <__asan_shadow_memory_dynamic_address>
    mov    (%rax),%rdx             # %rdx contains the shadow address
    [...]
  instrumentation:
    shr    $0x3,%rax
    mov    %rax,0x78(%rbx)
    cmpb   $0x0,(%rax,%rdx,1)      # bytes: 80 3c 10 00 (4 bytes, smaller)
    jne    58ecf9 <main+0x1139>
The code size gain is visible on the cmpb instruction: it takes 4 bytes instead of 7.
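The trade-off between the larger prologue and the smaller checks can be estimated with a little arithmetic. The sketch below takes the two check encodings from the listings; the prologue size (7 bytes for the RIP-relative `lea`, 3 bytes for the `mov`) is our assumption based on the standard x86-64 encodings of the two instructions shown, and the model ignores spills, the elided `[...]` part of the prologue, and any other register-allocation effects. All function names are hypothetical.

```c
#include <stddef.h>

/* Encodings of the shadow check as quoted in the listings. */
static const unsigned char static_check[]  = {0x80, 0xb8, 0x00, 0x80, 0xff, 0x7f, 0x00};
static const unsigned char dynamic_check[] = {0x80, 0x3c, 0x10, 0x00};

/* Assumed prologue cost: lea disp32(%rip),%rax (48 8d 05 + 4-byte disp)
 * plus mov (%rax),%rdx (48 8b 10). */
enum { PROLOGUE_BYTES = 7 + 3 };

/* Bytes saved per instrumented access by the dynamic form. */
size_t bytes_saved_per_check(void) {
    return sizeof static_check - sizeof dynamic_check;
}

/* Net code size change (positive = smaller) for a function with
 * n_checks checks, ignoring spills and other allocation effects. */
long net_saving(size_t n_checks) {
    return (long)(n_checks * bytes_saved_per_check()) - PROLOGUE_BYTES;
}

/* Smallest number of checks for which the dynamic form is not larger
 * (ceiling of PROLOGUE_BYTES / per-check saving). */
size_t break_even_checks(void) {
    size_t per = bytes_saved_per_check();
    return (PROLOGUE_BYTES + per - 1) / per;
}
```

Under these assumptions each check saves 3 bytes, and a function needs at least 4 checks for the dynamic form to be no larger than the static one. Functions where `%rdx` has to be spilled fall outside this simple model.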
Chromium code size benchmark results (sizes in bytes, hexadecimal; S/B = static/base, D/B = dynamic/base, D/S = dynamic/static):
CHROMIUM
linux
              base        static      dynamic     S/B    D/B    D/S
  bro         000311c6    001934c6    0018fb06    821%   814%   99%
  flatc       000635a6    002269f6    00220f66    554%   549%   99%
  chrome      04a49f06    110865d6    10ab2686    367%   359%   98%
windows
              base        static      dynamic     S/B    D/B    D/S
  bro         6A000       100000      FE000       242%   240%   99%
  flatc       D6000       2BE000      2B1000      328%   322%   98%
  chrome      3E5B000     10FAD000    109FA000    436%   427%   98%
  c_child     3A09000     F69A000     F184000     425%   416%   98%
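The ratio columns can be reproduced from the hexadecimal sizes. The sketch below assumes S/B = static/base, D/B = dynamic/base and D/S = dynamic/static, each rounded to the nearest percent, which matches the rows we checked (for example, the Linux chrome row gives 367%, 359% and 98%). `struct size_row`, `percent` and `row_ratios` are hypothetical helpers.

```c
#include <stdint.h>

/* One row of the size tables: sizes in bytes. */
struct size_row {
    uint64_t base, stat, dyn;
};

/* a/b as a percentage, rounded to the nearest integer. */
unsigned percent(uint64_t a, uint64_t b) {
    return (unsigned)((a * 100 + b / 2) / b);
}

/* Fills in the S/B, D/B and D/S columns for one row. */
void row_ratios(struct size_row r, unsigned *sb, unsigned *db, unsigned *ds) {
    *sb = percent(r.stat, r.base);
    *db = percent(r.dyn, r.base);
    *ds = percent(r.dyn, r.stat);
}
```

For example, `row_ratios((struct size_row){0x04a49f06, 0x110865d6, 0x10ab2686}, &sb, &db, &ds)` yields sb = 367, db = 359 and ds = 98.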
On the Chromium benchmarks, dynamic instrumentation always produces slightly smaller binaries than static instrumentation.
CPU2006 code size benchmark results:
CPU2006
windows
                      base      static    dynamic   S/B    D/B    D/S
  /Od  470.lbm        21000     A3000     A3000     494%   494%   100%
       482.sphinx3    55000     15A000    157000    407%   404%   99%
       401.bzip2      2C000     EE000     EC000     541%   536%   99%
  /O1  470.lbm        1D000     97000     97000     521%   521%   100%
       482.sphinx3    38000     F3000     F2000     434%   432%   100%
       401.bzip2      1E000     AA000     AA000     567%   567%   100%
  /O2  470.lbm        1D000     97000     97000     521%   521%   100%
       482.sphinx3    3C000     109000    107000    442%   438%   99%
       401.bzip2      22000     BC000     BC000     553%   553%   100%
On the CPU2006 benchmarks, dynamic instrumentation always yields slightly smaller or about the same code size.
Clang code size metrics:
CLANG (Release) linux
                  base        static      dynamic     S/B    D/B    D/S
  clang-4.0       01df0c9a    0666a30a    063cebea    342%   333%   97%
  opt             00d48372    02d436aa    02c3fd3a    341%   333%   98%
  clang-format    000f3582    0053526a    00518e0a    548%   536%   98%
  llvm-link       001a34b2    00a4030a    00a04f1a    626%   612%   98%
  clang-tidy      00f682fa    0387f9ba    03700e2a    367%   357%   97%
CLANG (Debug) linux
                  base        static      dynamic     S/B    D/B    D/S
  clang-4.0       043bbd30    04fa5b80    063cebea    118%   147%   125%
  opt             01aab360    02648e80    02c3fd3a    144%   166%   116%
  clang-format    0047aff0    0073a340    00518e0a    161%   114%   71%
  llvm-link       007618f0    00b2e8f0    00a04f1a    151%   136%   90%
  clang-tidy      026478d0    02d32dd0    03700e2a    118%   144%   122%
On the Clang benchmarks, dynamic instrumentation reduces code size for Release builds and increases it for most Debug builds.
https://reviews.llvm.org/D23354