[PATCH] D23354: [compiler-rt] Support dynamic shadow address instrumentation

Tue Sep 13 07:41:51 PDT 2016

etienneb added a comment.

We benchmarked static and dynamic instrumentation on 64-bit Windows and 64-bit Linux to compare the impact on speed and code size.

Benchmarks used for the analysis are:

- Chromium release build,
- Clang debug and release build,
- and some executables of the CPU2006 suite.

On instrumented 64-bit Windows executables, we observe a code size reduction of about 3%. This is because the shadow address is loaded into a register once, in each function prologue, and kept there. Subsequent load instructions are therefore smaller, because they no longer encode an 8-byte fixed constant.

The same gains are observed with dynamic instrumentation on 64-bit Linux, even though the shadow address constant there fits in 4 bytes.
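As a rough illustration of what both instrumentation modes compute, the sketch below models the shadow lookup in Python. The shift by 3 and the constant 0x7fff8000 are taken from the static assembly listing in this comment. The function name `shadow_address` and the dynamic base value are hypothetical and chosen only for illustration; this is not the compiler-rt implementation.

```python
# Minimal model of the shadow lookup, for illustration only.
SHADOW_SCALE = 3                    # matches `shr $0x3` in the listings
STATIC_SHADOW_OFFSET = 0x7FFF8000   # constant seen in the static listing


def shadow_address(addr, shadow_base):
    """Return the shadow byte address for an application address."""
    return (addr >> SHADOW_SCALE) + shadow_base


# Static mode: the base is a compile-time constant encoded in each check.
static_shadow = shadow_address(0x1000, STATIC_SHADOW_OFFSET)

# Dynamic mode: the base is read at run time, as from
# __asan_shadow_memory_dynamic_address. The value below is an assumption.
dynamic_base = 0x100000000000
dynamic_shadow = shadow_address(0x1000, dynamic_base)

print(hex(static_shadow), hex(dynamic_shadow))
```

The only difference between the two modes in this model is where `shadow_base` comes from, which is what changes the instruction encoding.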

Using different optimisation levels (O0, O1 and O2) on 64-bit Linux shows a code size increase of about 20% on Clang debug builds, but not on release builds. This regression is not observed on CPU2006 executables. Looking manually at a few functions, we found that functions with many local variables create high register pressure: the shadow address is not kept in a register but is spilled to the stack, which increases function size.

On the speed side, we observe a slowdown of about 3% on the Windows CPU2006 benchmarks. On the other benchmarks, the speed variation is lost in the noise.

Assembly code of static instrumentation:

  shr    $0x3,%rax
  mov    %rax,0x70(%rbx)
  cmpb   $0x0,0x7fff8000(%rax)        # bytes: 80 b8 00 80 ff 7f 00 (7-bytes)
  jne    591a90 <main+0x1140>

Assembly code of dynamic instrumentation:

  prologue:
    lea    0x301d40(%rip),%rax        # <__asan_shadow_memory_dynamic_address>
    mov    (%rax),%rdx                # %rdx contains shadow address

  [...]
  Instrumentation size:
    shr    $0x3,%rax
    mov    %rax,0x78(%rbx)
    cmpb   $0x0,(%rax,%rdx,1)         # bytes: 80 3c 10 00 (4-bytes, smaller)
    jne    58ecf9 <main+0x1139>

We can observe the gain in code size on the cmpb instruction.
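The trade-off between the per-function prologue and the smaller checks can be put in numbers with a back-of-the-envelope model. The 7-byte and 4-byte check sizes come from the listings above. The 10-byte prologue cost is an assumption (7 bytes for the RIP-relative `lea` plus 3 bytes for the `mov`, the usual x86-64 encodings, not stated in this comment), and the model ignores any spill code caused by register pressure.

```python
STATIC_CHECK_BYTES = 7    # cmpb with 32-bit displacement (from the listing)
DYNAMIC_CHECK_BYTES = 4   # cmpb with base+index addressing (from the listing)
PROLOGUE_BYTES = 10       # assumption: lea (7 bytes) + mov (3 bytes)


def size_delta(num_checks):
    """Bytes added (positive) or saved (negative) per function in dynamic mode."""
    static_size = num_checks * STATIC_CHECK_BYTES
    dynamic_size = PROLOGUE_BYTES + num_checks * DYNAMIC_CHECK_BYTES
    return dynamic_size - static_size


def break_even_checks():
    """Smallest number of checks at which dynamic mode is no larger than static."""
    n = 0
    while size_delta(n) > 0:
        n += 1
    return n


print(break_even_checks())
```

Under these assumptions, a function with only a few checks pays more for the prologue than it saves, while functions with many checks come out ahead.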

Chromium code size benchmark results:

  CHROMIUM

  linux
         base      static   dynamic     S/B    D/B   D/S
  bro    000311c6  001934c6 0018fb06   821%   814%  99%
  flatc  000635a6  002269f6 00220f66   554%   549%  99%
  chrome 04a49f06  110865d6 10ab2686   367%   359%  98%

  windows
         base      static   dynamic   S/B    D/B   D/S
  bro       6A000   100000    FE000   242%   240%  99%
  flatc     D6000   2BE000   2B1000   328%   322%  98%
  chrome  3E5B000 10FAD000 109FA000   436%   427%  98%
  c_child 3A09000  F69A000  F184000   425%   416%  98%

On the Chromium benchmarks, code size always improves slightly.
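The ratio columns can be recomputed from the hexadecimal sizes in the tables. The sketch below does this for the Linux `bro` row; the helper name `ratios` is hypothetical, and rounding to the nearest percent is an assumption about how the columns were produced.

```python
def ratios(base_hex, static_hex, dynamic_hex):
    """Return (S/B, D/B, D/S) as whole percentages from hex size strings."""
    base = int(base_hex, 16)
    static = int(static_hex, 16)
    dynamic = int(dynamic_hex, 16)

    def pct(numerator, denominator):
        # Assumption: the tables round to the nearest percent.
        return round(100 * numerator / denominator)

    return pct(static, base), pct(dynamic, base), pct(dynamic, static)


# Linux `bro` row from the Chromium table.
bro_linux = ratios("000311c6", "001934c6", "0018fb06")
print(bro_linux)
```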

CPU2006 code size benchmark results:

  CPU2006
                    base  static dynamic  S/B   D/B    D/S
  windows              
  /Od 470.lbm      21000   A3000  A3000   494%  494%  100%
      482.sphinx3  55000  15A000  157000  407%  404%   99%
      401.bzip2    2C000   EE000  EC000   541%  536%   99%

  /O1 470.lbm      1D000   97000  97000   521%  521%  100%
      482.sphinx3  38000   F3000  F2000   434%  432%  100%
      401.bzip2    1E000   AA000  AA000   567%  567%  100%

  /O2 470.lbm      1D000   97000  97000   521%  521%  100%
      482.sphinx3  3C000  109000  107000  442%  438%   99%
      401.bzip2    22000   BC000  BC000   553%  553%  100%

On the CPU2006 benchmarks, code size either improves slightly or stays about the same.

Clang code size metrics:

  CLANG (Release) linux
                    base    static   dynamic  S/B   D/B   D/S  
  clang-4.0     01df0c9a  0666a30a  063cebea  342%  333%  97%
  opt           00d48372  02d436aa  02c3fd3a  341%  333%  98%
  clang-format  000f3582  0053526a  00518e0a  548%  536%  98%
  llvm-link     001a34b2  00a4030a  00a04f1a  626%  612%  98%
  clang-tidy    00f682fa  0387f9ba  03700e2a  367%  357%  97%

  CLANG (Debug) linux
                    base    static   dynamic  S/B   D/B   D/S  
  clang-4.0     043bbd30  04fa5b80  063cebea  118%  147%  125%
  opt           01aab360  02648e80  02c3fd3a  144%  166%  116%
  clang-format  0047aff0  0073a340  00518e0a  161%  114%  71%
  llvm-link     007618f0  00b2e8f0  00a04f1a  151%  136%  90%
  clang-tidy    026478d0  02d32dd0  03700e2a  118%  144%  122%

On the Clang benchmarks, code size improves with Release builds and regresses with Debug builds.

https://reviews.llvm.org/D23354