[PATCH] D34085: [PGO] Register promote profile counter updates

David Li via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Jun 10 20:42:09 PDT 2017


davidxl created this revision.
Herald added subscribers: mehdi_amini, sanjoy.

Two of the biggest problems with the current PGO instrumentation are 1) the performance of instrumented multi-threaded programs and 2) the precision of profile counters in multi-threaded programs. Both are caused by contention on shared profile counter updates in hot regions of the program. In a multi-threaded program with a work-sharing loop, increasing the number of threads actually slows down the instrumented program significantly: elapsed time goes from 10s with one thread to > 5min with 16 threads. What is worse, the hottest block count is 4000000000 with 1 thread, but drops to only 177119367 with 32 threads: about 95% of the counts are lost due to data races. Using atomic RMW is one way to fix the problem, but it greatly slows down the instrumented program (see data below).
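To make the lost-count problem concrete, here is a minimal sketch of the two update styles. It is not code from the patch, and the function names are hypothetical. The first function splits each increment into a separate load and store, which is roughly what a plain counter increment does at the machine level; relaxed atomics are used there only so that the C++ program itself has no undefined behavior. The second function uses an atomic fetch-add, which never loses a count but is more expensive per update.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical illustration: increment a shared counter with a separate
// load and store. Another thread's increment can be overwritten between
// the load and the store, so the final value may be below the true total.
uint64_t count_with_load_store(int num_threads, uint64_t iters_per_thread) {
  assert(num_threads > 0);
  std::atomic<uint64_t> counter{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, iters_per_thread] {
      for (uint64_t i = 0; i < iters_per_thread; ++i) {
        uint64_t v = counter.load(std::memory_order_relaxed);
        counter.store(v + 1, std::memory_order_relaxed);
      }
    });
  }
  for (auto &w : workers) w.join();
  return counter.load();
}

// Hypothetical illustration: increment a shared counter with an atomic
// read-modify-write. Every increment is preserved.
uint64_t count_with_fetch_add(int num_threads, uint64_t iters_per_thread) {
  assert(num_threads > 0);
  std::atomic<uint64_t> counter{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, iters_per_thread] {
      for (uint64_t i = 0; i < iters_per_thread; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto &w : workers) w.join();
  return counter.load();
}
```

The load/store version can return any value up to num_threads * iters_per_thread, and how many counts it loses depends on the machine and the scheduling. The fetch-add version always returns the exact total.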

This patch implements loop-based register promotion for counter updates. It makes use of the existing SSA updater utility (load/store promotion) and isolates the change inside the lowerer, without the need to expose the aliasing properties of the counter variables to any of the existing optimizer components. The lowerer has full knowledge of the counters and requires very little analysis. It supports speculative code motion and works at lower optimization levels.
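As a rough source-level picture of what register promotion of a counter update means, consider the sketch below. It is only an illustration under assumed names; the patch itself performs the transformation on LLVM IR inside the instrumentation lowering, not on C++ source. In the first function the counter in memory is updated on every iteration. In the second, the increments are accumulated in a local value (which can live in a register) and added to the counter once at the loop exit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "before" shape: one load/add/store on the counter
// in memory per loop iteration.
void loop_unpromoted(uint64_t *counter, int n) {
  assert(counter != nullptr);
  for (int i = 0; i < n; ++i) {
    *counter += 1;
  }
}

// Hypothetical "after" shape: increments accumulate in a local value,
// and the counter in memory is updated once when the loop exits.
void loop_promoted(uint64_t *counter, int n) {
  assert(counter != nullptr);
  uint64_t local = 0;
  for (int i = 0; i < n; ++i) {
    local += 1;
  }
  *counter += local;
}
```

Both functions leave the same value in the counter when run by a single thread. With several threads sharing the counter, the promoted shape touches shared memory once per loop execution instead of once per iteration, which is consistent with the reduced contention and smaller count loss reported for the patch.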

With this patch, the performance of the multi-threaded program mentioned above improves greatly. With one thread, the program speeds up by 22%. With 16 threads, the elapsed time is only 0.9s, a > 300x speedup compared to without the patch. The profile counter precision is also greatly improved: with 32 threads, the hottest block count is 3996000000, so only 0.09% of the counts are lost.

The patch speeds up single-threaded instrumented binaries as well.

Here are the SPEC2000 int numbers (positive is a speedup):

  164.gzip       -0.81%
  175.vpr         2.05%
  176.gcc        11.08%
  181.mcf        -0.61%
  186.crafty     -3.46%
  197.parser      4.90%
  252.eon        18.00%
  253.perlbmk    11.21%
  255.vortex     -0.04%
  256.bzip2       8.67%
  300.twolf       3.89%

Here are the SPEC2006 numbers:

  400.perlbench    -1.87%
  401.bzip2        16.98%
  403.gcc           4.82%
  429.mcf          12.88%
  445.gobmk         1.83%
  456.hmmer        12.48%
  458.sjeng        -0.19%
  462.libquantum   28.09%
  464.h264ref       6.49%
  471.omnetpp       1.21%
  473.astar         8.31%
  483.xalancbmk     0.95%
  450.soplex       12.35%
  447.dealII        6.33%
  453.povray       -3.29%
  444.namd          1.88%

I did some analysis on the povray regression: the program has a few hot loops containing blocks that are guarded by input flags and never executed. Hoisting the counter updates for those blocks out of the loop thus increases the dynamic instruction count.
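The effect can be pictured with the following sketch, which is an interpretation of the explanation above and not code from povray or from the patch. All names are hypothetical, and the counter_updates field exists only to count how many times the counter in memory is written. When the guarded block never runs, the in-place version executes no counter update at all, while the promoted version still performs one update at every loop exit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for one guarded block.
struct Profile {
  uint64_t guarded_count = 0;    // profile counter for the guarded block
  uint64_t counter_updates = 0;  // number of writes made to that counter
};

// Counter updated in place: no work is done when flag is false.
void loop_in_place(Profile &p, int trip_count, bool flag) {
  assert(trip_count >= 0);
  for (int i = 0; i < trip_count; ++i) {
    if (flag) {
      p.guarded_count += 1;
      p.counter_updates += 1;
    }
  }
}

// Counter promoted out of the loop: the write at the loop exit is
// unconditional, so it runs even when the guarded block never executed.
void loop_promoted(Profile &p, int trip_count, bool flag) {
  assert(trip_count >= 0);
  uint64_t local = 0;
  for (int i = 0; i < trip_count; ++i) {
    if (flag) {
      local += 1;
    }
  }
  p.guarded_count += local;  // adds zero when flag is false
  p.counter_updates += 1;
}
```

Calling each function 1000 times with flag set to false leaves both counters at zero, but the promoted version performs 1000 counter writes where the in-place version performs none. With flag set to true and a trip count of 4, the balance reverses: 4000 writes in place versus 1000 promoted.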

Lastly, here are the SPEC2000 int numbers when using atomic fetch-add, compared to the baseline without this patch:

  164.gzip      -14.02%
  175.vpr       -16.02%
  176.gcc       -14.15%
  181.mcf        -4.48%
  186.crafty    -44.98%
  197.parser    -11.69%
  252.eon       -13.79%
  253.perlbmk     6.26%
  255.vortex     -4.70%
  256.bzip2      -4.06%
  300.twolf     -17.24%


https://reviews.llvm.org/D34085

Files:
  include/llvm/Transforms/InstrProfiling.h
  include/llvm/Transforms/Instrumentation.h
  lib/Passes/PassBuilder.cpp
  lib/Transforms/IPO/PassManagerBuilder.cpp
  lib/Transforms/Instrumentation/InstrProfiling.cpp
  test/Transforms/PGOProfile/counter_promo.ll
  test/Transforms/PGOProfile/counter_promo_mexits.ll
  test/profile/Linux/counter_promo_for.c
  test/profile/Linux/counter_promo_while.c

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D34085.102128.patch
Type: text/x-patch
Size: 22320 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170611/c26c9934/attachment.bin>

