[PATCH] D94416: [PM] Avoid duplicates in the Used/Preserved/Required sets

Wed Jan 13 13:01:17 PST 2021

bjope added a comment.

I've now done some performance comparisons using `perf stat -r 100 opt -O3 -o /dev/null --verify-dom-info -verify-assumption-cache -verify-loop-info`.

I tried to find an in-tree test case with several basic blocks, so that the verifiers actually have some work to do. Here are the results ("opt-old" is built without this patch and "opt-new" with it):

  Performance counter stats for 'opt-old -O3 -o /dev/null --verify-dom-info -verify-assumption-cache -verify-loop-info test/Analysis/LoopNestAnalysis/imperfectnest.ll' (100 runs):

              76.38 msec task-clock:u              #    0.973 CPUs utilized            ( +-  0.63% )
                  0      context-switches:u        #    0.000 K/sec                  
                  0      cpu-migrations:u          #    0.000 K/sec                  
              4,141      page-faults:u             #    0.054 M/sec                    ( +-  0.00% )
        217,856,171      cycles:u                  #    2.852 GHz                      ( +-  0.34% )
        425,842,775      instructions:u            #    1.95  insn per cycle           ( +-  0.05% )
         87,203,931      branches:u                # 1141.767 M/sec                    ( +-  0.05% )
          1,675,703      branch-misses:u           #    1.92% of all branches          ( +-  0.24% )

           0.078464 +- 0.000485 seconds time elapsed  ( +-  0.62% )

  Performance counter stats for 'opt-new -O3 -o /dev/null --verify-dom-info -verify-assumption-cache -verify-loop-info test/Analysis/LoopNestAnalysis/imperfectnest.ll' (100 runs):

              63.83 msec task-clock:u              #    0.963 CPUs utilized            ( +-  0.97% )
                  0      context-switches:u        #    0.000 K/sec                  
                  0      cpu-migrations:u          #    0.000 K/sec                  
              4,124      page-faults:u             #    0.065 M/sec                    ( +-  0.00% )
        180,555,724      cycles:u                  #    2.828 GHz                      ( +-  1.01% )
        323,290,930      instructions:u            #    1.79  insn per cycle           ( +-  0.04% )
         66,711,896      branches:u                # 1045.074 M/sec                    ( +-  0.04% )
          1,532,334      branch-misses:u           #    2.30% of all branches          ( +-  0.35% )

           0.066310 +- 0.000631 seconds time elapsed  ( +-  0.95% )

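So for that particular test this patch looks like a win: task-clock drops from 76.38 to 63.83 msec (about 16%), and the instruction count drops by about 24%.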

I also compared the results with an empty input and no verifiers, to see whether the patch adds any overhead when the verifiers are not used:

  Performance counter stats for 'opt-old -O3 -o /dev/null /dev/null' (100 runs):

              12.71 msec task-clock:u              #    0.885 CPUs utilized            ( +-  0.96% )
                  0      context-switches:u        #    0.000 K/sec                  
                  0      cpu-migrations:u          #    0.000 K/sec                  
              2,774      page-faults:u             #    0.218 M/sec                    ( +-  0.00% )
         22,026,642      cycles:u                  #    1.733 GHz                      ( +-  1.02% )
         25,342,570      instructions:u            #    1.15  insn per cycle           ( +-  0.00% )
          6,191,303      branches:u                #  487.180 M/sec                    ( +-  0.00% )
            139,025      branch-misses:u           #    2.25% of all branches          ( +-  0.29% )

           0.014362 +- 0.000132 seconds time elapsed  ( +-  0.92% )

  Performance counter stats for 'opt-new -O3 -o /dev/null /dev/null' (100 runs):

              12.80 msec task-clock:u              #    0.883 CPUs utilized            ( +-  1.02% )
                  0      context-switches:u        #    0.000 K/sec                  
                  0      cpu-migrations:u          #    0.000 K/sec                  
              2,760      page-faults:u             #    0.216 M/sec                    ( +-  0.00% )
         21,916,544      cycles:u                  #    1.712 GHz                      ( +-  1.19% )
         24,035,752      instructions:u            #    1.10  insn per cycle           ( +-  0.00% )
          5,788,238      branches:u                #  452.062 M/sec                    ( +-  0.00% )
            139,649      branch-misses:u           #    2.41% of all branches          ( +-  1.14% )

           0.014502 +- 0.000147 seconds time elapsed  ( +-  1.01% )

So even with nothing to verify, the patch reduces the number of executed instructions (about 5%) and branches (about 6.5%). Cycles and task-clock are less conclusive since they vary a bit on my test server, but I'd say that opt-old and opt-new perform equally well here.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D94416