[cfe-dev] Working on the rest of PR10063: destructors and the CFG are causing issues with -Wreturn-type

Chandler Carruth chandlerc at google.com
Mon Sep 12 16:10:37 PDT 2011


On Mon, Sep 12, 2011 at 12:21 PM, Ted Kremenek <kremenek at apple.com> wrote:

> On Sep 12, 2011, at 11:54 AM, Chandler Carruth wrote:
>
> Not yet, easy to get though. The overhead is the SmallVector<..., 10> which
> I use to reverse the VarDecl*s prior to appending them. That *shouldn't*
> have an observable impact, but I'll measure it. =] I just wanted to see if I
> was on the right track at all.
>
> I think you are on the right track.
>
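A minimal sketch of the reversal mentioned above. It assumes the scope's VarDecl*s can only be walked forward, first-declared to last-declared, and must be appended last-declared first, since C++ destroys automatic objects in reverse order of construction. Everything here is hypothetical. `VarDecl` is a stand-in struct, `appendDtorsReversed` is an invented helper, `std::forward_list` models the forward-only walk, and `std::vector` replaces LLVM's `SmallVector<..., 10>` so the snippet builds without LLVM.

```cpp
#include <cassert>
#include <forward_list>
#include <string>
#include <vector>

// Hypothetical stand-in for clang::VarDecl; only the name matters here.
struct VarDecl {
  std::string name;
};

// Appends one "~name" destructor element per variable, last-declared first.
// The temporary buffer plays the role of the SmallVector<..., 10> from the
// thread: a forward-only sequence cannot be walked backwards directly.
void appendDtorsReversed(std::vector<std::string>& elements,
                         const std::forward_list<const VarDecl*>& decls) {
  std::vector<const VarDecl*> buffer(decls.begin(), decls.end());
  for (auto it = buffer.rbegin(); it != buffer.rend(); ++it) {
    assert(*it != nullptr && "scope should not contain null decls");
    elements.push_back("~" + (*it)->name);
  }
}
```

Called with decls `a, b, c`, it appends `~c, ~b, ~a`. The buffer is the only extra cost; with SmallVector's inline capacity of 10 it stays off the heap for scopes with at most 10 variables.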

Cool. I ran some performance numbers. Here is my methodology:

First, I created a new CFG stress test, which I will check in. It's much like
the others, but this one creates over 32k variable declarations spread
throughout nested scopes so that they have overlapping lifetimes. The
declarations are clustered 32 to a scope, and their types are a mixture of
ones with noreturn destructors and ones with normal destructors. This is a
*worst*-case scenario!
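A generator along these lines could produce an input of that shape. This is a hypothetical sketch, not the actual cfg-nested-var-scopes.cpp. The helper name, the `Normal`/`Fatal` types, the `if (b)` scopes, and the strict alternation of destructor kinds are all assumptions; the noreturn destructor uses the GNU `__attribute__((noreturn))` spelling.

```cpp
#include <string>

// Hypothetical generator: `depth` nested scopes, each declaring `perScope`
// locals whose types alternate between a normal and a noreturn destructor.
std::string makeNestedScopesTest(int depth, int perScope) {
  std::string src =
      "struct Normal { ~Normal(); };\n"
      "struct Fatal { ~Fatal() __attribute__((noreturn)); };\n"
      "int f(bool b) {\n";
  int id = 0;
  for (int d = 0; d < depth; ++d) {
    src += "if (b) {\n";  // open a nested scope; earlier locals stay alive
    for (int i = 0; i < perScope; ++i, ++id)
      src += (i % 2 ? "Fatal v" : "Normal v") + std::to_string(id) + ";\n";
  }
  for (int d = 0; d < depth; ++d)
    src += "}\n";  // close every scope, innermost first
  src += "return 0;\n}\n";
  return src;
}
```

With `depth = 1024` and `perScope = 32` it emits 32768 declarations, 32 per scope.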

That explodes the number of CFG blocks from 2152 to 33850! This is due to
the correctness change: the CFG now actually models that a block terminates
after each noreturn destructor.
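To see why that split matters for -Wreturn-type, consider a function whose only path past the last `return` runs a noreturn destructor. If the CFG keeps going after that destructor, the analysis sees a path falling off the end of a non-void function and warns spuriously. This is an illustrative example, not a test from the PR; `Fatal` and `pick` are hypothetical names.

```cpp
#include <cstdlib>

// A type whose destructor never returns.
struct Fatal {
  [[noreturn]] ~Fatal() { std::abort(); }
};

int pick(int x) {
  if (x > 0)
    return x;
  Fatal guard;  // ~Fatal() runs at the closing brace and never returns,
                // so control cannot actually fall off the end here.
}
```

Only `x > 0` is safe to call; any other argument aborts inside `~Fatal()`.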

Despite this explosion in the number of CFG blocks, I can measure *no*
regression in -fsyntax-only performance between the two. In fact, with my
patch it appears to be *faster*! I don't really understand why; the only
difference I can point to is that my patch grows the BumpVector with
push_back instead of insert followed by assignment. Even then, I suspect
we're just well below the measurement sensitivity:

% perf stat -r5 ./bin/old_clang -fsyntax-only -Wreturn-type ../tools/clang/INPUTS/cfg-nested-var-scopes.cpp

 Performance counter stats for './bin/old_clang -fsyntax-only -Wreturn-type ../tools/clang/INPUTS/cfg-nested-var-scopes.cpp' (5 runs):

        1083.346099  task-clock-msecs         #      0.996 CPUs    ( +-   0.194% )
                112  context-switches         #      0.000 M/sec   ( +-   0.356% )
                  1  CPU-migrations           #      0.000 M/sec   ( +-  28.571% )
              15025  page-faults              #      0.014 M/sec   ( +-   0.003% )
         2739266060  cycles                   #   2528.523 M/sec   ( +-   0.170% )
         1781584392  instructions             #      0.650 IPC     ( +-   0.061% )
          332083620  branches                 #    306.535 M/sec   ( +-   0.061% )
           22089492  branch-misses            #      6.652 %       ( +-   0.160% )
           48828985  cache-references         #     45.072 M/sec   ( +-   0.564% )
             936082  cache-misses             #      0.864 M/sec   ( +-   0.327% )

        1.087202120  seconds time elapsed   ( +-   0.201% )

% perf stat -r5 ./bin/clang -fsyntax-only -Wreturn-type ../tools/clang/INPUTS/cfg-nested-var-scopes.cpp

 Performance counter stats for './bin/clang -fsyntax-only -Wreturn-type ../tools/clang/INPUTS/cfg-nested-var-scopes.cpp' (5 runs):

        1066.387627  task-clock-msecs         #      0.997 CPUs    ( +-   0.245% )
                110  context-switches         #      0.000 M/sec   ( +-   0.407% )
                  1  CPU-migrations           #      0.000 M/sec   ( +-  16.667% )
              16428  page-faults              #      0.015 M/sec   ( +-   0.004% )
         2696143767  cycles                   #   2528.296 M/sec   ( +-   0.213% )
         1842263749  instructions             #      0.683 IPC     ( +-   0.062% )
          343370993  branches                 #    321.995 M/sec   ( +-   0.068% )
           22275654  branch-misses            #      6.487 %       ( +-   0.269% )
           46956180  cache-references         #     44.033 M/sec   ( +-   0.245% )
            1126887  cache-misses             #      1.057 M/sec   ( +-   0.339% )

        1.069884336  seconds time elapsed   ( +-   0.247% )
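For reference, the relative changes between the two runs can be computed directly from the counters above as (new - old) / old. The helper names are hypothetical; the numbers are copied verbatim from the perf output.

```cpp
#include <cstdio>

// Relative change from the old binary to the new one, in percent.
double percentChange(double oldValue, double newValue) {
  return (newValue - oldValue) / oldValue * 100.0;
}

// Prints the change for a few of the counters reported by `perf stat -r5`.
void printDeltas() {
  struct Row {
    const char* name;
    double oldValue, newValue;
  };
  const Row rows[] = {
      {"seconds elapsed", 1.087202120, 1.069884336},
      {"cycles", 2739266060.0, 2696143767.0},
      {"instructions", 1781584392.0, 1842263749.0},
      {"page-faults", 15025.0, 16428.0},
      {"cache-misses", 936082.0, 1126887.0},
  };
  for (const Row& r : rows)
    std::printf("%-16s %+6.2f%%\n", r.name,
                percentChange(r.oldValue, r.newValue));
}
```

This gives roughly -1.59% elapsed time and -1.57% cycles, against +3.41% instructions, +9.34% page faults and +20.38% cache misses. The new binary does more work and touches more memory, but its higher IPC (0.650 to 0.683) more than makes up for it in cycles.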

Unless you see something fishy, I'll plan on committing this and starting on
some of the cleanups.


> Keep in mind that we can also possibly change the internal representation
> of a CFGBlock if it makes it easier to do the splitting, etc., while still
> maintaining good performance.  For example, we could possibly remove
> operator[] from CFGBlock, if removing the random access feature makes it
> easier to implement such changes with good performance.
>

Yeah, this might be interesting long term... however, before we go that
route I want to have a benchmark that actually slows down. Building the CFG
is *fast* right now... ridiculously fast... so my focus will be elsewhere. =]
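The trade-off Ted describes can be made concrete. If a block's elements lived in a linked list rather than a contiguous vector, splitting a block after each noreturn destructor would relink nodes instead of copying the tail, at the cost of losing operator[]. This is a hypothetical sketch: `Element`, `Block`, and `splitAfterNoReturn` are invented names, not Clang's CFGBlock API. With `std::list`, splicing a range between two different lists is still linear in the range length (for size bookkeeping), but no element is copied.

```cpp
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical CFG element: a label plus whether it is a noreturn destructor.
struct Element {
  std::string name;
  bool noReturnDtor;
};

using Block = std::list<Element>;

// Splits `block` so that each resulting block ends at a noreturn destructor
// (except possibly the last one). Nodes are moved with splice, never copied.
std::vector<Block> splitAfterNoReturn(Block block) {
  std::vector<Block> result;
  while (!block.empty()) {
    auto it = block.begin();
    while (it != block.end() && !it->noReturnDtor)
      ++it;
    if (it != block.end())
      ++it;  // keep the noreturn destructor as the last element of this block
    Block head;
    head.splice(head.end(), block, block.begin(), it);
    result.push_back(std::move(head));
  }
  return result;
}
```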