[cfe-dev] Working on the rest of PR10063: destructors and the CFG are causing issues with -Wreturn-type

Ted Kremenek kremenek at apple.com
Mon Sep 12 20:46:28 PDT 2011


Looks great.  Please charge ahead!

On Sep 12, 2011, at 4:10 PM, Chandler Carruth wrote:

> On Mon, Sep 12, 2011 at 12:21 PM, Ted Kremenek <kremenek at apple.com> wrote:
> On Sep 12, 2011, at 11:54 AM, Chandler Carruth wrote:
> 
>> Not yet, easy to get though. The overhead is the SmallVector<..., 10> which I use to reverse the VarDecl*s prior to appending them. That *shouldn't* have an observable impact, but I'll measure it. =] I just wanted to see if I was on the right track at all.
>> 
> 
> I think you are on the right track.
> 
> Cool. I ran some performance numbers. Here is my methodology:
> 
> First I created a new CFG stress test, which I will check in. It's much like the others, but this one creates over 32k variable declarations spread throughout nested scopes so that they have overlapping lifetimes. The declarations are clustered in groups of 32 per scope. They are a mixture of types with noreturn and normal destructors. This is a *worst* case scenario!!!
> 
> That explodes the number of CFG blocks from 2152 blocks to 33850!!! This is due to the correctness change of actually modeling that the block terminates after each noreturn destructor.
> 
> Despite this explosion in the number of CFG blocks, I can measure *no* regression in -fsyntax-only performance between the two. In fact, with my patch, it appears to be *faster* for some reason! I don't really understand this, other than that my patch causes push_back-style growth of the BumpVector instead of insert and assignment... Even then, I suspect that we're just well below the measurement sensitivity:
> 
> % perf stat -r5 ./bin/old_clang -fsyntax-only -Wreturn-type ../tools/clang/INPUTS/cfg-nested-var-scopes.cpp                
> 
>  Performance counter stats for './bin/old_clang -fsyntax-only -Wreturn-type ../tools/clang/INPUTS/cfg-nested-var-scopes.cpp' (5 runs):
> 
>         1083.346099  task-clock-msecs         #      0.996 CPUs    ( +-   0.194% )
>                 112  context-switches         #      0.000 M/sec   ( +-   0.356% )
>                   1  CPU-migrations           #      0.000 M/sec   ( +-  28.571% )
>               15025  page-faults              #      0.014 M/sec   ( +-   0.003% )
>          2739266060  cycles                   #   2528.523 M/sec   ( +-   0.170% )
>          1781584392  instructions             #      0.650 IPC     ( +-   0.061% )
>           332083620  branches                 #    306.535 M/sec   ( +-   0.061% )
>            22089492  branch-misses            #      6.652 %       ( +-   0.160% )
>            48828985  cache-references         #     45.072 M/sec   ( +-   0.564% )
>              936082  cache-misses             #      0.864 M/sec   ( +-   0.327% )
> 
>         1.087202120  seconds time elapsed   ( +-   0.201% )
> 
> % perf stat -r5 ./bin/clang -fsyntax-only -Wreturn-type ../tools/clang/INPUTS/cfg-nested-var-scopes.cpp                 
> 
>  Performance counter stats for './bin/clang -fsyntax-only -Wreturn-type ../tools/clang/INPUTS/cfg-nested-var-scopes.cpp' (5 runs):
> 
>         1066.387627  task-clock-msecs         #      0.997 CPUs    ( +-   0.245% )
>                 110  context-switches         #      0.000 M/sec   ( +-   0.407% )
>                   1  CPU-migrations           #      0.000 M/sec   ( +-  16.667% )
>               16428  page-faults              #      0.015 M/sec   ( +-   0.004% )
>          2696143767  cycles                   #   2528.296 M/sec   ( +-   0.213% )
>          1842263749  instructions             #      0.683 IPC     ( +-   0.062% )
>           343370993  branches                 #    321.995 M/sec   ( +-   0.068% )
>            22275654  branch-misses            #      6.487 %       ( +-   0.269% )
>            46956180  cache-references         #     44.033 M/sec   ( +-   0.245% )
>             1126887  cache-misses             #      1.057 M/sec   ( +-   0.339% )
> 
>         1.069884336  seconds time elapsed   ( +-   0.247% )
> 
> Unless you see something fishy, I'll plan on committing this and starting on some of the cleanups.
>  
> Keep in mind that we can also possibly change the internal representation of a CFGBlock if it makes it easier to do the splitting, etc., while still maintaining good performance.  For example, we could possibly remove operator[] from CFGBlock, if removing the random access feature makes it easier to implement such changes with good performance.
> 
> Yea, this might be interesting long term... however, before we go that route I want to have a benchmark that actually slows down. Building the CFG is *fast* right now... ridiculously fast... so my focus will be elsewhere. =]
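
For readers following the quoted discussion, here is a rough sketch of why the block count grows when each noreturn destructor ends a block. It is not the patch and does not use clang's CFG classes: Var, Block, and buildScopeExitBlocks are hypothetical names, and std::vector stands in for SmallVector and BumpVector. The only facts assumed are that destructors run in reverse declaration order and that a block is terminated after each noreturn destructor, as described above.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a variable declaration; not clang's VarDecl.
struct Var {
  std::string name;
  bool noreturnDtor;  // true if the type's destructor never returns
};

// Hypothetical stand-in for a CFG block: just an ordered list of
// implicit destructor calls.
using Block = std::vector<std::string>;

// Model the implicit destructor calls at the end of one scope.
// Destructors are visited in reverse declaration order, and the current
// block is closed right after any destructor that cannot return.
std::vector<Block> buildScopeExitBlocks(const std::vector<Var> &decls) {
  std::vector<Block> blocks(1);
  for (auto it = decls.rbegin(); it != decls.rend(); ++it) {
    blocks.back().push_back("~" + it->name);
    if (it->noreturnDtor)
      blocks.emplace_back();  // nothing falls through; start a new block
  }
  // Drop a trailing empty block left behind when the last destructor
  // visited was noreturn.
  if (blocks.size() > 1 && blocks.back().empty())
    blocks.pop_back();
  return blocks;
}
```

In this model, a scope declaring a, b, c, d in that order, where only b has a noreturn destructor, yields two blocks: one holding ~d, ~c, ~b and one holding ~a. A scope with many noreturn destructors therefore produces roughly one block per such destructor, which is the kind of growth the stress test exercises.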
