[cfe-commits] r172900 - /cfe/trunk/bindings/python/clang/cindex.py
Tobias Grosser
tobias at grosser.es
Sat Jan 19 09:13:36 PST 2013
On 01/19/2013 05:07 PM, Sean Silva wrote:
> On Sat, Jan 19, 2013 at 6:03 AM, Tobias Grosser
> <grosser at fim.uni-passau.de> wrote:
>> This is a very performance-critical point for auto-completion. The manual
>> implementation gives a large speedup. As it does not complicate the code a lot,
>> I figured it is worth the change. If anybody understands why CachedProperty is
>> so much slower here, I am very interested in working on an improvement of
>> CachedProperty.
>
> It's possible that it has to do with the fact that the decorator
> causes at least one extra Python function call, which is one of the
> more expensive operations in CPython. The built-in `property` is coded
> in C (in python/Objects/descrobject.c).
Good point. Part of this may be attributed to function call overhead.
However, the speedup I got seemed too large to be explained by function
call overhead alone. I need to investigate this more.
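To make this concrete, here is a minimal, self-contained sketch of the
two approaches (the names are made up; this is not the exact cindex.py
code, and the actual change in r172900 may differ): a pure-Python
caching descriptor versus manual caching behind the C-implemented
built-in `property`.

    class CachedProperty(object):
        """Pure-Python non-data descriptor: the first access pays an
        extra Python-level call (__get__) on top of the wrapped
        function, then shadows itself with a plain instance
        attribute."""
        def __init__(self, wrapped):
            self.wrapped = wrapped

        def __get__(self, instance, instance_type=None):
            if instance is None:
                return self
            value = self.wrapped(instance)
            setattr(instance, self.wrapped.__name__, value)
            return value

    class Completion(object):
        def __init__(self, raw):
            self.raw = raw

        @CachedProperty
        def spelling_cached(self):
            return self.raw.strip()

        # Manual variant: `property` itself is implemented in C, so an
        # access costs one Python call (the getter) plus a cheap
        # attribute check.
        @property
        def spelling_manual(self):
            try:
                return self._spelling
            except AttributeError:
                self._spelling = self.raw.strip()
                return self._spelling

If each attribute is read only once per object, as with a fresh batch of
completion results, the descriptor path pays two Python-level calls
(__get__ plus the wrapped function) where the manual getter pays one.
Constructing fresh objects in the timing loop isolates that first-access
cost:

    import timeit
    for attr in ('spelling_cached', 'spelling_manual'):
        print(timeit.timeit('Completion("  AU  ").%s' % attr,
                            'from __main__ import Completion'))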
> Unfortunately this scenario means that you probably won't be able to
> get equivalent performance from a pure-Python solution (in CPython)
> without doing something very nasty; pypy possibly won't have this
> problem.
Equivalent performance to what? A C implementation of auto-completion?
This is probably right. However, in most cases Python performance is not
the problem: clang is the bottleneck and the Python overhead is not
noticeable. There are still some cases where tuning the Python code helps.
In case you are interested, here are some profiles from my machine:
-----------------------------------------------------
File: lib/Analysis/ScalarEvolution.cpp
Line: 6643, Column: 6
AU.
libclang code completion - Get TU: 0.003s ( 8.3%)
libclang code completion - Code Complete: 0.029s ( 76.2%)
libclang code completion - Count # Results (22): 0.001s ( 2.6%)
libclang code completion - Filter: 0.000s ( 0.0%)
libclang code completion - Sort: 0.000s ( 0.0%)
libclang code completion - Format: 0.002s ( 6.2%)
libclang code completion - Load into vimscript: 0.001s ( 2.1%)
libclang code completion - vimscript + snippets: 0.002s ( 4.6%)
Overall: 0.039 s
-----------------------------------------------------
File: lib/Analysis/ScalarEvolution.cpp
Line: 7023, Column: 12
std::
libclang code completion - Get TU: 0.008s ( 5.9%)
libclang code completion - Code Complete: 0.046s ( 34.0%)
libclang code completion - Count # Results (768): 0.002s ( 1.2%)
libclang code completion - Filter: 0.000s ( 0.0%)
libclang code completion - Sort: 0.000s ( 0.0%)
libclang code completion - Format: 0.045s ( 33.5%)
libclang code completion - Load into vimscript: 0.007s ( 5.2%)
libclang code completion - vimscript + snippets: 0.027s ( 20.1%)
Overall: 0.136 s
-----------------------------------------------------
When we complete some object or class with a low number of results, the
run time is spent almost entirely within clang. Only for something like
std::, where we get almost 800 completions, does formatting the results
take noticeable time (about 30% after my recent changes). I have a
couple of ideas for improving this: further tune the Python code,
implement one or two very hot functions in clang, or only format the
results that are actually shown to the user (see the sketch below).
However, further tuning does not seem critical at the moment. On my
machine all completions already show up without noticeable delay, and
even on slower machines completion is pretty fast. Still, if someone has
ideas for further reducing the Python overhead here, I am sure there are
people who would appreciate it.
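For what it's worth, here is a hypothetical sketch of that last idea
(all names are made up; the real formatter lives in the vim
integration): keep the raw results around and only format the slice the
UI will actually display, so that std:: with its ~800 results pays the
formatting cost for one screenful rather than for everything.

    from collections import namedtuple

    Result = namedtuple('Result', 'spelling')

    def format_result(result):
        # Stand-in for the real, expensive formatter that builds the
        # full vimscript entry (word, abbreviation, snippet, ...).
        return '{"word": "%s"}' % result.spelling

    def visible_completions(results, prefix, max_shown=50):
        # Filter first, then format only what fits on screen.
        matching = [r for r in results if r.spelling.startswith(prefix)]
        return [format_result(r) for r in matching[:max_shown]]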
Tobi