[cfe-dev] Announcing "clang-ctags"

Sean Silva silvas at purdue.edu
Mon Jul 23 14:54:11 PDT 2012


> PERFORMANCE:
>
> Running clang-ctags over the `lib` directory of the `clang`
> source code (480 files totalling 470k lines of code) takes 37 minutes on
> my 1.8GHz Intel Core i7. 98% of this time is the parsing done by
> libclang itself. By comparison, GNU etags takes 0.5 *seconds* on the
> same input.

Ouch. This is longer than compiling it!

My guess is that the performance problem comes from a couple of factors combining.

1. This is running completely single threaded. That alone makes it
roughly 2, 4, 8, etc. times slower, depending on how many cores your CPU has.
2. You are actually processing things like includes and such, rather
than simply doing a linear grep-like scan of all the files as I
believe GNU etags does.
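
The first factor at least has a cheap mitigation, since each source file
can be parsed independently of the others. A minimal sketch, assuming a
hypothetical `tags_for_file` callable that wraps whatever clang-ctags does
for one source file (both names below are made up for illustration and are
not part of clang-ctags):

```python
from concurrent.futures import ThreadPoolExecutor
import os


def index_in_parallel(paths, tags_for_file, jobs=None):
    """Run tags_for_file(path) for every path on a pool of worker threads.

    tags_for_file is a hypothetical per-file function, not a clang-ctags API.
    Results come back in the same order as paths.
    """
    jobs = jobs or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() preserves input order, so the output stays deterministic.
        return list(pool.map(tags_for_file, paths))
```

Threads only help if the expensive part of the parse runs outside the
interpreter lock. If it doesn't, `ProcessPoolExecutor` from the same module
offers the same `map` interface, provided the per-file function can be
pickled.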

The second one is the real problem, and unfortunately, there doesn't
seem to be a way to circumvent it with your current architecture.

This seems like a job much better suited to a plugin
<http://clang.llvm.org/docs/ClangPlugins.html>. If you look at the
criteria that <http://clang.llvm.org/docs/Tooling.html> suggests for
deciding how to build a tool on Clang, a plugin is the most natural
fit: one of the "canonical examples" it gives is "creating
additional build artifacts from a single compile step". It also says
to use plugins when you "need your tool to rerun if any of the
dependencies change". Both describe what you're doing.

--Sean Silva

On Mon, Jul 23, 2012 at 1:04 AM, David Röthlisberger <david at rothlis.net> wrote:
> Announcing "clang-ctags", a libclang-based ctags implementation written
> in python.
>
> Source code: https://github.com/drothlis/clang-ctags
>
> I took care to structure the commits in a tutorial-like fashion, so you
> could start from the oldest commit:
> https://github.com/drothlis/clang-ctags/commits/master/clang-ctags
> As time permits I'll write up a tutorial presenting the material in a
> more structured way.
>
> Currently clang-ctags only supports the Emacs ("etags") format
> (mainly because I haven't figured out how to write integration tests
> for vim).
>
> WHY:
>
> This seemed like the simplest tool I could write to get acquainted with
> libclang, and still be useful.
>
> https://github.com/drothlis/clang-ctags/blob/master/test/why.sh tests
> some specific cases that the traditional etags doesn't handle well.
>
> LESSONS LEARNED:
>
> Using this tool is far more complicated than existing ctags/etags
> implementations. To process a source file you need its compilation
> command line (there are several ways to obtain this:
> https://github.com/drothlis/clang-ctags#compilation-command-line).
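
For readers unfamiliar with compile command databases, here is a rough
sketch of looking up the command line for each source file in a
`compile_commands.json`. The function name `load_compile_args` is
hypothetical, and this is not necessarily how clang-ctags reads the file.
It only assumes the documented JSON layout: a list of entries with
"directory", "file", and either "command" or "arguments".

```python
import json
import os
import shlex


def load_compile_args(database_path):
    """Map each normalized source path to its compiler argument list."""
    with open(database_path) as f:
        entries = json.load(f)
    args_by_file = {}
    for entry in entries:
        # "file" may be given relative to "directory".
        path = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        if "arguments" in entry:
            args = list(entry["arguments"])
        else:
            # "command" is a single shell-quoted string.
            args = shlex.split(entry["command"])
        args_by_file[path] = args
    return args_by_file
```

The resulting argument list still includes the compiler name and the source
file itself, which a caller would probably want to strip before handing the
flags to a parser.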
>
> There are other complications. How do you process header files? You
> don't have a compilation command line for headers. The approach I've
> taken is to generate tags for a header file encountered during
> processing a source file, but only if that header file was also
> specified on the clang-ctags command line. This matches the way you
> invoke traditional ctags tools, but instead of:
>     find . -name '*.[ch]pp' | xargs ctags
> you say:
>     find . -name '*.[ch]pp' |
>     xargs clang-ctags --compile-commands=compile_commands.json
>
> (I also added a "--non-system-headers" flag to generate tags for all
> header files encountered that are under the directory where clang-ctags
> is invoked.)
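
The header rule described above can be sketched as a small predicate. This
is an illustration of the described behaviour, not the actual clang-ctags
code. The name `should_tag` and its parameters are assumptions.

```python
import os


def should_tag(path, requested, non_system_headers=False, root="."):
    """Decide whether to emit tags for a file seen while parsing.

    requested: files named on the command line.
    non_system_headers: if true, also accept any file under root.
    """
    path = os.path.abspath(path)
    if path in {os.path.abspath(p) for p in requested}:
        return True
    if non_system_headers:
        root = os.path.abspath(root)
        # The file is under root if root is their common ancestor.
        return os.path.commonpath([root, path]) == root
    return False
```

With this shape, a system header such as /usr/include/stdio.h is skipped
unless it was named explicitly, while project headers are picked up either
by listing them or by passing the flag.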
>
> In general the development was very straightforward, and clearly a tool
> to index C++ with this accuracy wouldn't be feasible without clang. But
> it still took far longer than I expected, and I'm beginning to
> understand why we haven't seen more clang-based tools springing up.
> (This is a fault of C++, not of clang! And I expect things will get
> easier as more tooling is added, like the new support for compile
> command databases.)
>
> ipython (a python shell with tab-completion) is great for discovering
> the libclang api.
>
> Deployment is going to be difficult -- until libclang and its python
> bindings are included in your system's clang packages, you'll have to
> build clang from source. (A separate project, the clang_complete plugin
> for vim, works around this by shipping a copy of cindex.py, but then you
> have to make sure that your system's version of libclang matches
> clang_complete's cindex.py.)
>
> PERFORMANCE:
>
> Running clang-ctags over the `lib` directory of the `clang`
> source code (480 files totalling 470k lines of code) takes 37 minutes on
> my 1.8GHz Intel Core i7. 98% of this time is the parsing done by
> libclang itself. By comparison, GNU etags takes 0.5 *seconds* on the
> same input.
>
> CONCLUSION:
>
> As a replacement for traditional ctags/etags, the disadvantages of
> clang-ctags may outweigh the advantages. But it could be useful as
> a base to build a more advanced indexing tool. :-)
>
> Cheers
> David Rothlisberger.
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev



