[cfe-dev] Announcing Crange
Tobias Grosser
tobias at grosser.es
Fri May 9 07:36:13 PDT 2014
On 09/05/2014 14:08, Anurag wrote:
> Announcing Crange: https://github.com/crange/crange
>
> Summary
> -------
>
> Crange is a tool to index and cross-reference C/C++ source code. It
> can be used to generate a tags database that can help with:
>
> * Identifier definitions
> * Identifier declarations
> * References
> * Expressions
> * Operators
> * Symbols
> * Source range
>
> The source metadata collected by Crange can help with building tools
> to provide cross-referencing, syntax highlighting, code folding and
> deep source code search.
>
>
> Rationale
> ---------
>
> I was looking for tools that can extract and index identifiers present
> in C/C++ source code and can work with large code bases.
>
> Considering the amount of data Clang can generate while traversing
> very large C/C++ projects (such as Linux), I decided against using a
> ctags/etags-style tags database. Crange uses a SQLite-based tags
> database to store identifiers and metadata, and uses SQLite's bulk
> insert capabilities wherever possible.
>
> I've used Python's multiprocessing library to parallelize translation
> unit traversal and metadata extraction from identifiers. It's possible
> to control the number of jobs using the -j command line option.
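The pattern described above could look roughly like this; parse_unit is a hypothetical stand-in for the real Clang-based traversal, not Crange's code:

```python
import argparse
import multiprocessing

def parse_unit(path):
    # Placeholder: a real worker would parse `path` with libclang
    # and return the extracted identifier metadata.
    return (path, len(path))

def run(files, jobs):
    # One process per job; pool.map preserves input order.
    pool = multiprocessing.Pool(processes=jobs)
    try:
        return pool.map(parse_unit, files)
    finally:
        pool.close()
        pool.join()

def parse_jobs(argv):
    # Mirrors the -j option mentioned above.
    ap = argparse.ArgumentParser()
    ap.add_argument("-j", "--jobs", type=int, default=1)
    return ap.parse_args(argv).jobs
```

Since each translation unit is independent, the traversal is embarrassingly parallel and pool.map distributes the files across the worker processes.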
>
>
> Usage example
> -------------
>
> Generating a tags database for Linux 3.13.5
>
> $ cd linux-3.13.5
> $ crtags -v -j 32 .
> Parsing fs/xfs/xfs_bmap_btree.c (count: 1)
> Indexing fs/xfs/xfs_bmap_btree.c (nodes: 379, qsize: 0)
> ...
> Parsing sound/soc/codecs/ak4641.h (count: 34348)
> Generating indexes
>
> This would create a new file named tags.db containing all the
> identified tags.
>
> Search for all declarations of the identifier device_create
>
> $ crange device_create
>
> Search for all references to the identifier device_create
>
> $ crange -r device_create
>
> Not all command line options are available yet (-b, -k, etc.), as
> the tool is still in development.
>
> Performance
> -----------
>
> Running crtags on the Linux kernel v3.13.5 sources (45K files, 614MB
> in total) took a little less than 7 hours (415m10.974s) on a 32-CPU
> Xeon server with 16GB of memory and 32 jobs. The generated tags.db
> file was 22GB in size and contained 60,461,329 unique identifiers.
>
> Installation
> ------------
>
> $ sudo python setup.py install
> or
> $ sudo pip install crange
Yes, this looks interesting.
As Renato said, the run-time is critical. Some people may suggest
implementing this in C/C++. However, starting with Python is probably a
good choice to try this out and also to understand the performance
implications.
Some ideas:
- You may want to look at the compilation database support in
  libclang to retrieve the set of files to process as well as
  the corresponding command lines.
- I wonder how quick querying the database is. In fact,
  if those queries are quick (less than 50ms) even for big
  databases, this would be extremely interesting, as an editor
  could, for example, add missing includes as you type.
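A rough way to check whether lookups stay within such a budget is to time an indexed query against a synthetic table (the schema is again illustrative):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (name TEXT, file TEXT, line INTEGER)")
with conn:
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?, ?)",
        (("id%d" % i, "f.c", i) for i in range(100000)),
    )
# An index on the looked-up column is what makes per-keystroke
# queries feasible on large databases; without it every lookup
# is a full table scan.
conn.execute("CREATE INDEX idx_tags_name ON tags (name)")

start = time.time()
hits = conn.execute(
    "SELECT file, line FROM tags WHERE name = ?", ("id42",)
).fetchall()
elapsed_ms = (time.time() - start) * 1000.0
```

With a proper index an exact-name lookup is a B-tree search, so it should stay well under 50ms even on databases of the 22GB size reported above.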
Ah, and it actually only works if I don't use this worker pool, but
apply this patch:
- pool.map(worker, worker_params)
+
+ for p in worker_params:
+ worker(p)
Otherwise the process gets stuck (even on a single file) and I cannot
abort it. Instead, I get:
^CProcess PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 113, in worker
    result = (True, func(*args, **kwds))
  File "/home/grosser/Projects/crange/crange/bin/crtags", line 12, in dbkeeper
    ast = queue.get()
  File "<string>", line 2, in get
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 759, in _callmethod
    kind, result = conn.recv()
KeyboardInterrupt
^CProcess PoolWorker-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 376, in get
    return recv()
KeyboardInterrupt
(identical tracebacks follow for PoolWorker-5 and PoolWorker-6)
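The dbkeeper frame blocked in queue.get() suggests one possible cause (a guess, not a confirmed diagnosis): a consumer that waits on the queue forever because nothing ever signals shutdown. The usual fix is a sentinel value that the producer enqueues after the real work, so the consumer's loop can terminate cleanly:

```python
# Sentinel pattern sketch: drain and handle make no assumptions
# about Crange's internals; they only illustrate the shutdown idea.
SENTINEL = None

def drain(get_item, handle):
    # Consume items until the sentinel arrives, then return
    # instead of blocking in get_item() forever.
    while True:
        item = get_item()
        if item is SENTINEL:
            return
        handle(item)
```

The producer would enqueue one sentinel per consumer once all translation units have been dispatched; without such a signal, Ctrl-C merely interrupts the blocking recv() in each worker, producing tracebacks like the ones above.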
Cheers,
Tobias