[cfe-dev] RFC: Fuzzy parser for highlighting C++

Johannes Kapfhammer kapf at student.ethz.ch
Wed Jul 30 10:28:52 PDT 2014


Hi all,

I am working on a Google Summer of Code project to use the clang lexer for
syntax highlighting.  The intended usage is to highlight C++ for LaTeX
(papers, presentations), HTML (documentation, wikis) and other formats.
My goal is to provide a better alternative to Pygments (which highlights C++
on the llvm.org docs) or GNU Source-highlight.  These tools can identify
keywords perfectly well, but aren't able to highlight types and functions.

To correctly highlight such source snippets, I wrote a fuzzy parser library on
top of the clang lexer.  The clang parser cannot be used for this because
snippets need not be self-contained: they may use types or functions whose
definitions aren't included.
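
To give a feel for the kind of guess a fuzzy parser can make without seeing
any declarations, here is a toy sketch.  It is not the library's code and does
not use the clang lexer: the tokenizer is hand-written, and the names Kind,
Token, lex and classify are made up for this illustration.  It assumes only
two heuristics: an identifier directly followed by "(" is probably a function,
and an identifier directly followed by another non-keyword identifier is
probably a type.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical token categories, for illustration only.
enum class Kind { Keyword, Type, Function, Identifier, Other };

struct Token {
  std::string text;
  Kind kind;
};

// A deliberately tiny keyword list; a real tool would cover all of C++.
static bool isKeyword(const std::string &s) {
  static const std::unordered_set<std::string> kws = {
      "auto", "const", "else", "for", "if", "int", "return", "void", "while"};
  return kws.count(s) != 0;
}

// Split the input into identifier tokens and single-character tokens.
// Whitespace is skipped; everything that is not an identifier (including
// digits and punctuation) becomes a one-character token of kind Other.
std::vector<Token> lex(const std::string &src) {
  std::vector<Token> out;
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) {
      ++i;
    } else if (std::isalpha(c) || c == '_') {
      std::size_t j = i;
      while (j < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[j])) ||
              src[j] == '_'))
        ++j;
      assert(j > i); // an identifier token is never empty
      out.push_back({src.substr(i, j - i), Kind::Identifier});
      i = j;
    } else {
      out.push_back({std::string(1, src[i]), Kind::Other});
      ++i;
    }
  }
  return out;
}

// Guess a category for each identifier by looking one token ahead.
std::vector<Token> classify(const std::string &src) {
  std::vector<Token> toks = lex(src);
  for (std::size_t i = 0; i < toks.size(); ++i) {
    Token &t = toks[i];
    if (t.kind != Kind::Identifier)
      continue;
    if (isKeyword(t.text)) {
      t.kind = Kind::Keyword;
      continue;
    }
    if (i + 1 >= toks.size())
      continue; // no lookahead available, leave as plain identifier
    const Token &next = toks[i + 1];
    if (next.text == "(")
      t.kind = Kind::Function; // name( ... looks like a call or declaration
    else if (next.kind == Kind::Identifier && !isKeyword(next.text))
      t.kind = Kind::Type; // Name name ... looks like a declaration
  }
  return toks;
}
```

Calling classify("Widget w = make_widget(x);") marks Widget as a type and
make_widget as a function even though neither is declared anywhere, which is
exactly the situation a snippet in a paper or wiki page is in.  Two heuristics
this crude will of course misfire on many real constructs.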

The fuzzy parser doesn't understand all language constructs of C++, but enough
to produce reasonably good highlighting.  A sample output produced with
LaTeX can be found on GitHub [1] (136 KB).  There's also more documentation
about clang-highlight [2] and the fuzzy parser [3].

I submitted my work for review on Phabricator [4] to get it into
clang/tools/extra.

The fuzzy parser is a general library that may have other potential uses
besides clang-highlight.  clang-format internally has a similar fuzzy parser
that is currently more complete, but it is not written in a reusable way.
Another possible use would be an autocomplete system for editors.

Any opinions or suggestions about this project?

Best,
Johannes

  1 : https://github.com/kapf/clang-highlight/blob/master/latex/fuzzyparser.pdf?raw=true
  2 : https://github.com/kapf/clang-highlight/blob/master/docs/clang-highlight.rst
  3 : https://github.com/kapf/clang-highlight/blob/master/docs/LibFuzzy.rst
  4 : http://reviews.llvm.org/D4725


