[cfe-dev] RFC: Fuzzy parser for highlighting C++

Daniel Jasper djasper at google.com
Wed Jul 30 13:08:07 PDT 2014


On Wed, Jul 30, 2014 at 8:13 PM, Sean Silva <chisophugis at gmail.com> wrote:

>
>
>
> On Wed, Jul 30, 2014 at 11:28 AM, Johannes Kapfhammer <
> kapf at student.ethz.ch> wrote:
>
>> Hi all,
>>
>> I am working on a Google Summer of Code project to use the clang lexer for
>> syntax highlighting.  The intended usage is to highlight C++ for LaTeX
>> (papers, presentations), HTML (documentation, wikis) and other formats.
>> My goal is to provide a better alternative to Pygments (which highlights
>> C++ on the llvm.org docs) or GNU Source-highlight.  These tools can
>> identify keywords perfectly well, but aren't able to highlight types and
>> functions.
>>
>> To correctly highlight those source snippets, I wrote a fuzzy parser
>> library on top of the clang lexer.  The clang parser cannot be used for
>> this because snippets don't need to be self-contained, e.g. they may use
>> types or functions whose definitions aren't included.
>>
>> The fuzzy parser doesn't understand all language constructs of C++, but
>> enough to produce reasonably good highlighting.  A sample output produced
>> with LaTeX can be found on GitHub [1] (136 KB).  There's also more
>> documentation about clang-highlight [2] and the fuzzy parser [3].
>>
>> I submitted my work for review on Phabricator [4] to get it into
>> clang/tools/extra.
>>
>> The fuzzy parser is a general library that may have other potential uses
>> besides clang-highlight.  clang-format internally has a similar fuzzy
>> parser that is currently more complete, but not written in a reusable way.
>>
>
> Have you tried talking to the clang-format authors about making this code
> more reusable? I think a reusable "fuzzy parser" would be quite generally
> useful.
>
> Realistically speaking, I doubt (purely from a maintenance perspective)
> that we will ever have 2 fuzzy parsers in-tree, so evolving the clang-format
> parser seems like the natural path forward for this sort of work.
>

Yes, he has, and I mentored his project. I generally agree that we don't
want to have 2 fuzzy parsers, but at this stage, clang-format's parser is
too intricately tangled with clang-format itself. A fresh start seems like
the most promising approach to me, taking some of the lessons learned from
clang-format's parser and putting them into a reusable library. If
successful, we'll be able to switch clang-format over to that parser and
simplify clang-format's implementation.

Also, while clang-format's parser is more complete in some ways, it has
also been highly tuned to extract only the information from the source code
that is relevant to formatting. E.g. while it might be essential for
highlighting to (somewhat) correctly determine type information, in many
places that information doesn't matter for source code formatting. Thus, I
am not sure whether clang-format's current parser can really be
reused/extended for other applications.


> -- Sean Silva
>
>
>> Another possible use would be for an auto complete system for editors.
>>
>> Any opinions or suggestions about this project?
>>
>> Best,
>> Johannes
>>
>>   [1] https://github.com/kapf/clang-highlight/blob/master/latex/fuzzyparser.pdf?raw=true
>>   [2] https://github.com/kapf/clang-highlight/blob/master/docs/clang-highlight.rst
>>   [3] https://github.com/kapf/clang-highlight/blob/master/docs/LibFuzzy.rst
>>   [4] http://reviews.llvm.org/D4725
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>
>
>