[cfe-dev] RFC: Fuzzy parser for highlighting C++

Sean Silva chisophugis at gmail.com
Wed Jul 30 17:02:52 PDT 2014


On Wed, Jul 30, 2014 at 2:08 PM, Daniel Jasper <djasper at google.com> wrote:

>
>
>
> On Wed, Jul 30, 2014 at 8:13 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>>
>> On Wed, Jul 30, 2014 at 11:28 AM, Johannes Kapfhammer <
>> kapf at student.ethz.ch> wrote:
>>
>>> Hi all,
>>>
>>> I am working on a Google Summer of Code project to use the clang lexer
>>> for syntax highlighting.  The intended usage is to highlight C++ for
>>> LaTeX (papers, presentations), HTML (documentation, wikis) and other
>>> formats.  My goal is to provide a better alternative to Pygments (which
>>> highlights C++ on the llvm.org docs) or GNU Source-highlight.  These
>>> tools can identify keywords perfectly well, but aren't able to
>>> highlight types and functions.
>>>
>>> To correctly highlight those source snippets, I wrote a fuzzy parser
>>> library on top of the clang lexer.  The clang parser cannot be used for
>>> this because snippets don't need to be self-contained, e.g. they may use
>>> types or functions whose definitions aren't included.
>>>
>>> The fuzzy parser doesn't understand all language constructs of C++, but
>>> enough to produce reasonably good highlighting.  A sample output
>>> produced with LaTeX can be found on GitHub [1] (136 KB).  There's also
>>> more documentation about clang-highlight [2] and the fuzzy parser [3].
>>>
>>> I submitted my work for review on Phabricator [4] to get it into
>>> clang/tools/extra.
>>>
>>> The fuzzy parser is a general library that may have other potential
>>> uses besides clang-highlight.  clang-format internally has a similar
>>> fuzzy parser, which is currently more complete but not written in a
>>> reusable way.
>>>
>>
>> Have you tried talking to the clang-format authors about making this code
>> more reusable? I think a reusable "fuzzy parser" would be quite generally
>> useful.
>>
>> Realistically speaking I doubt (purely from a maintenance perspective)
>> that we will ever have 2 fuzzy parsers in-tree so evolving the clang-format
>> parser seems like the natural path forward for this sort of work.
>>
>
> Yes, he has, and I did mentor his project.
>

Ah. I'm surprised that there hasn't been more (any?) on-list traffic about
this; usually we at least have an RFC (and doesn't GSoC require one?). Did
a GSoC proposal ever make it to the list?


> I generally agree that we don't want to have 2 fuzzy parsers, but at this
> stage, clang-format's parser is too intricately tangled with clang-format
> itself. A fresh start seems like the most promising approach to me, taking
> some of the lessons learned from clang-format's parser and putting them
> into a reusable library. If successful, we'll be able to switch
> clang-format over to that parser and simplify clang-format's
> implementation.
>

Neat. That would be great.

-- Sean Silva


>
> Also, while clang-format's parser is more complete in some ways, it has
> also been highly tuned to extract only the information from the source code
> that is relevant to formatting. E.g. while it might be essential for
> highlighting to (somewhat) correctly determine type information, in several
> places that information doesn't matter for source code formatting. Thus, I
> am not sure whether clang-format's current parser can really be
> reused/extended for other applications.
>
>
>> -- Sean Silva
>>
>>
>>> Another possible use would be an autocompletion system for editors.
>>>
>>> Any opinions or suggestions about this project?
>>>
>>> Best,
>>> Johannes
>>>
>>> [1] https://github.com/kapf/clang-highlight/blob/master/latex/fuzzyparser.pdf?raw=true
>>> [2] https://github.com/kapf/clang-highlight/blob/master/docs/clang-highlight.rst
>>> [3] https://github.com/kapf/clang-highlight/blob/master/docs/LibFuzzy.rst
>>> [4] http://reviews.llvm.org/D4725
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>
>>
>>
>>
>>
>