[cfe-dev] RFC: Extend clang-format to support more/all C-like languages

Sean Silva silvas at purdue.edu
Sat Nov 2 09:25:14 PDT 2013


On Sat, Nov 2, 2013 at 11:45 AM, Daniel Jasper <djasper at google.com> wrote:

> More specifically:
> - "var" is lexed just fine. It is the identifier lookup/parsing that you
> are thinking of, and we can easily do that.
> - Let ===/!== just be lexed as ==/!= followed by =. We can set the spacing
> and line breaks appropriately. As I mentioned, we don't need to understand
> the language; we just need to format it correctly. Or just do the
> post-processing I was describing and merge the two tokens received from the
> lexer.
>

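To make the post-processing idea concrete, here is a rough sketch of merging
the two tokens back together. The (text, start_offset) token shape and the
function name are made up for illustration; this is not Clang's token API.

```python
# Hypothetical token representation: (text, start_offset) pairs, roughly what
# a C-family lexer might hand back for JavaScript input.
def merge_js_equality(tokens):
    """Merge '==' + '=' and '!=' + '=' into '===' / '!==' when adjacent."""
    merged = []
    for text, start in tokens:
        if merged:
            prev_text, prev_start = merged[-1]
            # Only merge when the '=' directly follows the previous token,
            # i.e. there is no whitespace between them in the source.
            adjacent = prev_start + len(prev_text) == start
            if text == "=" and prev_text in ("==", "!=") and adjacent:
                merged[-1] = (prev_text + "=", prev_start)
                continue
        merged.append((text, start))
    return merged
```

For "a === b" lexed as a, ==, =, b, this returns a, ===, b; a stray "== ="
with a space in between is left as two tokens.
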
While those don't seem like a big deal in terms of hacking around Clang's
lexer not being a JavaScript lexer, JavaScript's regex literals do seem
like they would be quite a bit of work. For example, you might have:

if ((m = /^(\s*)([a-zA-Z0-9._-]+)\s*=\s*/.exec(chunk))) {

As long as Clang's lexer doesn't choke on input like this, you can probably
reconstruct the literal with a bit of work, but it seems like a source of a
lot of complexity. As a side note, there are edge cases where a regex
literal contains something that Clang will lex as the start of a comment,
e.g. /[/*]/ or /[//]/, though I doubt those are common enough to be a
problem in practice. Languages with "raw" string literals that aren't
lexically identical to C++'s present a similar problem (e.g. a URL in a raw
string literal will have the // in http:// interpreted as starting a
comment).
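
A small sketch of the failure mode, assuming a deliberately naive scanner
that only knows C-style quoted literals and comment openers. It is a
stand-in for illustration, not a model of Clang's actual lexer, and the
function name is made up:

```python
def find_c_comment_start(line):
    """Return the index where a C-style scanner would open a comment, or -1."""
    i = 0
    while i < len(line):
        ch = line[i]
        if ch in "\"'":
            # Skip a quoted literal, honoring backslash escapes.
            quote = ch
            i += 1
            while i < len(line) and line[i] != quote:
                i += 2 if line[i] == "\\" else 1
            i += 1
            continue
        # Regex literals and backtick strings are unknown to this scanner,
        # so a '//' or '/*' inside them looks like a comment opener.
        if line.startswith("//", i) or line.startswith("/*", i):
            return i
        i += 1
    return -1
```

The regex in the example above passes through untouched (the result is -1),
but `x = /[/*]/.test(s);` reports a comment starting inside the character
class, and a backtick-quoted `http://example.com` reports one at the `//`.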

Using one language's lexer for another language seems like a big hack, and
very prone to running into a "showstopper" problem. There are perfectly
good JavaScript lexers around (e.g. <http://esprima.org/demo/parse.html>;
click on "Tokens", then check "Line and column based" to get source
location info and "include comments" for comments). Could clang-format just
have a component that accepts a stream of tokens on stdin in some specified
format and produces a list of replacements?
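
As a rough sketch of what I mean, assume a hypothetical JSON request
carrying the source text plus tokens with "value", "start" and "end"
fields, and a single toy formatting rule (one space around a few
operators). Neither the wire format nor the function names come from
clang-format; they are only there to show the shape of the interface:

```python
import json

# Operators that should have exactly one space on each side (illustrative).
SPACED_OPERATORS = {"=", "==", "===", "!=", "!=="}


def replacements_for(source, tokens):
    """Return (offset, length, text) edits for gaps next to an operator."""
    edits = []
    for prev, cur in zip(tokens, tokens[1:]):
        gap_start, gap_end = prev["end"], cur["start"]
        gap = source[gap_start:gap_end]
        touches_operator = (prev["value"] in SPACED_OPERATORS
                            or cur["value"] in SPACED_OPERATORS)
        # Leave line breaks alone; only normalize same-line spacing.
        if touches_operator and "\n" not in gap and gap != " ":
            edits.append((gap_start, gap_end - gap_start, " "))
    return edits


def apply_replacements(source, edits):
    """Apply edits back to front so earlier offsets stay valid."""
    for offset, length, text in sorted(edits, reverse=True):
        source = source[:offset] + text + source[offset + length:]
    return source


def handle_request(request_text):
    """Turn a JSON request (source + tokens) into a JSON list of edits."""
    request = json.loads(request_text)
    edits = replacements_for(request["source"], request["tokens"])
    return json.dumps(
        [{"offset": o, "length": n, "text": t} for o, n, t in edits])
```

A driver would then just do something like
print(handle_request(sys.stdin.read())), with whatever external lexer is
appropriate for the language producing the token list.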

-- Sean Silva


>
>
> On Sat, Nov 2, 2013 at 4:34 PM, Manuel Klimek <klimek at google.com> wrote:
>
>> On Sat, Nov 2, 2013 at 4:18 PM, Sean Silva <silvas at purdue.edu> wrote:
>>
>>>
>>>
>>>
>>> On Fri, Nov 1, 2013 at 5:50 PM, Daniel Jasper <djasper at google.com> wrote:
>>>
>>>> My gut feeling is that we won't need to change the lexer. Bear in mind
>>>> that (same as with everything else in clang-format) we only need to
>>>> understand the language well enough to format it. There might always be
>>>> corner cases where we aren't correct, but these are rare in practice.
>>>>
>>>
>>> How do you intend to lex JavaScript's === and !== operators? Or the
>>> `var` keyword? These aren't "corner cases" at all.
>>>
>>
>> What do you expect to be formatted differently because of those?
>>
>>
>>>
>>> -- Sean Silva
>>>
>>>
>>>
>>>>
>>>> In fact, I would like to go ahead and see whether we really hit the
>>>> limit somewhere and if so, what the problems are. Once we have sufficient
>>>> information, we can make a good decision on how to continue. Options would
>>>> be allowing different lexers or post-processing the output of Clang's
>>>> lexer. I fully agree that we should not modify the lexer to accommodate
>>>> other languages.
>>>>
>>>>
>>>> On Fri, Nov 1, 2013 at 9:17 PM, Chandler Carruth <chandlerc at google.com> wrote:
>>>>
>>>>>
>>>>> On Fri, Nov 1, 2013 at 1:13 PM, Ryan Gonzalez <rymg19 at gmail.com> wrote:
>>>>>
>>>>>> What if the lexer was an overly large, non-abstract base class? Then,
>>>>>> the derived classes can just override the tokens as needed.
>>>>>
>>>>>
>>>>> (FWIW, I wouldn't try to design changes to the lexer in this thread,
>>>>> in the abstract... If this is an interesting path to pursue, I suspect
>>>>> Daniel or others should produce concrete proposed patches that enable the
>>>>> features needed and minimize the pollution of the lexer with other
>>>>> languages...)
>>>>>
>>>>> _______________________________________________
>>>>> cfe-dev mailing list
>>>>> cfe-dev at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>