[cfe-commits] [PATCH] Interface proposal for clang formatting tools

Wed Jun 6 03:23:35 PDT 2012

On Wed, Jun 6, 2012 at 3:08 AM, Manuel Klimek <klimek at google.com> wrote:

> On Wed, Jun 6, 2012 at 12:04 PM, Chandler Carruth <chandlerc at google.com>wrote:
>
>> On Wed, Jun 6, 2012 at 2:57 AM, Manuel Klimek <klimek at google.com> wrote:
>>
>>> Daniel & me have worked on a first proposal to provide an interface for
>>> clang formatting as part of the Tooling library.
>>>
>>
>> Please re-send the patch as an attachment? Email plays havoc w/
>> 80-columns.
>>
>
> Done.
>

Thanks!

I think it would help a lot to have a bit more code & context when
evaluating this, even as a very early step.

My concrete suggestion would be this: implement a bozo-formatter. A
completely trivial to implement, BS set of rules that only requires lexing.
Then we can see the interface, a trivial implementation, and some test
cases that use it. I think that'll make this much easier to review and
really analyze. It also makes it much closer to something that could get
checked in as a stepping stone.

> Questions:
>>> - please point out anything that might make this interface not viable
>>> for your own use cases
>>> - are we missing the point somewhere?
>>> - is Tooling/ the right place for this?
>>>
>>
I think Tooling is the right place because I think essentially every tool
anyone writes will want to have *some* formatting applied to it. Leveraging
this from an existing tool and composing cleanly w/ it will almost
certainly be important.

A few specific comments that jump out at me:

+//  Various functions to configurably format source code.
+//
+//  This supports two use cases:
+//  - Format (ranges in) a given file.
+//  - Reformat sources after automated refactoring.

I don't (yet) see how this is supported. I suspect my example request will
fix...

+//
+//  Formatting is done on a per file basis.
+//
+//  The formatting strategy is:
+//  - lex the file content
+//  - in case formatting decisions need information of the AST analysis,
run
+//    the analysis and annotate all tokens that are relevant for formatting
 +//    in the current file

How does this interact with the just-refactored use-case? Ideally, I would
like to have the option of using the existing AST (when the refactor didn't
change it in interesting ways, f.ex., it just renamed bits), or have it
re-compute one.

I worry about requiring recomputing the AST after a refactor -- I may need
to finish applying the refactor across multiple files before things start
to compile again... Maybe the existing rewriter / refactoring APIs handle
this though.

+//  - reformat based on the (annotated) tokens

Where are these tokens annotated? I don't see any real place for that in
the interface below.

+//
+//  To allow different strategies of handling the SourceManager the
formatter
+//  works on an interface AnalyzableSource, which provides access to the
file
+//  contents and, if available, allow to lazily run an analysis over the
AST
+//  when needed.

Should this be what provides access to the token stream?

+/// \brief A character range of source code.
+struct CodeRange {

Why doesn't CharSourceRange work? This seems an odd entity to have stashed
away inside of this library.

Why doesn't a pair of iterators into the buffer work?

Hmm, below I see this is almost a user-input... Do you really want a pair
of line+column coordinates?

+/// \brief Set of formatting options.
+struct FormatOptions {

Perhaps I'm overly pessimistic, but I strongly expect to want most
formatters to have their own custom *logic*, not just a set of options.

I had imagined more of a visitor system for formatting where there were
methods that could be overridden for each formatting "event". These methods
would be given access to the location, state from previous events, and be
able to return information about line breaks, indentation, whatever was
relevant for that event. They could also query for the current AST node,
walk around the AST, navigate, etc.

These would then be supplemented by helper logic to handle common patterns
and idioms. Those would form the basis of code reuse between formatting
systems.

I'm fine not going completely crazy right off the bat, but I wonder whether
it's worth trying to begin the design of a visitor-like logic-based
interface for performing a particular format now... even if we end up with
only one implementation and an options struct for a long time. (Honestly, I
would expect all of them to take an options struct to supplement the logic
anyways...)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120606/18d7f361/attachment.html>