[cfe-commits] [PATCH] Interface proposal for clang formatting tools

Wed Jun 6 03:56:31 PDT 2012

On Wed, Jun 6, 2012 at 12:23 PM, Chandler Carruth <chandlerc at google.com>wrote:

> On Wed, Jun 6, 2012 at 3:08 AM, Manuel Klimek <klimek at google.com> wrote:
>
>> On Wed, Jun 6, 2012 at 12:04 PM, Chandler Carruth <chandlerc at google.com>wrote:
>>
>>> On Wed, Jun 6, 2012 at 2:57 AM, Manuel Klimek <klimek at google.com> wrote:
>>>
>>>> Daniel & me have worked on a first proposal to provide an interface for
>>>> clang formatting as part of the Tooling library.
>>>>
>>>
>>> Please re-send the patch as an attachment? Email plays havoc w/
>>> 80-columns.
>>>
>>
>> Done.
>>
>
> Thanks!
>
> I think it would help a lot to have a bit more code & context when
> evaluating this, even as a very early step.
>
> My concrete suggestion would be this: implement a bozo-formatter. A
> completely trivial to implement, BS set of rules that only requires lexing.
> Then we can see the interface, a trivial implementation, and some test
> cases that use it. I think that'll make this much easier to review and
> really analyze. It also makes it much closer to something that could get
> checked in as a stepping stone.
>

Yes, having a simple step 1 implementation is the planned next step. I
wanted to gather some initial higher level feedback first, so we can still
throw it completely out if there's anything obviously wrong without wasting
time.

>
>
>> Questions:
>>>> - please point out anything that might make this interface not viable
>>>> for your own use cases
>>>> - are we missing the point somewhere?
>>>> - is Tooling/ the right place for this?
>>>>
>>>
> I think Tooling is the right place because I think essentially every tool
> anyone writes will want to have *some* formatting applied to it. Leveraging
> this from an existing tool and composing cleanly w/ it will almost
> certainly be important.
>
> A few specific comments that jump out at me:
>
> +//  Various functions to configurably format source code.
> +//
> +//  This supports two use cases:
> +//  - Format (ranges in) a given file.
> +//  - Reformat sources after automated refactoring.
>
> I don't (yet) see how this is supported. I suspect my example request will
> fix...
>
> +//
> +//  Formatting is done on a per file basis.
> +//
> +//  The formatting strategy is:
> +//  - lex the file content
> +//  - in case formatting decisions need information of the AST analysis,
> run
> +//    the analysis and annotate all tokens that are relevant for
> formatting
>  +//    in the current file
>
> How does this interact with the just-refactored use-case? Ideally, I would
> like to have the option of using the existing AST (when the refactor didn't
> change it in interesting ways, f.ex., it just renamed bits), or have it
> re-compute one.
>

So, 1. the proposed AnalyzedSource interface completely decouples the
formatting from the AST invalidation / recreation logic, which allows us to
come up with good strategies for this later.

How feasible is it to figure out whether source file changes would affect
the AST?

> I worry about requiring recomputing the AST after a refactor -- I may need
> to finish applying the refactor across multiple files before things start
> to compile again... Maybe the existing rewriter / refactoring APIs handle
> this though.
>

That is a good point.

> +//  - reformat based on the (annotated) tokens
>
> Where are these tokens annotated? I don't see any real place for that in
> the interface below.
>

This is an implementation detail. I plan to pull that part of the comment
into the .cpp file once we have it, but I think some of this is necessary
to put in here, so the design decisions (like the AnalyzedSource) make some
sense.

> +//
> +//  To allow different strategies of handling the SourceManager the
> formatter
> +//  works on an interface AnalyzableSource, which provides access to the
> file
> +//  contents and, if available, allow to lazily run an analysis over the
> AST
> +//  when needed.
>
> Should this be what provides access to the token stream?
>

Well, it provides access to the source, and thus the token stream.

+/// \brief A character range of source code.
> +struct CodeRange {
>
> Why doesn't CharSourceRange work? This seems an odd entity to have stashed
> away inside of this library.
>

We want something that is not coupled to a SourceManager here.

Why doesn't a pair of iterators into the buffer work?
>

That would work. I don't think it'd be a big difference, but if you really
think it's nicer that's fine with me.

> Hmm, below I see this is almost a user-input... Do you really want a pair
> of line+column coordinates?
>

I'm not sure. We had that argument on the design doc, too. My gut feeling
is that we have more tool-provided input here than user provided input, and
I'd rather convert user input early and have a common straight forward
representation in the code.

> +/// \brief Set of formatting options.
> +struct FormatOptions {
>
> Perhaps I'm overly pessimistic, but I strongly expect to want most
> formatters to have their own custom *logic*, not just a set of options.
>
> I had imagined more of a visitor system for formatting where there were
> methods that could be overridden for each formatting "event". These methods
> would be given access to the location, state from previous events, and be
> able to return information about line breaks, indentation, whatever was
> relevant for that event. They could also query for the current AST node,
> walk around the AST, navigate, etc.
>
> These would then be supplemented by helper logic to handle common patterns
> and idioms. Those would form the basis of code reuse between formatting
> systems.
>
> I'm fine not going completely crazy right off the bat, but I wonder
> whether it's worth trying to begin the design of a visitor-like logic-based
> interface for performing a particular format now... even if we end up with
> only one implementation and an options struct for a long time. (Honestly, I
> would expect all of them to take an options struct to supplement the logic
> anyways...)
>

Yea, I think those are really 2 different concerns:
- provide different strategies
- provide different configurations / strategy

I think your point is generally valid, but I think for this we really need
some code first...

Cheers,
/Manuel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120606/3f06a8de/attachment.html>