[cfe-dev] Design: clang-format

Tue May 15 00:50:06 PDT 2012

On Tue, May 15, 2012 at 6:02 AM, Douglas Gregor <dgregor at apple.com> wrote:

> Hi Manuel,
>
> On May 11, 2012, at 9:27 AM, Manuel Klimek wrote:
>
> Hi,
>
> we're working on the design of clang-format; there are quite a few open
> questions, but I'd rather get some early feedback to see whether we're
> completely off track somewhere.
>
>
> Great!
>
> The main doc is on google docs:
>
> https://docs.google.com/document/d/1gpckL2U_6QuU9YW2L1ABsc4Fcogn5UngKk7fE5dDOoA/edit
>
> For those of you who prefer good old email, here is a copy of the current
> state. Feel free to add comments in either.
> *Design: clang-format
>
> This document contains a design proposal for a clang-format tool, which
> allows C++ developers to automatically format their code independently of
> their development environment.
>
> Context
>
> While many other languages have auto-formatters available, C++ is still
> lacking a tool that fits the needs of the majority of C++ programmers. Note
> that when we talk about formatting as part of this document, we mean both
> the problem of indentation (which has been largely solved independently by
> regexp-based implementations in editors / IDEs) and line breaking, which
> proves to be a harder problem.
>
> There are multiple challenges to formatting C++ code:
>
>    - a vast number of different coding styles has evolved over time
>    - many projects value consistency over conformance and dislike
>    style-only changes, thus making it important to be able to work with code
>    that is not written according to the most current style guide
>    - macros need to be handled properly
>    - it should be possible to format code that is not yet syntactically
>    correct
>
> Goals
>
>    - Format a whole file according to a configuration
>    - Format a part of a file according to a configuration
>    - Format a part of a file while being consistent as best as possible
>    with the rest of the file, while falling back to a configuration for
>    options that cannot be deduced from the current file
>    - Integrate with editors so that you can just type away until you’re
>    far past the column limit, and then hit a key and have the editor lay out
>    the code for you, including placing the right line breaks
>
> Non-goals
>
>    - Indenting code while you type; this is a much simpler problem, but
>    has even stronger performance requirements - the current editors should be
>    good enough, and we’ll allow new workflows that don’t ever require the user
>    to break lines
>    - The only lexical elements clang-format should touch are
>    whitespace, string literals and comments. Any other changes, ranging from
>    ordering includes to removing superfluous parentheses, are not in the scope
>    of this tool.
>    - Per-file configuration: be able to annotate a file with a style
>    which it adheres to (?)
>
> Code location
>
> Clang-format is a very basic tool, so it might warrant living in clang
> mainstream. On the other hand it would also fit nicely with other clang
> refactoring tools. TODO: Where do we want clang-format to live?
> *
>
>
> I think clang-format should live with the refactoring tools, wherever they
> go. However, refactoring is going to be a crucial technology for Clang
> going forward, which almost certainly means that it should migrate into
> mainline Clang at some point.
>
> * Parsing approach
>
> The key consideration is whether clang-format can be based purely on a
> lexer, or whether it needs type information, and we need the full AST.
>
> We believe that we will need the full AST information to correctly indent
> code, break lines, and fix whitespace within a line.
>
> Examples:
>
> AST-dependent indentation:
> callFunction(foo<something,
>                 ^ line up here, if foo is a template name
>              ^ line up here otherwise
>
> AST-dependent line breaking:
> Detecting that ‘*’ is a binary operator in this case requires parsing. If
> it is a binary operator, we want to line-break after it; if it is a unary
> operator, we want to prevent line breaking.
>
> result = variable1 * variable2;
>
> AST-dependent whitespace inside lines:
> a * b;
>   ^ Binary operator or pointer declaration?
> a & f();
>   ^ Binary operator or function declaration?
> *
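
To make the ambiguity in the whitespace examples concrete, here is a small
C++ sketch. It is not part of the design doc, and the namespace and function
names are made up for illustration. The same token sequence `a * b` is a
pointer declaration in one scope and a multiplication in the other, which a
lexer alone cannot tell apart:

```cpp
#include <type_traits>

namespace as_declaration {
struct a {};

// `a` names a type here, so `a * b;` declares `b` as a pointer to `a`.
// A formatter would want to attach the `*` to one side (`a *b;` or `a* b;`).
inline bool b_is_pointer() {
  a * b;
  return std::is_pointer<decltype(b)>::value;
}
}  // namespace as_declaration

namespace as_expression {
// `a` and `b` are variables here, so `a * b` is a multiplication and a
// formatter would keep a space on both sides of the operator.
inline int product() {
  int a = 6;
  int b = 7;
  return a * b;
}
}  // namespace as_expression
```

Which reading applies depends only on what `a` was declared as earlier,
which is the kind of information the AST provides.
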
>
>
> I wonder how well we can do simply with the lexer and preprocessor.
> Introducing the AST traversal adds a lot of complication, but you're right
> that it's necessary to do a great job.
>
> *Challenge: Preprocessor
> Not every line in a program is covered by the AST - for example, there are
> unused macro definitions, various preprocessor directives, #ifdef’ed out
> code, etc.
>
> We will at least need some form of lexing approach for the parts of a
> source file that cannot be correctly indented / line broken by looking at
> the AST.
>
> Algorithm
>
> Visit all nodes on the AST; for each node that is part of a macro
> expansion, consider all locations taking part in that macro expansion. If
> the location is within the range that needs to be indented, look at the
> code at the location, the rules around the node, and adjust whitespace as
> necessary. If the node starts a line, adjust the indent; if a node overflows
> the line, break the line. TODO: figure out what to do with the lines that
> are not visited that way.
>
> Configuration
>
> To support a majority of developers, being able to configure the desired
> style is key. We propose using a YAML configuration file, as there’s already
> a YAML parser readily available in LLVM. Proposals for more specific ideas
> welcome.
> *
>
>
> Seems reasonable.
>
> * Style deduction
>
> When changing the format of code that does not conform to a given style
> configuration, we will optionally try to deduce style options from the file
> first, and fall back to the configured layout when there was no clear style
> deducible from the context.
> TODO: Detailed design ideas.
> *
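
As one possible starting point for the style deduction idea, here is a
minimal sketch for a single option, the indent width. Everything in it is an
assumption rather than part of the proposal: the function name
`deduce_indent_width` is hypothetical, only space-indented files are
considered, and "clear style" is interpreted as a strict majority of the
observed indent increases.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: guess the indent width of `source` by voting on how
// many spaces the indentation grows by from one line to the next. Falls back
// to `configured_width` when no single width has a strict majority.
inline int deduce_indent_width(const std::string& source, int configured_width) {
  std::map<int, int> votes;  // indent increase -> number of times seen
  int total_votes = 0;
  int previous_indent = 0;
  std::istringstream lines(source);
  std::string line;
  while (std::getline(lines, line)) {
    const std::size_t first = line.find_first_not_of(' ');
    if (first == std::string::npos) continue;  // skip blank lines
    const int indent = static_cast<int>(first);
    if (indent > previous_indent) {
      ++votes[indent - previous_indent];
      ++total_votes;
    }
    previous_indent = indent;
  }
  for (const auto& vote : votes) {
    // Only accept a width that more than half of the increases agree on.
    if (2 * vote.second > total_votes) return vote.first;
  }
  return configured_width;  // nothing clearly deducible: use the configuration
}
```

Continuation lines that are aligned to an opening parenthesis also cast
votes here, so a real implementation would need a better signal than raw
leading whitespace.
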
>
>
> Yes, please!
>
> *Interface
>
> This is a strawman. Please shoot down.
>
> Command line interface:
> Command line interfaces allow easy integration with existing tools and
> editors.
>
> USAGE: clang-format <build-path> <source> [<column0> <line0> <length0>
> [...]] [-- list of command line arguments to the parser]
>
> <columnN> <lineN> <lengthN>: Specifies a code range to be reformatted; if
> no code range is given, assume the whole file.
> *
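
For illustration, the range arguments in the strawman usage line could be
parsed roughly as follows. This is only a sketch: `CodeRange` and
`parse_code_ranges` are hypothetical names, and it assumes the caller has
already stripped `<build-path>` and `<source>` from the argument list.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One <column> <line> <length> triple from the proposed command line.
struct CodeRange {
  unsigned column;
  unsigned line;
  unsigned length;
};

// Hypothetical helper: parse the positional arguments that follow
// <build-path> and <source>, stopping at "--". An empty result means
// "no range given, reformat the whole file".
inline std::vector<CodeRange> parse_code_ranges(
    const std::vector<std::string>& args) {
  std::vector<unsigned> numbers;
  for (const std::string& arg : args) {
    if (arg == "--") break;  // everything after this goes to the parser
    // std::stoul throws std::invalid_argument for non-numeric input.
    numbers.push_back(static_cast<unsigned>(std::stoul(arg)));
  }
  if (numbers.size() % 3 != 0)
    throw std::invalid_argument(
        "ranges must be <column> <line> <length> triples");
  std::vector<CodeRange> ranges;
  for (std::size_t i = 0; i < numbers.size(); i += 3)
    ranges.push_back({numbers[i], numbers[i + 1], numbers[i + 2]});
  return ranges;
}
```
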
>
>
> For an editor to use this functionality efficiently, we'll want it to go
> into a shared library (e.g., libclang).
>

I added an entry under "goals" for this.

I think that if we want this to go into libclang, it might make sense to
develop it in mainline as a library (or as part of the refactoring library)
from the start.

Thoughts?
/Manuel