Hi,<br><br>We're working on the design of clang-format; there are quite a few open questions, but I'd rather get some early feedback to see whether we're completely off track somewhere.<br><br>The main doc is on Google Docs:<br>
<a href="https://docs.google.com/document/d/1gpckL2U_6QuU9YW2L1ABsc4Fcogn5UngKk7fE5dDOoA/edit" target="_blank">https://docs.google.com/document/d/1gpckL2U_6QuU9YW2L1ABsc4Fcogn5UngKk7fE5dDOoA/edit</a><br><br>For those of you who prefer good old email, here is a copy of the current state. Feel free to add comments in either.<div>
<b style="font-family:'Times New Roman';font-size:medium"><h1 dir="ltr"><span style="font-size:24px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Design: clang-format</span></h1>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">This document contains a design proposal for a clang-format tool, which allows C++ developers to automatically format their code independently of their development environment.</span><br>
<h2 dir="ltr"><span style="font-size:19px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Context</span></h2><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">While many other languages have auto-formatters available, C++ is still lacking a tool that fits the needs of the majority of C++ programmers. Note that when we talk about </span><span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">formatting</span><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"> as part of this document, we mean both the problem of </span><span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">indentation</span><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"> (which has been largely solved independently by regexp-based implementations in editors / IDEs) and </span><span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">line breaking</span><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">, which proves to be a harder problem.</span><br>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">There are multiple challenges to formatting C++ code:</span><ul style="margin-top:0pt;margin-bottom:0pt">
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">a vast number of different coding styles has evolved over time</span></li>
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">many projects value consistency over conformance and dislike style-only changes, thus making it important to be able to work with code that is not written according to the most current style guide</span></li>
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">macros need to be handled properly</span></li><li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline">
<span style="vertical-align:baseline;white-space:pre-wrap">it should be possible to format code that is not yet syntactically correct</span></li></ul><h2 dir="ltr"><span style="font-size:19px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Goals</span></h2>
<ul style="margin-top:0pt;margin-bottom:0pt"><li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">Format a whole file according to a configuration</span></li>
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">Format a part of a file according to a configuration</span></li>
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">Format a part of a file while being consistent as best as possible with the rest of the file, while falling back to a configuration for options that cannot be deduced from the current file</span></li>
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">Integrate with editors so that you can just type away until you’re far past the column limit, then hit a key and have the editor lay out the code for you, including placing the right line breaks</span></li>
</ul><h2 dir="ltr"><span style="font-size:19px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Non-goals</span></h2><ul style="margin-top:0pt;margin-bottom:0pt"><li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline">
<span style="vertical-align:baseline;white-space:pre-wrap">Indenting code while you type; this is a much simpler problem, but has even stronger performance requirements - the current editors should be good enough, and we’ll allow new workflows that don’t ever require the user to break lines</span></li>
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">The only lexical elements clang-format should touch are whitespace, string literals and comments. Any other changes, ranging from reordering includes to removing superfluous parentheses, are out of scope for this tool.</span></li>
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">Per-file configuration: be able to annotate a file with a style which it adheres to (?)</span></li>
</ul><h2 dir="ltr"><span style="font-size:19px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Code location</span></h2><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">Clang-format is a very basic tool, so it might warrant living in the Clang mainline. On the other hand, it would also fit nicely with the other Clang refactoring tools. </span><span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">TODO: Where do we want clang-format to live?</span><br>
<h2 dir="ltr"><span style="font-size:19px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Parsing approach</span></h2><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">The key consideration is whether clang-format can be based </span><span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">purely on a lexer</span><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">, or whether it needs type information, and we need the full </span><span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">AST</span><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">. </span><br>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">We believe that we will need the full AST information to correctly indent code, break lines, and fix whitespace within a line.</span><br>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br><span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Examples:</span><br>
<span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap"></span><br><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">AST-dependent indentation:</span><span style="font-size:15px;font-family:'Courier New';font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br>
<span style="font-size:15px;font-family:'Courier New';font-weight:normal;vertical-align:baseline;white-space:pre-wrap">callFunction(foo&lt;something,</span><br><span style="font-size:15px;font-family:'Courier New';font-weight:normal;vertical-align:baseline;white-space:pre-wrap">                 ^ line up here, if foo is a template name</span><br>
<span style="font-size:15px;font-family:'Courier New';font-weight:normal;vertical-align:baseline;white-space:pre-wrap">             ^ line up here otherwise</span><br><span style="font-size:15px;font-family:'Courier New';font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">AST-dependent line breaking:</span><br><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">Detecting whether ‘*’ is a binary operator in the statement below requires parsing: if it is a binary operator, we want to break the line after it; if it is a unary operator, we want to prevent a line break.</span><br>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br><span style="font-size:15px;font-family:'Courier New';font-weight:normal;vertical-align:baseline;white-space:pre-wrap">result = variable1 * variable2;</span><br>
<span style="font-size:15px;font-family:'Courier New';font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">AST-dependent whitespace inside lines:</span><br>
<span style="font-size:15px;font-family:'Courier New';font-weight:normal;vertical-align:baseline;white-space:pre-wrap">a * b;</span><br><span style="font-size:15px;font-family:'Courier New';font-weight:normal;vertical-align:baseline;white-space:pre-wrap">  ^ Binary operator or pointer declaration?</span><br>
<span style="font-size:15px;font-family:'Courier New';font-weight:normal;vertical-align:baseline;white-space:pre-wrap">a &amp; f();</span><br><span style="font-size:15px;font-family:'Courier New';font-weight:normal;vertical-align:baseline;white-space:pre-wrap">  ^ Binary operator or function declaration?</span><br>
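<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">The two ambiguities above can be made concrete with a small, purely illustrative sketch. The namespaces as_type and as_value and the helper functions are made up for this example; they only show that identical token sequences parse differently depending on what the name a refers to.</span><br>

```cpp
// Illustration only: the same tokens mean different things depending on
// whether 'a' names a type or a variable.
namespace as_type {
struct a {};
a & f();  // 'a' is a type: declares a function f returning a reference to a

inline bool declares_pointer() {
  a * b = nullptr;  // 'a' is a type: declares b as a pointer to a
  return b == nullptr;
}
}  // namespace as_type

namespace as_value {
inline int f() { return 3; }

inline int multiplies() {
  int a = 6, b = 7;
  return a * b;  // 'a' is a variable: binary multiplication
}

inline int masks() {
  int a = 6;
  return a & f();  // 'a' is a variable: bitwise AND with the result of f()
}
}  // namespace as_value
```

<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">A formatter that wants "a* b" for declarations and "a * b" for expressions cannot choose between them from the tokens alone.</span><br>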
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br><span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Challenge: Preprocessor</span><br>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">Not every line in a program is covered by the AST - for example, there are unused macro definitions, various preprocessor directives, #ifdef’ed out code, etc.</span><br>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">We will at least need some form of lexing approach for the parts of a source file that cannot be correctly indented / line broken by looking at the AST.</span><br>
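<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">As a hedged illustration of code the AST does not cover: in the sketch below, the macro UNUSED_HELPER is never expanded and the first #ifdef branch is skipped (assuming the made-up flag HYPOTHETICAL_PLATFORM_FLAG is not defined), so a purely AST-driven formatter would never visit those lines. All names here are invented for the example.</span><br>

```cpp
#include <string>

// Never expanded anywhere, so its replacement list never reaches the AST.
#define UNUSED_HELPER(x) do { (void)(x); } while (false)

inline std::string platform_name() {
#ifdef HYPOTHETICAL_PLATFORM_FLAG
  // Skipped by the preprocessor unless the flag is defined, so the odd
  // spacing on the next line is invisible to an AST-only formatter.
  return    "special";
#else
  return "generic";
#endif
}
```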
<span style="font-size:19px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap"></span><br><h2 dir="ltr"><span style="font-size:19px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Algorithm</span></h2>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">Visit all nodes in the AST; for each node that is part of a macro expansion, consider all locations taking part in that macro expansion. If a location is within the range that needs to be indented, look at the code at the location and the rules around the node, and adjust whitespace as necessary. If the node starts a line, adjust the indent; if a node overflows the line, break the line. TODO: figure out what to do with the lines that are not visited that way.</span><br>
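<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">The overflow rule can be sketched in isolation. This is a deliberately simplified illustration, not the proposed algorithm: it works on pre-split tokens instead of AST nodes, and break_lines, column_limit and continuation_indent are hypothetical names.</span><br>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins tokens with single spaces and starts a new line,
// indented by `continuation_indent` spaces, whenever appending the next token
// would push the current line past `column_limit`.
std::vector<std::string> break_lines(const std::vector<std::string>& tokens,
                                     std::size_t column_limit,
                                     std::size_t continuation_indent) {
  std::vector<std::string> lines;
  std::string current;
  for (const std::string& token : tokens) {
    if (current.empty()) {
      current = token;  // first token of the statement
      continue;
    }
    if (current.size() + 1 + token.size() > column_limit) {
      // The token would overflow: finish this line and continue on the next.
      lines.push_back(current);
      current = std::string(continuation_indent, ' ') + token;
    } else {
      current += ' ';
      current += token;
    }
  }
  if (!current.empty()) lines.push_back(current);
  return lines;
}
```

<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">With the tokens of "result = variable1 * variable2;", a limit of 20 columns and a continuation indent of 4, this yields two lines, with the break placed after the binary operator. A real implementation would additionally need the AST to know that this '*' is a binary operator at all.</span><br>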
<h2 dir="ltr"><span style="font-size:19px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Configuration</span></h2><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">To support a majority of developers, being able to configure the desired style is key. We propose using a YAML configuration file, as there’s already a YAML parser readily available in LLVM. Proposals for more specific ideas welcome.</span><br>
<h2 dir="ltr"><span style="font-size:19px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Style deduction</span></h2><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">When changing the format of code that does not conform to a given style configuration, we will optionally try to deduce style options from the file first, and fall back to the configured layout when there was no clear style deducible from the context.</span><br>
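<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">One possible shape for such deduction, sketched for a single option only: guess the indent width from the file and fall back to the configured value when the evidence is missing or ambiguous. The function name deduce_indent_width and the voting heuristic are assumptions for illustration; tabs are not handled.</span><br>

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical heuristic: every time indentation increases from one non-blank
// line to the next, the size of the increase gets one vote. The step with the
// most votes wins; with no votes or a tie, the configured width is returned.
std::size_t deduce_indent_width(const std::string& source,
                                std::size_t configured_width) {
  std::map<std::size_t, std::size_t> votes;  // indent step -> occurrences
  std::istringstream in(source);
  std::string line;
  std::size_t previous = 0;
  while (std::getline(in, line)) {
    const std::size_t indent = line.find_first_not_of(' ');
    if (indent == std::string::npos) continue;  // blank line: no evidence
    if (indent > previous) ++votes[indent - previous];
    previous = indent;
  }

  std::size_t best = configured_width;
  std::size_t best_votes = 0;
  bool tie = false;
  for (const auto& entry : votes) {
    if (entry.second > best_votes) {
      best = entry.first;
      best_votes = entry.second;
      tie = false;
    } else if (entry.second == best_votes) {
      tie = true;
    }
  }
  return (best_votes == 0 || tie) ? configured_width : best;
}
```

<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">A file consistently indented by two spaces would yield 2 even if the configuration says 4, while a file with no nesting, or with equally many 2-space and 4-space steps, would yield the configured value.</span><br>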
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">TODO: Detailed design ideas.</span><br><h2 dir="ltr"><span style="font-size:19px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Interface</span></h2>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">This is a strawman. Please shoot down.</span><br><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br>
<span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Command line interface:</span><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">Command line interfaces allow easy integration with existing tools and editors.</span><br><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">USAGE: clang-format &lt;build-path&gt; &lt;source&gt; [&lt;column0&gt; &lt;line0&gt; &lt;length0&gt; [...]] [-- list of command line arguments to the parser]</span><br>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">&lt;columnN&gt; &lt;lineN&gt; &lt;lengthN&gt;: Specifies a code range to be reformatted; if no code range is given, assume the whole file.</span><br>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br><span style="font-size:15px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Code level interface:</span><br>
<span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">Reformatting source code is also a prerequisite for automated refactoring tools. We want to be able to integrate the reformatting as a post-processing step on top of other code transformations to make sure as little human intervention is needed as possible.</span><br>
<h2 dir="ltr"><span style="font-size:19px;font-family:Arial;vertical-align:baseline;white-space:pre-wrap">Competition</span></h2><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">TODO: List other formatting tools we’re aware of and how well they work</span><ul style="margin-top:0pt;margin-bottom:0pt">
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">GNU indent - C only;</span></li><li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline">
<span style="vertical-align:baseline;white-space:pre-wrap">BCPP (</span><a href="http://invisible-island.net/bcpp/bcpp.html" target="_blank"><span style="color:rgb(17,85,204);vertical-align:baseline;white-space:pre-wrap">http://invisible-island.net/bcpp/bcpp.html</span></a><span style="vertical-align:baseline;white-space:pre-wrap">) - “it does (by design) not attempt to wrap long statements”; written in about 1995, since then had very few changes;</span></li>
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">Artistic Style (</span><a href="http://astyle.sourceforge.net/" target="_blank"><span style="color:rgb(17,85,204);vertical-align:baseline;white-space:pre-wrap">http://astyle.sourceforge.net/</span></a><span style="vertical-align:baseline;white-space:pre-wrap">) - one of the most frequently used, but “not perfect”;</span></li>
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">Uncrustify (</span><a href="http://uncrustify.sourceforge.net/" target="_blank"><span style="color:rgb(17,85,204);vertical-align:baseline;white-space:pre-wrap">http://uncrustify.sourceforge.net/</span></a><span style="vertical-align:baseline;white-space:pre-wrap">) - has lots of configuration options;</span></li>
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">GreatCode (</span><a href="http://sourceforge.net/projects/gcgreatcode/" target="_blank"><span style="color:rgb(17,85,204);vertical-align:baseline;white-space:pre-wrap">http://sourceforge.net/projects/gcgreatcode/</span></a><span style="vertical-align:baseline;white-space:pre-wrap">) - not supported since 2005;</span></li>
<li style="list-style-type:disc;font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline"><span style="vertical-align:baseline;white-space:pre-wrap">Style Revisor (</span><a href="http://style-revisor.com/" target="_blank"><span style="color:rgb(17,85,204);vertical-align:baseline;white-space:pre-wrap">http://style-revisor.com</span></a><span style="vertical-align:baseline;white-space:pre-wrap">) - commercial; claims to understand C++, but it isn’t released yet, so no way to try; uses code snippets to specify rules.</span></li>
</ul><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap"></span><br><span style="font-size:15px;font-family:Arial;font-weight:normal;vertical-align:baseline;white-space:pre-wrap">All of them except Style Revisor seem to have simplistic regexp-based C++ parsing.</span></b></div>