[llvm-dev] [RFC] Adding support for clang-format making further code modifying changes

MyDeveloper Day via llvm-dev llvm-dev at lists.llvm.org
Mon Aug 9 12:36:01 PDT 2021


Hi all,

As a frequent user and maintainer of clang-format I would like to formalize
a change to the "charter" of what clang-format can/could/should do

Motivation
==========

As you all know clang-format parses source code and can manipulate the
tokens to produce consistently formatted code (most of the time), its
become ubiquitous in the industry and its integration into
popular editors & IDEs such as vim/visual studio/code mean it very simple
for users to get setup and running producing good looking code.

clang-format does not use semantic information, and as such it doesn't need
includes, defines or compiler flags to interpret code. Clang-format is
generally guessing that certain sequences of tokens from the lexer
represent certain patterns. It's a good guess but it gets it wrong from
time to time, hence trading correctness for performance.

Because of this clang-format is fast (not maybe as fast as we'd like) but
fast enough to be part of in a "save" operation such that the code is
formatted as the ide saves your work.

Mostly clang-format manipulates only whitespace, but over the years there
have been a number of extremely useful features that have broken this rule,
namely include sorting, namespace commenting to name a few.

The usage scenario of clang-format has also changed slightly to include a
non modifying advisory role identifying clang-format violations (as in our
own llvm-premerge checks), which can greatly aid the code review process by
removing the need to constantly ask for the code to be formatted correctly
or follow the LLVM convention.

Recently a number of new features have been proposed that would also alter
code, insertion of braces, east/west const conversion that can be performed
at "save" speeds.

As no semantic information is present, this raises the question as to
whether clang-format could actually break your code.
This has actually always been the case especially since the introduction of
include sorting, but also we all know that clang-format can break your code
from the visual perspective too and hence the need for // clang-format
off/on

In the most part include sorting not only might break your code noisily
such that it won't compile, but it can also break it silently,
and because IncludeSorting is on by default this breakage could potentially
go unnoticed.

https://stackoverflow.com/questions/37927553/can-clang-format-break-my-code
https://travisdowns.github.io/blog/2019/11/19/toupper.html

I don't think it can be in any doubt that IncludeSorting is a nice feature
used by 100,000's of developers without too many issues, but there is some
suggestion that its inclusion as "on by default" was potentially a mistake.

Proposals for other new features that modify the code in a similar way are
getting some push back for changing the "charter" of clang-format when it
could be considered to have already changed.
This is causing friction on the review of some features and so it was
suggested to present an RFC to gain wider consensus on the concept of
clang-format changing code.

Mostly when a code modifying change is submitted the view is that this
isn't clang-formats job but more clang-tidy, however clang-tidy requires
full access to all include files and compiler definitions and only works on
the preprocessor paths of the code you are analyzing for and its speed and
hence its frequency of use is drastically reduced.

Some clang-format based modifications can in theory be made with a
relatively high level of confidence without paying the price and the
configuration complexity of getting all the
semantic information. https://reviews.llvm.org/D105701.
There is potentially for clang-format to introduce breaking changes and
whilst this could in theory cause noisy breakages they could also in theory
produce silent ones.

These new features want to be run at "reformat" speeds & frequency and
benefit from the rich Ecosystem of inclusion and integration in IDEs and
editors that clang-format enjoys.

This RFC is to try to gain some consensus as to what clang-format can do
and what the conditions/constraints should be if allowed to do so.

Benefits
========

The benefits are that clang-format can be used to further make code
modifications to adhere to a coding convention (insertion/removal of
braces),
clang-format could be used to validate and correct coding convention
(left/right const),  and could be used to remove unnecessary semicolons or
potentially convert code to trailing return types all of which could be
performed at "reformat" speeds.

Whilst some of these capabilities are available in clang-tidy, it requires
significant infrastructure to often perform these often relatively simple
operations and it's unlikely
that all users of clang-format are set up to perform these actions in
clang-tidy.

There are likely a number of clang-tidy modifications that could in theory
be made at "reformat" speeds with such an approach. But there really needs
some agreement that it's OK for clang-format to modify the code.

Allowing these kinds of modification capabilities could lead to a new set
of "Resharper" style capabilities being added to clang-format,
capable of bringing source code quickly into line with coding conventions.

Concerns
========

Correctness is King, the concern is your formatting tool should not perform
operations that could break your code. (this is already the case)

It's perhaps not clang-format's job to do this.

I should personally declare myself as being in favor of allowing
clang-format to modify code, I think it only fair that I let others reply
to the RFC with their own concerns.

Constraints
===========

To minimize the impact to existing users, We suggest that a number of
constraints be generally considered good practice when submitting reviews
for clang-format with modifying changes

1) Code Modifying Features should always be off by default
The user should have to make a positive affirmative action to use such a
feature

2) Code Modifying Features configuration options should be highlighted as
such in the ClangFormatStyleOptions.rst such that its clear these are
potentially code breaking options

3) Existing "Code Modifying Features" do not have to adhere to 1) but the
documentation should be updated to adhere to 2)

4) Code Modifying Features should be conservative to be "correct first"
before being "complete".
i.e. If it's possible a change could be ambiguous it should tend towards
not making the incorrect change at all rather than forcing an incorrect
change. (This could cause some
    cases to be missed)

Request
=======

I would like to get some general agreement that it's OK for future reviews
of code modification changes to become part of clang-format (as they are in
IncludeSorting) assuming the best practices are
followed to protect users from unwanted changes.

Feedback would be appreciated

MyDeveloperDay
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210809/f3516b58/attachment.html>


More information about the llvm-dev mailing list