[llvm-dev] [cfe-dev] [RFC] Adding support for clang-format making further code modifying changes

Sat Aug 28 06:51:52 PDT 2021

On Sat, Aug 28, 2021 at 8:52 AM Aaron Ballman via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On Sat, Aug 28, 2021 at 3:48 AM David Blaikie <dblaikie at gmail.com> wrote:
> >
> > +1 to what Manuel's said here.
> >
> > One slight change I'd suggest is changing the term "breaking changes" to
> "non-whitespace changes", perhaps? (they aren't necessarily breaking
> anything) At least I assume that's the intent, but I might be wrong in
> which case I'd love to better understand what's being proposed.
>
> To me, the crux of my concern isn't nonwhitespace changes, but changes
> that can make code which used to compile no longer do so. It just so
> happens that nonwhitespace changes are where that risk is highest.

Perhaps it would be correct to say that the problematic formatters are
those that change the file's *sequence of preprocessing tokens*.  This is
particularly relevant to clang-format because clang-format doesn't actually
parse C++. So for example you might imagine a formatter that cuddles angle
brackets:
    std::vector<std::vector<int> > v;  // BEFORE
    std::vector<std::vector<int>> v;  // AFTER
This changes the token sequence (two `>` tokens become a single `>>`
token), so it's potentially dangerous. Because clang-format doesn't parse,
such a formatter can't tell the difference between that (safe, post-C++03)
edit and this (unsafe) edit:
    X<&Y::operator> >();  // BEFORE
    X<&Y::operator>>();  // AFTER: syntax error
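To make the unsafe case concrete, here is one hypothetical choice of `X` and `Y` (the names are only placeholders above) under which the BEFORE line compiles. It is a sketch assuming C++11 or later: `X` is a function template whose non-type template argument is a pointer to `Y::operator>`. With the AFTER spelling, maximal munch lexes `>>` as one token, so the name becomes `operator>>` and the template-argument list of `X` is never closed.

```cpp
// Hypothetical definitions of X and Y, chosen only so the BEFORE line compiles.
struct Y {
  bool operator>(const Y&) const { return true; }
};

// X takes a pointer to a const member function of Y as a template argument
// and calls it on two default-constructed Y objects.
template <bool (Y::*Cmp)(const Y&) const>
bool X() {
  Y a, b;
  return (a.*Cmp)(b);
}

bool call_before() {
  return X<&Y::operator> >();  // BEFORE: `operator>`, then `>` closes X<...>
  // return X<&Y::operator>>();  // AFTER: lexed as `operator>>`, X<... unclosed
}
```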

Obviously such a formatter is still going to be relatively safe in
practice. But because it (has the potential to) change the token sequence,
it is *qualitatively more dangerous* than a formatter that merely reformats
the existing token sequence.

Shuffling around the tokens (e.g. changing west-const into east-const) is
just a special case of changing the token sequence.
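One way to state this criterion operationally is to lex the text before and after formatting and compare the two token sequences. The sketch below uses a hypothetical, heavily simplified lexer (`tokenize` and `same_token_sequence` are illustrative names, not clang-format APIs); it ignores comments, string and character literals, digraphs, and the `<::` special case, so it is not a real C++ preprocessing-token lexer.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Simplified lexer: identifiers, pp-number-like runs, and punctuators.
std::vector<std::string> tokenize(const std::string& src) {
  static const std::vector<std::string> puncts = {
      "<<=", ">>=", "...", "->*", "<=>", "::", "->", ".*", "++", "--",
      "<<",  ">>",  "<=",  ">=",  "==",  "!=", "&&", "||", "+=", "-=",
      "*=",  "/=",  "%=",  "&=",  "|=",  "^=", "##"};
  std::vector<std::string> tokens;
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) {  // whitespace only separates tokens
      ++i;
      continue;
    }
    std::size_t j = i;
    if (std::isalpha(c) || c == '_') {  // identifier or keyword
      while (j < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '_'))
        ++j;
    } else if (std::isdigit(c)) {  // rough approximation of a pp-number
      while (j < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '.'))
        ++j;
    }
    if (j > i) {
      tokens.push_back(src.substr(i, j - i));
      i = j;
      continue;
    }
    // Punctuator: maximal munch, i.e. keep the longest candidate that matches.
    std::string match(1, src[i]);
    for (const auto& p : puncts) {
      if (p.size() > match.size() && src.compare(i, p.size(), p) == 0) match = p;
    }
    tokens.push_back(match);
    i += match.size();
  }
  return tokens;
}

// True if `after` differs from `before` only in whitespace between tokens.
bool same_token_sequence(const std::string& before, const std::string& after) {
  return tokenize(before) == tokenize(after);
}
```

Under this check a whitespace-only reflow such as `int  x=1;` versus `int x = 1 ;` compares equal, while cuddling `> >` into `>>` or moving `const` to the other side of a name does not.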

In particular, if you change the token sequence *when you're inside a
preprocessor macro*, then (because clang-format doesn't parse C++) you
really have no idea what effect your change is going to have.
    #define X(V) int V = 42
    int main() { X(v1); X(const v2); }
Here, editing `const v2` into `v2 const` produces a syntax error, because
the macro then expands to `int v2 const = 42`.

Now, for *any* formatter, one can find pathological programs that are
broken by it; e.g.
    template<int X> void F() requires (X==2) {}
    int main() { F<__LINE__>(); }
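will stop compiling if you add line breaks to it (with the snippet at the top
of a file, any line break before the call means `__LINE__` no longer expands
to 2). I don't think this quite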
reduces my thesis to absurdity, but I admit it's theoretically awkward.
But if you restrict your edits to those that preserve the token sequence,
then I *think* you'll only ever break programs that use either `#X`
(stringifying) or `__LINE__`. Anyone care to produce a counterexample? :)
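Both escape hatches observe layout rather than tokens, which is easy to demonstrate. In this minimal sketch, `STR`, `tight`, `spaced`, `first_line`, and `later_line` are hypothetical names: whitespace between an argument's tokens becomes a single space in the string produced by `#x`, and `__LINE__` counts physical source lines, so joining the body of `later_line` onto one line would change what it returns.

```cpp
#include <string>

// Stringifying: whitespace *between* argument tokens becomes one space in
// the literal, so a whitespace-only reformat changes the resulting string.
#define STR(x) #x

std::string tight() { return STR(a+b); }     // "a+b"
std::string spaced() { return STR(a + b); }  // "a + b"

// __LINE__ counts physical lines, so joining or splitting lines changes it.
int first_line() { return __LINE__; }
int later_line() {
  return __LINE__;
}
```

As laid out here, `later_line() - first_line()` is 2; a formatter that collapses `later_line` onto a single line would make it 1 without changing a single token.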

my $.02,
Arthur