[cfe-dev] Improvements to AST Matcher tooling and auto usage

Sat Oct 19 15:09:53 PDT 2019

Hi,

I presented some features at EuroLLVM in April this year relating to 
improvements I made to AST Matcher tooling (clang-tidy and clang-query):

  https://steveire.wordpress.com/2019/04/30/the-future-of-ast-matching

My aims were:

* make learning to use AST Matchers easier
* make the clang-query tool itself help the user more
* simplify the AST Matchers required by skipping non-visible nodes in 
matchers and AST dumps (such as implicit conversions, but see the talk 
for more).

I think the javascript API I demo'd for writing clang-tidy checks could 
be a particularly valuable addition to the tools ecosystem.

I didn't manage to get all of the patches upstream because arguments 
about using auto in the same way it is used in other parts of LLVM 
demotivated me. I haven't found an alternative reviewer to help get the 
patches upstream, though my attendance at EuroLLVM was partially an 
attempt to find one.

Some people are still interested in the features getting upstream, but 
there is still some work for me to do in getting some of the patches 
from presentation-quality to commit-quality. So there is still 
motivation needed for my part. Beyond that, I have many other features I 
wish to contribute to clang tooling, so I don't really want to start 
again if I know I can't continue with it.

The rule being applied in the reviews for those patches is approximately 
"if auto is used on the left, the type must appear on the right in angle 
brackets (such as in a cast)".

I started a thread on this mailing list in November 2018 attempting to 
get the coding standard clarified regarding what is allowed and what is 
not regarding `auto`. There was no outcome from the thread. However, I 
included some crude measurements of the existing use of auto at the time 
using git grep:

http://llvm.1065342.n5.nabble.com/llvm-dev-RFC-Modernizing-our-use-of-auto-td123947.html#a124564

Here is a plot of the use of `auto` where there are no angle brackets on 
the right side of the equals sign, measured each week over the last 
three years:

  https://imgur.com/a/kcHaEhk

It was made with this script:

  https://gist.github.com/steveire/c9633d243d5d3b515e3eb26aaa3ad7c8

It is quite conservative as it only counts use of auto in the lib 
directories of llvm, clang etc, excluding tests, include/ etc.

It shows that occurances of auto - in ways that my patches are rejected 
for - has almost doubled in the last 3 years.

I think it shows that the standard that "patches must use the returned 
type on the right side if using auto on the left" is not being followed 
in the code. However, that is how the standard is being used by some 
reviewers.

If some of the major increases (and the decrease) intrigue you, the 
following patches are the responsible ones in the llvm-project repo):

Significant increases:
     6151654c00cfcdb90fd665403948aab64b83494a
     d169d70bbf053f636aed25b482ec63efd094a95b
Reductions:
     ddf3db9b5eddc6c7cc75fd30eea545c22ad70dcd
     a4fa0b880a63dad30d9ad1ff9fe8c2358c32dd77
     e372710d30c66e020576a2af6e1e5e815815a65a

The MLIR team announced that they are following the LLVM coding guidelines:

  http://lists.llvm.org/pipermail/llvm-dev/2019-September/134992.html

but they use auto extensively in ways that some reviewers don't accept:

  https://github.com/tensorflow/mlir/blob/master/lib/Parser/Parser.cpp

So, clearly other interpretations of the coding standard are possible. 
If my reviewer would not accept the MLIR code, then someone is 
misinterpreting the coding guideline.

I think it would make sense for

* The coding guideline
* The code itself
* The reviews

to agree somewhat on what the coding guideline is regarding auto (and 
preferably have it enforced somehow). Currently, the reviews appear to 
be out of sync.

If the rule should be "auto must not be used if the type is not on the 
right side", I'm sure I can write a clang-tidy check to convert most of 
the code. Such a tool would also enforce that new code follows the same 
rule.

That would make it possible for me to finish the AST Matcher features 
and get them upstream. I don't mind not using auto, but I do wish for 
the coding guideline, the code itself and the reviews to agree on what 
the guideline is and for everyone (contributors and reviewers) to follow 
the same rules. That is not currently the case.

An alternative (which I favor) is to change the coding standard along 
the lines of "use of auto is not encouraged, but it is not in itself a 
reason to reject the contribution".

I would very much like to finish and upstream the AST Matcher improvements.

Does anyone share the sentiment?

Thanks,

Stephen