[PATCH] D141508: [utils][filecheck-lint] Add filecheck-lint, a linter for FileCheck directives.
Sam McCall via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 11 10:44:50 PST 2023
sammccall added a comment.
Sorry, hit enter too soon on previous round.
Generally this looks really useful, and finds real bugs!
Most of the comments below are of the form:
- the implementation seems overly complex and general, I think being short/simple/direct is more useful for maintenance than being "good python code", as this is not production code and will be read/edited by people who don't know much python
- for my taste there's too much boilerplate comments repeating the names of things, and not enough explaining how this piece of code is contributing to solving the probelm
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint.py:18
+ parser = argparse.ArgumentParser(description=__doc__)
+ parser.add_argument(
+ '--distance-threshold',
----------------
this isn't used anywhere - if there isn't a concrete need I'd suggest just making it a constant, then you can drop argparse and just iterate over argv?
(If you want to use a different config out of tree that's not unreasonable, but it's not clear to me why that would be the case)
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:8
+# ===----------------------------------------------------------------------===##
+"""A linter that detects potential typos in FileCheck directive names."""
+
----------------
there are a lot of fine-grained comments here about the implementation (e.g. what a class represents) but it's hard to get an overall picture of what this library does and how.
I think an example would go a long way here, somethng like:
```
Consider a broken test foo.cpp:
// RUN: clang -cc1 -ast-dump %s | FileCheck %s --check-prefix=NEW
// RUN: clang -cc1 -ast-dump %s -std=c++98 | FileCheck %s --check-prefix=OLD
auto x = 42;
// NEWW: auto is a c++11 extension
// ODL-NOT: auto is a c++11 extension
First we detect FileCheck directive prefixes by looking for --check-prefix flags.
Here we get {CHECK, NEW, OLD}, so our directive names are
{CHECK, NEW, OLD, CHECK-NOT, NEW-NOT, ...}.
Then we look for lines that look like directives: these are of the form FOO: at the beginning of a line or a comment.
If any of these are a "near-miss" for a directive name, then we suspect this is a typo.
```
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:39
+
+ Args:
+ s1: a string
----------------
If there's nothing non-obvious to say about an arg (the type hints already say they're strings), don't say anything? Similarly the "Returns:" just echoes the first line.
(I'm not a python person, maybe there's some structural reason this comment needs to be this way, but other files seem mostly not to have them)
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:62
+class FilePos:
+ """Stores the coordinates of a span on a single line within a file.
+
----------------
nit: FileRange? (both avoiding the abbreviation & the impression that it describes a point only)
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:83
+class Diagnostic:
+ """Stores information about typos and emit error diagnostics.
+
----------------
"emit" doesn't belong here I think.
Maybe more precisely: "Stores information about one typo and possibly its fix" (is the fix optional?)
I'm not sure the second paragraph or the attributes adds anything nonobvious (Findings isn't a concept that makes sense here)
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:113
+ return (
+ f'Found potentially misspelt directive "{self.typo}". Did you mean '
+ f'"{self.fix}"?'
----------------
nit: US english is much more common in LLVM, so probably "misspelled"
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:124
+ Finds all the strings that could be potential directives in a sequence of
+ strings. This assumes that lines containing the directive always start with
+ either whitespace, or the specified comment prefix. What constitutes a
----------------
We should also support the directive being at the start of a line-comment, even if the line is otherwise non-empty.
`rg "\s*[^ ]\s*// CHECK" clang/test` shows many examples e.g:
```
../clang/test/Refactor/LocalRename/NoSymbolSelectedError.cpp
4:class Baz { // CHECK: [[@LINE]]:1: error: there is no symbol at the given location
```
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:131
+ lines: a sequence of strings
+ directive_prefix: an optional prefix associated with directives, e.g. '//'
+ or ';'. If not provided, the function does not attempt to match any prefix
----------------
this seems overly complicated: why not just always accept all comment markers in any type of file?
Eyeballing `"(?:^|#|//|;)\s*([A-Z0-9-_]+)\s*:"`, I don't see false positives.
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:140
+ matches = []
+ # TODO(bchetioui): regexes can be further improved to capture typoed
+ # directives more holistically. In practice, they seem to perform well as is.
----------------
it's not really clear what's to be done here: improved how? how can someone tell when it's time to delete this TODO?
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:143
+ if directive_prefix is None:
+ directive_pattern = re.compile(r'([\d\w\_][\s\d\w\-_]*):')
+ else:
----------------
I'd suggest restricting to uppercase as this is a ~universal convention and will avoid big classes of false positives, allowing to simplify elsewhere
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:148
+ )
+ for lineno, line in enumerate(lines, start=1):
+ match = re.search(directive_pattern, line)
----------------
up to you, but I'm not sure getting convenient access to the line number is worth doing everything linewise and looping.
You could make FilePos's constructor compute the line/column, then that class would do something useful!
(Not that it matters, but I suspect it also performs better in practice to run the regex over a big buffer, and actually needing the line number is the very rare error case)
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:187
+# TODO(bchetioui): check that the argument is actually part of a CHECK command.
+def parse_additional_prefixes(lines: Sequence[str]) -> Set[str]:
+ """Parses custom prefixes defined in the list of strings provided.
----------------
This seems like it could be much simpler: just take the whole file, match against a regex, and always split the captured arg on `,` since a prefix will never have one.
```
def parse_custom_prefixes(code):
for m in re.finditer(r'--check-prefix(?:es)?[ =](\S+)', code):
for prefix in m.group(1).split(','):
yield prefix
```
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:188
+def parse_additional_prefixes(lines: Sequence[str]) -> Set[str]:
+ """Parses custom prefixes defined in the list of strings provided.
+
----------------
this comment doesn't really say anything beyond what the function signature already says.
Maybe an example? (in the code: .... the prefixes are ...)
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:259
+ lines = f.readlines()
+ return find_directive_typos_impl(lines, filepath, comment_prefix, threshold)
+
----------------
why have two functions here instead of one? can't we just have the caller pass in the file content?
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:268
+) -> Sequence[Diagnostic]:
+ """Underlying implementation for `find_directive_typos`."""
+ all_prefixes = _prefixes.union(parse_additional_prefixes(lines))
----------------
this just echoes the function name
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:276
+ + list(_ignore)
+ + list(all_prefixes)
+ )
----------------
nit: make _suffixes = {"", "-NOT", "-DAG"} etc instead?
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:292
+ potential_directives = find_potential_directives(lines, comment_prefix)
+ diagnostics = []
+
----------------
up to you, many of these functions could be slightly terser with `yield` rather than create/append/return an array
================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:312
+ continue
+ elif score <= threshold and best_match not in _ignore:
+ diagnostics.append(
----------------
this is a redundant second check for _ignore, you already added it to all_directives above
================
Comment at: llvm/utils/filecheck_lint/tests/filecheck_lint_test.py:1
+# ===----------------------------------------------------------------------===##
+#
----------------
this test isn't being run by anything.
you can leave it out (most utils aren't tested, this is already a tool for improving tests, recursion needs to stop at some point!)
But thinking about it more, I think it's harmless enough and probably helps when developing it.
There are a few other tools that use unittest.TestCase, AFAIK nothing runs those either.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D141508/new/
https://reviews.llvm.org/D141508
More information about the llvm-commits
mailing list