[PATCH] D141508: [utils][filecheck-lint] Add filecheck-lint, a linter for FileCheck directives.

Benjamin Chetioui via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 12 05:17:04 PST 2023


bchetioui marked an inline comment as done.
bchetioui added inline comments.


================
Comment at: llvm/utils/filecheck_lint/filecheck_lint/filecheck_lint.py:131
+    lines: a sequence of strings
+    directive_prefix: an optional prefix associated with directives, e.g. '//'
+      or ';'. If not provided, the function does not attempt to match any prefix
----------------
sammccall wrote:
> bchetioui wrote:
> > sammccall wrote:
> > > bchetioui wrote:
> > > > sammccall wrote:
> > > > > this seems overly complicated: why not just always accept all comment markers in any type of file?
> > > > > 
> > > > > Eyeballing `"(?:^|#|//|;)\s*([A-Z0-9-_]+)\s*:"`, I don't see false positives.
> > > > A comment marker is not strictly necessary for something to be parsed as a directive by FileCheck---meaning I'd like to keep the option for directive prefix to be optional. But if it is always optional, then I am afraid we may see a host of false positives, since anything preceding a colon would always be matched for every type of file.
> > > > 
> > > > I personally prefer it like this, but if you'd rather, I could change "directive_prefix" to a boolean "match_comment_marker". When enabled, I would match any comment marker. WDYT?
> > > > 
> > > > Apart from the comment markers, I think the regex you suggest will not yield false positives, but that it may allow more "false negatives" to pass through. E.g., in https://github.com/tensorflow/tensorflow/commit/48cacf049f3d6ed3f289ccc48ec50491b6d8d9a8, one of typos contained lowercase letters, and another contained a space instead of a dash. These would not be caught by the regex you suggest, so I would like to keep the regex as is. WDYT?
> > > > A comment marker is not strictly necessary for something to be parsed as a directive by FileCheck
> > > 
> > > Sure, but you're using heuristics, and have to make some decisions. If **all** filters are loose (don't require caps, allow spaces, don't require it to be at the start of a line/comment) then you're going to have lots of false positives.
> > > 
> > > In the code the comment marker is not optional: you always pass one at the callsite. So the code handling `directive_prefix is None` is dead & I think we should delete it. Alternatively, if you want to allow directives to appear anywhere,  delete the case that handles `directive_prefix`. Unless it's really important, we don't need to define an API that lets you turn this on or off.
> > > 
> > > Sure, that's why we would have `^` in the alternation: it will match at the beginning of the line (I guess you have to set re.MULTILINE in python for this)
> > > 
> > > > Apart from the comment markers, I think the regex you suggest will not yield false positives, but that it may allow more "false negatives" to pass through
> > > 
> > > Fair enough. Allowing spaces *and* lowercase letters seems very permissive. Matching too many things may not be a problem per se, though it could lead to missing the right match (e.g. `blah CHEC: something` won't match "CHEC" because it matches "blah CHEC"). But the details are up to you.
> > > 
> > > > I would like to keep the regex as is.
> > > 
> > > Sure, the details of the regex don't bother me much as simplifying the code a bit - regardless of which characters you choose to match, 
> > Small note: the code path is not actually dead, as files with extensions that are not mapped to an inline comment prefix will have their `directive_prefix` set to `None`.
> > 
> > However, I take your point, and I am no longer sure if this is a net positive to add. In the worst case, if people start using the linter with new file formats that use different comment prefixes than the ones we match, they can always update the regex.
> > 
> > I deleted the code and hardcoded the inline comment prefixes in the regex.
> > 
> > > (e.g. blah CHEC: something won't match "CHEC" because it matches "blah CHEC")
> > 
> > This can surely be improved, and I plan to do so in a later change.
> > the code path is not actually dead, as files with extensions that are not mapped to an inline comment prefix will have their directive_prefix set to None
> 
> The previous snapshot still had `comment_prefix = comment_prefix or '//'` - maybe an oversight and I was too eager to assume it was intentional!
Oops, you're right, that was an oversight (I removed it now, and didn't even register it). Thanks for catching it, my bad!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141508/new/

https://reviews.llvm.org/D141508



More information about the llvm-commits mailing list