[PATCH] D104572: [lit] Add the ability to parse regexes in Lit boolean expressions

Mon Jun 28 11:48:00 PDT 2021

jdenny added a comment.

There should be some user-level documentation about this new feature.  It looks like the appropriate place is:

https://llvm.org/docs/TestingGuide.html#constraining-test-execution

================
Comment at: llvm/utils/lit/lit/BooleanExpression.py:184
         self.assertTrue(BooleanExpression.evaluate('-os', {}, triple))
         self.assertFalse(BooleanExpression.evaluate('arch-os', {}, triple))

----------------
Should there be some minimal test somewhere (maybe here) ensuring that a regex is handled as a literal string when matching triples?  I think that means it cannot ever match (due to `{{` and `}}`), so maybe it's not a very interesting behavior, but we still might want to be aware if we accidentally change it.

================
Comment at: llvm/utils/lit/lit/BooleanExpression.py:114
             self.expect(')')
         elif not BooleanExpression.isIdentifier(self.token):
             raise ValueError("expected: '!' or '(' or identifier\nhave: %s" %
----------------
ldionne wrote:
> jdenny wrote:
> > `isIdentifier` seems misnamed if it accepts `{{`.
> > 
> > The logic should probably be something like either `not isIdentifier and not isRegexOpen` or `not isIdentiferOrRegexOpen`.
> I agree that `isIdentifier` is misnamed, I'll fix that.
> 
> I also agree that it should be something like `not isIdentifier and not isRegexOpen`, however this implementation does not track the fact that we're parsing the content of a regular expression -- it doesn't know that it's inside a `{{`. Instead, the tokenization pattern was augmented to treat anything with `{{<whatever>}}` as a token of its own - that was by far the simplest way I could find to implement it. So instead, I believe the fix is to simply rename `isIdentifier` to `isMatchExpression`, and to acknowledge that we now have a new leaf in the grammar, and that `isMatchExpression` basically allows us to detect that.
That seems reasonable.  Thanks for addressing it.

================
Comment at: llvm/utils/lit/lit/BooleanExpression.py:115
         elif not BooleanExpression.isIdentifier(self.token):
             raise ValueError("expected: '!' or '(' or identifier\nhave: %s" %
                              self.quote(self.token))
----------------
ldionne wrote:
> jdenny wrote:
> > Shouldn't this mention `{{` as a possibility?
> I'll say `expected '!' or '(' or match-expression` instead, LMK if that's not satisfying.
While that does reflect the implementation, I think `expected '!', '(', '{{', or identifier` would be more meaningful to a user who isn't familiar with the internal grammar symbol names.  As far as they know, `match-expression` could be, for example, the start symbol for the whole grammar.  In contrast, `{{` and `identifier` are probably clear enough to any user.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104572/new/

https://reviews.llvm.org/D104572