[PATCH] D133172: [lit] Extend DEFINE/REDEFINE with function substitutions

Joel E. Denny via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 26 09:36:02 PDT 2022


jdenny added a comment.

In D133172#3814345 <https://reviews.llvm.org/D133172#3814345>, @jhenderson wrote:

> In D133172#3811800 <https://reviews.llvm.org/D133172#3811800>, @jdenny wrote:
>
>> OK, I see the idea now.  However, that adds a third character that cannot be used in arguments.
>
> This is true. However, it seems like any solution has a restriction along these lines, unless you provide some form of escaping. In the current solution, you've effectively done this escaping via the use of `%)`.

Right, and that can be escaped as `%%)`, reusing the existing `%%` substitution.

> I don't know whether this is a good idea, but in my suggestion, you could potentially use `%` to escape the special characters to actually appear as literals in the input arguments. However, I'm also inclined to say we don't need to try to solve this problem up-front necessarily.

I disagree about the last sentence.  In my downstream work, I often needed commas because they often appear in command-line arguments, and I sometimes need parens there too.

I think this is an example of what you're proposing:

  DEFINE: %{check( CFLAGS, FCFLAGS )} = %clang %{CFLAGS} -o %t %s && %t | FileCheck %{FCFLAGS} %s
  DEFINE: %{fcflags-common} = -DPTR='%(null%)'
  RUN: %{check( -Wl%,-q, -check-prefixes=CHECK%,QUIET %{fcflags-common} )}

which would expand to:

  RUN: %clang -Wl,-q -o %t %s && %t | FileCheck -check-prefixes=CHECK,QUIET -DPTR='(null)' %s

Here's why I find that proposal complicated:

- Parens, comma, and `}`, which are normally inert text, would be special only within uses of function substitution argument lists.  (I understand that, depending on how it would be parsed, `}` might not need to be special before the closing `)`, but the main point still stands.)
- We need to introduce several additional lit substitutions to allow those characters to appear literally in arguments : `%,`, `%(`, `%)`, and `%}`.
- Where do we define those new substitutions?  I argue they should be special substitutions that, similar to `%%`, expand after all other substitutions have been expanded `recursiveExpansionLimit` times.  That way, substitutions like `%{fcflags-common}` can be reused outside argument lists in other `RUN` lines.  Moreover, the test author doesn't have to worry the new substitutions will expand too early (before emerging in an argument list that's ready to be parsed) because he didn't think through substitution order, function uses hidden behind other substitutions, function uses within other functions' args, and how all that interacts with `recursiveExpansionLimit`.
- Test authors have to adjust all existing substitution values that contain commas or parens if they start to also use those substitutions in the args of functions.  For example, maybe `%{fcflags-common}` was actually defined already in a lit configuration file.

It seems to me that it's simpler for the test author and the lit developer if we just say `%` is meaningful to lit, and other text is inert, as in the following example:

  DEFINE: %{check}%( CFLAGS %, FCFLAGS %) = %clang %{CFLAGS} -o %t %s && %t | FileCheck %{FCFLAGS} %s
  DEFINE: %{fcflags-common} = -DPTR='(null)'
  RUN: %{check}%( -Wl,-q %, -check-prefixes=CHECK,QUIET %{fcflags-common} %)



> I'm not sure that would help the situation. The full syntax would look like this: `%{name}%(arg1, arg2%)`. By squinting, I can get the `%(` and `%)` to marry up, but I don't see the name half pairing up with the arg part intuitively. I don't know if this is necessarily a good idea overall, but I'd almost find it easier if the syntax were `%{name}(arg1, arg2)%` with the two `%` characters providing a "paired grouping" much like parentheses already do in typical computer programming. I think my problem is that the `%` characters in the current syntax aren't symmetrical about the midpoint of the syntax.

For the record, the `%if` syntax already uses `%{` and `%}`.  That is, `%` consistently starts a special sequence.

> I'm just one opinion, but I believe that one of the problems of regexes is that they require a decent knowledge of that particular language's regex language, since every variant is somewhat different, especially when getting into the more complicated functionality. Contrast that to a (hopefully simple) parser, if someone is reading this code, it's likely they already know python, so it should be fairly simple to follow.

OK, if no one else objects, let's agree to go with a recursive-descent parser instead of a single regex.  That should address several issues with the current patch:

- Some lit developers might find it easier to understand.  If that's true for the majority of lit developers, then my original motivation for the single regex is defeated.

- For invalid uses of function substitutions (e.g., wrong number of arguments, unclosed argument list), it can provide helpful diagnostics instead of just quietly failing to substitute.

- It can handle function calls within function arguments without requiring `recursiveExpansionLimit`.

- Function substitution entries in the substitution list (that is, `config.substitutions`) will not have the form `(pattern, replacement)` but instead will list parameter names so a recursive-descent parser can parse function substitution uses.  For example, we might decide that any entry with more than two items would be interpreted as `(func_name, param1, param2, ..., replacement)`.  On the one hand, the substitution list will be more complicated.  On the other hand, it means lit configuration files would immediately have a way to define function substitutions.  The current patch doesn't provide one.

But let's converge on the previous syntax questions before I pursue the rewrite.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133172/new/

https://reviews.llvm.org/D133172



More information about the llvm-commits mailing list