[PATCH] D133172: [lit] Extend DEFINE/REDEFINE with function substitutions

Joel E. Denny via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 23 08:30:49 PDT 2022


jdenny added a comment.

In D133172#3810988 <https://reviews.llvm.org/D133172#3810988>, @jhenderson wrote:

> The %if syntax is slightly different, because there's no way of "grouping" the whole thing into one "name", I think? (I haven't looked back at the syntax for it, so might be misremembering it).

At one point, `%if` used a single regex.  There were two issues, as I recall:

1. The match had to be processed to evaluate the condition and select the //then// or //else//.  As a result, `re.sub` wasn't used.  Instead, `re.search` was used followed by post-processing.  (Actually, I believe the post-processing could be have been handled by providing a python function for the replacement in `re.sub`.)

2. People wanted to nest `%if` in the //then// and //else//.  That requires a parser stack, so a simple regex wasn't sufficient, and a recursive-descent parser was implemented.  My hunch is that this part of the implementation led to the negative comments about complexity.

Function substitutions as proposed in this patch also have problem 2 (requires parser stack): you might want a function call to appear in an argument of another function call.  Because I never found a reason to do that in my test suites, I suspected it wouldn't be common.  I also complicated the regex a bit to be sure `recursiveExpansionLimit` could make such cases work.  So, for now, I went with a regex applied via `re.sub` instead of a recursive-descent parser.

I've read that recursive regexes will be possible in python one day.  I don't know the current status for sure, but I got the impression it's a bleeding-edge feature requiring a different regex library than `re`.  That feature will probably enable addressing 2 elegantly for both `%if` and function substitutions.  One day.

> For the DEFINE/REDEFINE case, a format with `%{foo(arg1, arg2)}` has the closing `}` as a convenient terminator for the "pattern", so anything (e.g. `,` and `)`) in between could be considered special (just don't try to use a `}` in the arg list!).

OK, I see the idea now.  However, that adds a third character that cannot be used in arguments.

> One of my main concerns is that the argument portion of the proposed syntax doesn't necessarily look like it really is a set of arguments to the untrained eye. The `%)` maybe just about binds itself to the `(`, but none of it really binds itself to the `%{name}` part, so it just doesn't feel like the two halves of the whole expression have anything to do with each other.

I understand.  I suffer from the lack of objectivity that comes from a "trained eye".  That is, I stared at my syntax too long as it evolved into its current form.

To help `(` and `%)` seem to bind better, we could change them to `%(` and `%)`.  To rationalize how the argument list then binds to `%{name}`, compare `name(args)` in a C-like programming language to `%{name}%(args%)`.  That is, `name` -> `%{name}`, `(` -> `%(`, and `)` -> `%)`.  Notice that `%` is used to introduce every special sequence.

I considered `%(` and `%)` early on, but it seemed more cluttered than technically necessary, so I decided to think of `%{name}(` as a single token instead.  I don't care that much though, so I'm fine to switch to `%(` if people like it better.

> I think I might have misremembered how these substitutions are handled in practice, but in essence, I imagined doing (effectively) `re.search` to find a substitution in the input string, and swapping it in, repeating the process for each substitution. As the `%{name(arg1, arg2)}` syntax wouldn't work for that, you'd have to do something different or extra for these cases. Perhaps it would be `re.search` for the `%{name` part and then find the next `}` manually to form the "matching" block.

It sound like you're trying to handle the call-in-args part of this, which requires a parser stack or `recursiveExpansionLimit`, as discussed above.  In that regard, I see no difference between the syntax proposed in the current patch and the `%{name(arg1, arg2)}` syntax.

> It may not be less complex in terms of overall code, but it seems like it would require a much simpler regex, therefore making the code perhaps easier to understand?

I guess that's in the eye of the beholder.  Where possible, I tend to prefer formal definitions of syntax (like regexes) over ad-hoc implementations (like recursive-descent parsing), and I'm looking forward to a more powerful regex library in python.  However, I'm open to reimplementing this as a recursive-descent parser now if that's more acceptable to everyone.  Or maybe I should spend more time researching that new regex library to see if it's usable now somehow.

>>> but the format including `(` and `)` in the name means this would at least be fairly easy to detect.
>>
>> I don't know what you mean.  Can you explain more?
>
> I was thinking that you'd be able to distinguish the arg-based substitutions from regular ones, by identifying the presence of `(` in the pattern, and then could decide to do a different thing for that substitution.

I might be misunderstanding your point, but I don't think that's an actual problem inherent in the syntax proposed in this patch.

The general substitution algorithm now is to iterate the list of substitutions and apply one at a time to the current RUN line.  When arriving at a function substitution `%{name}` in the list, if we were to use recursive-descent parsing, I imagine it would then search for `%{name}(` in the RUN line.  Upon finding it, it would then look for the closing `%)` while recursively balancing anything like `%{looks-like-another-func}(args%)` in the argument list.  Regardless of whether `%{looks-like-another-func}(` is actually a function substitution, the matching `%)` binds to it for the sake of processing the arguments of `%{name}`.  With your proposed syntax, I think the behavior would be analogous: regardless of whether `%{looks-like-another-func(` is actually a function substitution, the matching `)}` binds to it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133172/new/

https://reviews.llvm.org/D133172



More information about the llvm-commits mailing list