[PATCH] D76178: [lit] Recursively apply substitutions

Mon Mar 30 12:30:13 PDT 2020

jdenny added a comment.

In D76178#1950193 <https://reviews.llvm.org/D76178#1950193>, @ldionne wrote:

> You're right that if one sets up substitutions in the correct order (and assuming no circular dependencies), a single pass is sufficient to expand substitutions. However, the problem is precisely that one usually doesn't have control over the order in which substitutions are set up,

When does this problem occur?  So far, I've found I had sufficient control, but I probably haven't tried to do what you have.

> and frankly I don't think it should matter.

I agree it's messy.

> Given that, what we could do instead is switch the implementation from doing multiple passes to doing a topological sort of the substitutions based on whether a substitution is required by another one. The `recursiveExpansionLimit` would then become a boolean like `orderSubstitutionsSomething`. It would be an error if no linearization can be found, i.e. if there's a dependency cycle in substitutions.

Are we sure there's no case where the result of a substitution combines with neighboring text (perhaps from another substitution) to form a new substitution?

If not, then it should be straight-forward to recursively expand a substitution upon its first occurrence and cache results for it and for any nested substitutions.  Two new arrays would be needed: one for the full expansions (`None` until first occurrence) and one for marks to detect infinite recursion.  This seems efficient.

To avoid accidentally permitting substitutions to combine with neighboring text to form a new substitution, it is probably worthwhile to split the string up to avoid searching previously substituted text while searching for other substitutions.

> Note that it would still have to be opt-in, because some test suites use patterns without delimiters and they (almost certainly unknowingly) rely on the fact that their substitutions happen to be processed in a specific order.

One issue is that of prefixing, such as `%foo` vs. `%foo-bar`.  I believe lit would still just process those in the order defined, and there wouldn't be backward-compatibility issues.  Ideally, lit would have a longest-match mode that test suites would eventually adopt, but a backward-compatible first-match mode might have to be the default at first.

The issue I mentioned above, where substitutions might combine with neighboring text, might require this feature to be opt-in.

> I thought about this and I both couldn't find an implementation of topological sort in the Python Standard Library,

If Python offers something helpful, then great, but the algorithm I mentioned above seems straight-forward.  Is it harder than I think?

> and also thought that solution was overkill. Please let me know if you think that direction should be explored, and I'll research it some

I need a better understanding of the use case to respond to this.

> However, let's not revert this because the feature is useful in its current state and the libc++ test suite needs it -- if we switch to a topo sort the usage will be mostly the same so it'll be easy to change libc++ at the same time.

In the short term, I think the documentation could be improved, in the ways I pointed out in my last comment.

In the long term, I'm hoping for a more complete solution.

Thanks for working on this.  I agree lit substitutions could be cleaner.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76178/new/

https://reviews.llvm.org/D76178