[PATCH] D76178: [lit] Recursively apply substitutions
Louis Dionne via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 15 10:25:20 PDT 2020
ldionne added a subscriber: rnk.
ldionne added a comment.
I'm revisiting this now. Comments below.
In D76178#1970410 <https://reviews.llvm.org/D76178#1970410>, @yln wrote:
> My feedback (also sent via email):
>
> I talked to Louis. He proposed the following plan and is willing to the work:
>
> 1. Implement longest-match matching of substitutions (this is a useful improvement in itself)
> 2. Address non-termination in a more principled way (by sorting substitutions or another similar algorithm)
> 3. Measure performance
> 4. Provide ergonomic helper to print expanded substitutions to aid debugging
> 5. Remove option in the config, turn on by default
Now that I'm looking at the changes required, and after letting this sleep for some time, I think it might be reasonable to short circuit some of this. What I'd propose is:
1. Always sort the substitutions in reverse order of length, i.e.%clang_exe will always be processed before %clang. This is a simple and efficient way to implement longest-match.
2. Always process substitutions recursively, up to a fixed point or some arbitrary depth. I suggest something like 50.
3. Remove the option in the config.
All these steps would have to be performed at once, though, since reordering the substitutions in (1) requires processing them recursively. I tried otherwise and some test suites do use recursive substitutions by ordering the substitutions in a way that it works, which would be broken if we do (1) in isolation. I think it might be possible to do these step-by-step, but at a significant cost in complexity especially for implementing longest-match.
I looked into addressing non-termination in a more principled way, and the issue is that doing it properly requires figuring out the dependencies between substitutions, which is equivalent to building a graph where nodes are substitution-keys and edges represent the `is contained in the result of substituting` relationship. The topological sort of this graph, if it exists, is the order in which we should apply the substitutions. If no such ordering exists, there's a cycle and we should error out. There are two problems with that:
1. It's fairly involved to build that graph: I'd ballpark O(n^2) since for each substitution, we need to examine all other substitutions to see whether they depend on it. And then we need to sort the substitutions based on that graph representation, which is also not free. I'm concerned that it would be less efficient than just processing substitutions until a fixed point. One should keep in mind that cycles should never happen (they are basically a programmer error when setting up the test suite), and the fixed point should be reached after only a few passes in almost all cases.
2. The sort required for longest-match and the one for recursive substitutions might not agree. For example, consider the following set of substitutions: `[('%clang_exe', '/usr/bin/c++'), ('%clang', '%clang_exe')]`. The recursive substitution sort tells us to apply `%clang` before `%clang_exe`, but the longest-match consideration would require that we do the opposite. I can't think of a way to satisfy both these criteria.
So, if you think my reasoning makes sense, the algorithm for substitutions would be:
1. Sort substitutions in reverse order of length (to avoid unexpected substitution in the prefix of other substitutions)
2. Apply substitutions in order until a fixed point is encountered, or `N` steps have been performed. I suggest `N=50` or something along those lines -- we should strive to use a value of `N` that will make all legitimate uses we can think of work, while still catching a cyclic substitution eventually.
It's a simple and fairly efficient algorithm, given that in most cases just one or two passes would be required (i.e. in the best case, it's equivalent to not having recursive substitutions at all). WDYT?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D76178/new/
https://reviews.llvm.org/D76178
More information about the llvm-commits
mailing list