[llvm] update_test_checks: keep meta variables stable by default (PR #76748)
Jannik Silvanus via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 6 03:25:54 PST 2024
Nicolai Hähnle <nicolai.haehnle at amd.com>
In-Reply-To: <llvm.org/llvm/llvm-project/pull/76748 at github.com>
================
@@ -1187,20 +1233,317 @@ def may_clash_with_default_check_prefix_name(check_prefix, var):
)
+def find_diff_matching(lhs: List[str], rhs: List[str]) -> List[int]:
+    """
+    Find a large ordered matching between strings in lhs and rhs.
+
+    Think of this as finding the *unchanged* lines in a diff, where the entries
+    of lhs and rhs are lines of the files being diffed.
+
+    Returns a list of matched (lhs_idx, rhs_idx) pairs.
+    """
+
+    # Collect matches in reverse order.
+    matches = []
+
+    def recurse(lhs_start, lhs_end, rhs_start, rhs_end):
+        if lhs_start == lhs_end or rhs_start == rhs_end:
+            return
+
+        # First, collect a set of candidate matching edges. We limit this to a
+        # constant multiple of the input size to avoid quadratic runtime.
+        patterns = collections.defaultdict(lambda: ([], []))
+
+        for idx in range(lhs_start, lhs_end):
+            patterns[lhs[idx]][0].append(idx)
+        for idx in range(rhs_start, rhs_end):
+            patterns[rhs[idx]][1].append(idx)
+
+        multiple_patterns = []
+
+        candidates = []
+        for pattern in patterns.values():
+            if not pattern[0] or not pattern[1]:
+                continue
+
+            if len(pattern[0]) == len(pattern[1]) == 1:
+                candidates.append((pattern[0][0], pattern[1][0]))
+            else:
+                multiple_patterns.append(pattern)
+
+        multiple_patterns.sort(key=lambda pattern: len(pattern[0]) * len(pattern[1]))
+
+        for pattern in multiple_patterns:
+            if len(candidates) + len(pattern[0]) * len(pattern[1]) > 2 * (len(lhs) + len(rhs)):
----------------
jasilvanus wrote:
I find this a bit overly restrictive -- this means we'll give up on small inputs such as
```
find_diff_matching(["foo"] * 5, ["foo"] * 5)
```
where the 25 candidate edges already exceed the budget of 2 * (5 + 5) = 20.
As a simple improvement, we could allow an additive constant (say 100) so that we don't have to worry about small cases.
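To make the arithmetic concrete, here is a minimal sketch of the budget check with an optional additive constant. The helper name `exceeds_edge_budget` and the `slack` parameter are made up for illustration and are not part of the patch; with `slack=0` the expression mirrors the condition in the hunk above.

```python
def exceeds_edge_budget(
    num_candidates: int,
    lhs_count: int,
    rhs_count: int,
    total_lines: int,
    slack: int = 0,
) -> bool:
    """Hypothetical helper mirroring the budget check in the hunk above.

    Adding all edges for one repeated line costs lhs_count * rhs_count.
    slack is the proposed additive constant; slack=0 reproduces the
    condition as written in the patch.
    """
    return num_candidates + lhs_count * rhs_count > 2 * total_lines + slack


# Ten identical lines on each side: 100 edges against a budget of 2 * 20 = 40.
print(exceeds_edge_budget(0, 10, 10, 20))             # True: over budget
print(exceeds_edge_budget(0, 10, 10, 20, slack=100))  # False: 100 <= 140
```

With the constant in place, small blocks of repeated lines stay within budget, while the bound on the total number of edges remains linear in the input size.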
I think we can easily have large blocks of identical lines (ignoring variable names, as in the first matching round), e.g. if there are many similar loads or stores resulting from a bulk copy, and it's a bit unsatisfying to give up on those completely.
Maybe we could instead limit the number of edges incident to any given line (say on the LHS) by a constant, which is enough to guarantee that there are not too many edges. If the number of matching lines on the RHS is larger, we would take a subset, for example by sampling randomly or by selecting lines in a similar relative position.
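A rough sketch of the "similar relative position" variant, to show what I have in mind. The function name `capped_edges` and the `max_per_line` parameter are hypothetical, and this is not code from the PR: for one repeated line, each LHS occurrence keeps only a small window of RHS occurrences around the same relative rank.

```python
from typing import List, Tuple


def capped_edges(
    lhs_idxs: List[int], rhs_idxs: List[int], max_per_line: int = 4
) -> List[Tuple[int, int]]:
    """Hypothetical helper: candidate edges for one repeated line.

    lhs_idxs / rhs_idxs are the sorted positions of the same line text on
    each side. Each LHS occurrence keeps at most max_per_line RHS partners,
    chosen from a window around the same relative position.
    """
    edges: List[Tuple[int, int]] = []
    n_lhs, n_rhs = len(lhs_idxs), len(rhs_idxs)
    if n_lhs == 0 or n_rhs == 0:
        return edges
    width = min(max_per_line, n_rhs)
    for rank, lhs_idx in enumerate(lhs_idxs):
        # Map the rank on the LHS to the corresponding rank on the RHS.
        center = rank * n_rhs // n_lhs
        # Clamp the window so it stays inside rhs_idxs.
        start = max(0, min(center - width // 2, n_rhs - width))
        for rhs_idx in rhs_idxs[start : start + width]:
            edges.append((lhs_idx, rhs_idx))
    return edges
```

For a block of 100 identical lines on both sides this yields 400 candidate edges instead of 10000, and the "diagonal" pairing of the i-th occurrence with the i-th occurrence is still among the candidates.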
https://github.com/llvm/llvm-project/pull/76748