[llvm] update_test_checks: keep meta variables stable by default (PR #76748)

Jannik Silvanus via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 10 07:51:31 PST 2024


Nicolai Hähnle <nicolai.haehnle at amd.com>
Message-ID:
In-Reply-To: <llvm.org/llvm/llvm-project/pull/76748 at github.com>


================
@@ -1176,20 +1214,236 @@ def may_clash_with_default_check_prefix_name(check_prefix, var):
     )
 
 
+VARIABLE_TAG = "[[@@]]"
+METAVAR_RE = re.compile(r"\[\[([A-Z0-9_]+)(?::[^]]+)?\]\]")
+NUMERIC_SUFFIX_RE = re.compile(r"[0-9]*$")
+
+
+class CheckValueInfo:
+    def __init__(
+        self,
+        nameless_value: NamelessValue,
+        var: str,
+        prefix: str,
+    ):
+        self.nameless_value = nameless_value
+        self.var = var
+        self.prefix = prefix
+
+
+class CheckLineInfo:
+    def __init__(self, line, values):
+        self.line: str = line
+        self.values: List[CheckValueInfo] = values
+
+    def __repr__(self):
+        return f"CheckLineInfo(line={self.line}, self.values={self.values})"
+
+
+def remap_metavar_names(
+    orig_line_infos: List[CheckLineInfo],
+    new_line_infos: List[CheckLineInfo],
+    committed_names: Set[str],
+) -> Mapping[str, str]:
+    """
+    Map all FileCheck variable names that appear in new_line_infos to new
+    FileCheck variable names in an attempt to reduce the diff from orig_line_infos
+    to new_line_infos.
+    """
+    # Initialize uncommitted identity mappings
+    new_mapping = {}
+    for line in new_line_infos:
+        for value in line.values:
+            new_mapping[value.var] = value.var
+
+    # Recursively commit to the identity mapping or find a better one
+    def recurse(
+        orig_line_infos: List[CheckLineInfo], new_line_infos: List[CheckLineInfo]
+    ):
+        if not new_line_infos or not orig_line_infos:
+            return
+
+        lines = set()
+
+        # Search for lines that are identical on both sides, including meta
+        # variable names, and commit to those names immediately
+        for line in orig_line_infos:
+            key = (line.line.strip(), tuple(value.var for value in line.values))
+            lines.add(key)
+
+        for line in new_line_infos:
+            key = (
+                line.line.strip(),
+                tuple(new_mapping[value.var] for value in line.values),
+            )
+            if key in lines:
+                for value in line.values:
+                    committed_names.add(new_mapping[value.var])
+
+        # Search for lines that are unique on both sides if we only consider
+        # variable names that have been committed.
+        lines = collections.defaultdict(lambda: [None, None])
+        for i, line in enumerate(orig_line_infos):
+            key = (
+                line.line.strip(),
+                tuple(
+                    value.var for value in line.values if value.var in committed_names
+                ),
+            )
+            entry = lines[key]
+            if entry[0] is None:
+                entry[0] = i
+            else:
+                entry[0] = False
+
+        for i, line in enumerate(new_line_infos):
+            key = (
+                line.line.strip(),
+                tuple(
+                    new_mapping[value.var]
+                    for value in line.values
+                    if new_mapping[value.var] in committed_names
+                ),
+            )
+            entry = lines[key]
+            if entry[1] is None:
+                entry[1] = i
+            else:
+                entry[1] = False
+
+        unique_matches = []
+        for entry in lines.values():
+            if (
+                entry[0] is not None
+                and entry[0] is not False
+                and entry[1] is not None
+                and entry[1] is not False
+            ):
+                unique_matches.append((entry[0], entry[1]))
+
+        if not unique_matches:
+            # There are no unique matches. This is the recursion base case.
+            return
+
+        # Compute a maximal crossing-free matching via dynamic programming
+        unique_matches.sort(key=lambda entry: entry[0])
+
+        backlinks = []
+        table = []
+        for _, new_idx in unique_matches:
+            ti = bisect.bisect_left(table, new_idx, key=lambda entry: entry[0])
+            if ti < len(table):
+                table[ti] = (new_idx, len(backlinks))
+            else:
+                table.append((new_idx, len(backlinks)))
+            if ti > 0:
+                backlinks.append(table[ti - 1][1])
+            else:
+                backlinks.append(None)
----------------
jasilvanus wrote:

I think I'm missing some parts of the matching algorithm. Where are you ensuring that the matching is a maximal(um?) crossing-free matching?

If I understand correctly, you are iterating over the edges sorted by the left (old) endpoint and building a crossing-free matching by collecting the participating edges in `table`, sorted by the right (new) endpoint. You search for the place where a new edge should be inserted with `bisect`. If the `bisect` result points at an existing table entry, the two edges cross (by iteration order), and you remove the old one by overwriting it with the new one.

But what if the new edge crosses multiple old edges?
For example, consider the edges `(1,1), (2,3), (3,4), (4,2)`. After inserting `(4,2)`, you remove `(2,3)` and end up with `(1,1), (4,2), (3,4)`, which still contains a crossing.
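
To make sure I'm reading the loop right, here is a standalone trace of the quoted `table`/`backlinks` loop on exactly those edges. The names are local to this sketch, only the loop body mirrors the diff, and the `key=` argument to `bisect_left` needs Python 3.10+:

```python
import bisect

# Example edges (old_idx, new_idx), already sorted by the old endpoint.
unique_matches = [(1, 1), (2, 3), (3, 4), (4, 2)]

backlinks = []
table = []
for _, new_idx in unique_matches:
    # Insertion point by the new endpoint stored in each table entry.
    ti = bisect.bisect_left(table, new_idx, key=lambda entry: entry[0])
    if ti < len(table):
        table[ti] = (new_idx, len(backlinks))
    else:
        table.append((new_idx, len(backlinks)))
    if ti > 0:
        backlinks.append(table[ti - 1][1])
    else:
        backlinks.append(None)
    print([entry[0] for entry in table], backlinks)

# If I'm tracing this correctly, it prints:
#   [1] [None]
#   [1, 3] [None, 0]
#   [1, 3, 4] [None, 0, 1]
#   [1, 2, 4] [None, 0, 1, 0]
```

After `(4,2)` is processed, the second element of each table entry points at the edges `(1,1), (4,2), (3,4)`, i.e. the crossing set I mean above.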

I think the code below will remove conflicts, but in this form it looks more like a greedy algorithm than a DP, since later edges always win over earlier ones?

I can see how one could compute a maximum crossing-free matching in n**2 time, but that would be a bit more complicated.
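
For reference, the kind of n**2 approach I have in mind is just the classic LIS-style DP over the edges sorted by the old endpoint. A rough, untested sketch (not a suggestion to replace the patch's code, and the helper name is made up):

```python
from typing import List, Optional, Tuple

def max_noncrossing_matching(edges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Maximum crossing-free matching of (old_idx, new_idx) edges in O(n**2)."""
    edges = sorted(edges)  # sort by the old endpoint; old endpoints are unique here
    n = len(edges)
    if n == 0:
        return []
    best = [1] * n                          # size of the best matching ending at edge i
    prev: List[Optional[int]] = [None] * n  # predecessor edge in that matching
    for i in range(n):
        for j in range(i):
            if edges[j][1] < edges[i][1] and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    # Reconstruct the matching from the edge with the largest DP value.
    i = max(range(n), key=lambda k: best[k])
    result = []
    while i is not None:
        result.append(edges[i])
        i = prev[i]
    return result[::-1]

# max_noncrossing_matching([(1, 1), (2, 3), (3, 4), (4, 2)])
#   -> [(1, 1), (2, 3), (3, 4)]
```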

That said, guaranteeing optimality might be overkill; maybe a heuristic is good enough for this purpose.
Or maybe I'm missing something and this is already computing optimal assignments :)

https://github.com/llvm/llvm-project/pull/76748


More information about the llvm-commits mailing list