[llvm] [lit] Fix substitutions containing backslashes (PR #103042)

Martin Storsjö via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 13 04:10:48 PDT 2024


https://github.com/mstorsjo created https://github.com/llvm/llvm-project/pull/103042

Substitutions can be added in two ways: by the calling Python scripts, which add entries to the config.substitutions list, or via DEFINE lines in the test scripts themselves.

The substitution strings passed to Python's re classes are interpreted so that backslashes expand to escape sequences, and literal backslashes need to be escaped.
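As a minimal illustration of that behaviour (not lit code; the `%path` pattern and the path value are made up for the example), this is what `re.sub` does with a replacement string that contains backslashes:

```python
import re

# Hypothetical replacement value: a Windows-style path with backslashes.
value = r"C:\new\tools"

# Passed as-is, the replacement string is interpreted by re:
# "\n" expands to a newline and "\t" expands to a tab.
unescaped = re.sub("%path", value, "run %path")

# Doubling every backslash first makes re emit them literally.
escaped = re.sub("%path", value.replace("\\", "\\\\"), "run %path")

print(repr(unescaped))
print(repr(escaped))
```

Only the second call reproduces the path unchanged, which is why literal backslashes have to be escaped before the string reaches `re`.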

On Unix, the script-defined substitutions don't (usually, so far) contain backslashes - but on Windows, they often do, due to paths containing backslashes. This led to a Windows-specific escaping of backslashes before doing Python re substitutions - since 7c9eab8fef0ed79a5911d21eb97b6b0fa9d39f82. There's nothing inherently Windows-specific about this though - any intended literal backslashes in the substitution strings need to be escaped; this is how the Python re API works.

The DEFINE lines were added later, and in order to cope with backslashes, escaping of backslashes was added in the SubstDirective class in TestRunner, applying to DEFINE lines in the tests only.

The fact that the escaping right before passing to the Python re API was done conditionally on Windows led to two inconsistencies:

- DEFINE lines in the tests that contain backslashes got double backslashes on Windows. (This was visible as a FIXME in llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt.)

- Script-provided substitutions containing backslashes did not work on Unix, but they did work on Windows.

By removing the escaping from SubstDirective and escaping backslashes unconditionally in the processLine function, right before feeding the substitutions to Python's re classes, we should have consistent behaviour across platforms, and get rid of the FIXME in the lit test.
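The old and new behaviour can be modelled with a small standalone sketch. This is a simplification, not the actual TestRunner code: `apply_old`, `apply_new` and their parameters are hypothetical names, and the pattern is passed through `re.escape` here just to keep the example self-contained.

```python
import re


def escape(value):
    # Double each backslash so re.sub treats it as a literal character.
    return value.replace("\\", "\\\\")


def apply_old(line, name, value, from_define, is_windows):
    # Model of the old behaviour: DEFINE values were escaped once up front,
    # and every value was escaped again, but only on Windows.
    if from_define:
        value = escape(value)
    if is_windows:
        value = escape(value)
    return re.sub(re.escape(name), value, line)


def apply_new(line, name, value):
    # Model of the new behaviour: escape exactly once, on every platform.
    return re.sub(re.escape(name), escape(value), line)


line, name, value = "echo %{escape}", "%{escape}", r"\g<0>\n"

print(repr(apply_old(line, name, value, from_define=True, is_windows=True)))
print(repr(apply_old(line, name, value, from_define=False, is_windows=False)))
print(repr(apply_new(line, name, value)))
```

In this model, the old DEFINE path on Windows yields doubled backslashes, the old script-provided path on Unix lets `re` interpret `\g<0>` and `\n`, and the new path yields the value unchanged regardless of where it came from.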

This fixes issues with substitutions containing backslashes on Unix platforms, as encountered in PR #86649.

From 870eef251bdf64652071fff0287beff522e0ca84 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20Storsj=C3=B6?= <martin at martin.st>
Date: Tue, 13 Aug 2024 12:09:35 +0300
Subject: [PATCH] [lit] Fix substitutions containing backslashes

Substitutions can be added in a couple different ways; they can
be added via the calling python scripts by adding entries to the
config.substitutions dictionary, or via DEFINE lines in the
scripts themselves.

The substitution strings passed to Python's re classes are
interpreted so that backslashes expand to escape sequences, and
literal backslashes need to be escaped.

On Unix, the script defined substitutions don't (usually, so
far) contain backslashes - but on Windows, they often do, due to
paths containing backslashes. This led to a Windows specific
escaping of backslashes before doing Python re substitutions -
since 7c9eab8fef0ed79a5911d21eb97b6b0fa9d39f82. There's nothing
inherently Windows specific about this though - any intended
literal backslashes in the substitution strings need to be escaped;
this is how the Python re API works.

The DEFINE lines were added later, and in order to cope with
backslashes, escaping of backslashes was added in the
SubstDirective class in TestRunner, applying to DEFINE lines in
the tests only.

The fact that the escaping right before passing to the Python re
API was done conditionally on Windows led to two inconsistencies:

- DEFINE lines in the tests that contain backslashes got double
  backslashes on Windows. (This was visible as a FIXME in
  llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt.)

- Script provided substitutions containing backslashes did not
  work on Unix, but they did work on Windows.

By removing the escaping from SubstDirective and escaping it
unconditionally in the processLine function, before feeding the
substitutions to Python's re classes, we should have
consistent behaviour across platforms, and get rid of the FIXME
in the lit test.

This fixes issues with substitutions containing backslashes on
Unix platforms, as encountered in PR #86649.
---
 llvm/utils/lit/lit/TestRunner.py                          | 8 +++-----
 .../lit/tests/Inputs/shtest-define/value-escaped.txt      | 7 ++-----
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/llvm/utils/lit/lit/TestRunner.py b/llvm/utils/lit/lit/TestRunner.py
index da7fa86fd39173..7f312ad0768f0e 100644
--- a/llvm/utils/lit/lit/TestRunner.py
+++ b/llvm/utils/lit/lit/TestRunner.py
@@ -1587,7 +1587,6 @@ def adjust_substitutions(self, substitutions):
         assert (
             not self.needs_continuation()
         ), "expected directive continuations to be parsed before applying"
-        value_repl = self.value.replace("\\", "\\\\")
         existing = [i for i, subst in enumerate(substitutions) if self.name in subst[0]]
         existing_res = "".join(
             "\nExisting pattern: " + substitutions[i][0] for i in existing
@@ -1600,7 +1599,7 @@ def adjust_substitutions(self, substitutions):
                     f"{self.get_location()}"
                     f"{existing_res}"
                 )
-            substitutions.insert(0, (self.name, value_repl))
+            substitutions.insert(0, (self.name, self.value))
             return
         if len(existing) > 1:
             raise ValueError(
@@ -1622,7 +1621,7 @@ def adjust_substitutions(self, substitutions):
                 f"Expected pattern: {self.name}"
                 f"{existing_res}"
             )
-        substitutions[existing[0]] = (self.name, value_repl)
+        substitutions[existing[0]] = (self.name, self.value)
 
 
 def applySubstitutions(script, substitutions, conditions={}, recursion_limit=None):
@@ -1738,8 +1737,7 @@ def processLine(ln):
         # Apply substitutions
         ln = substituteIfElse(escapePercents(ln))
         for a, b in substitutions:
-            if kIsWindows:
-                b = b.replace("\\", "\\\\")
+            b = b.replace("\\", "\\\\")
             # re.compile() has a built-in LRU cache with 512 entries. In some
             # test suites lit ends up thrashing that cache, which made e.g.
             # check-llvm run 50% slower.  Use an explicit, unbounded cache
diff --git a/llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt b/llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt
index 68cf35825e2a64..b3655f7fd9ab82 100644
--- a/llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt
+++ b/llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt
@@ -1,16 +1,13 @@
-# FIXME: The doubled backslashes occur under windows.  That's almost surely a
-# lit issue beyond DEFINE/REDEFINE.
-
 # Escape sequences that can appear in python re.sub replacement strings have no
 # special meaning in the value.
 
 # DEFINE: %{escape} = \g<0>\n
 # RUN: echo '%{escape}'
-# CHECK:# | {{\\?}}\g<0>{{\\?}}\n
+# CHECK:# | \g<0>\n
 
 # REDEFINE: %{escape} = \n                                                     \
 # REDEFINE:             \g<param>
 # RUN: echo '%{escape}'
-# CHECK:# | {{\\?}}\n {{\\?}}\g<param>
+# CHECK:# | \n \g<param>
 
 # CHECK: Passed: 1 {{\([0-9]*.[0-9]*%\)}}


