[llvm] [lit] Fix substitutions containing backslashes (PR #103042)

Martin Storsjö via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 21 13:18:32 PDT 2024


https://github.com/mstorsjo updated https://github.com/llvm/llvm-project/pull/103042

From 870eef251bdf64652071fff0287beff522e0ca84 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20Storsj=C3=B6?= <martin at martin.st>
Date: Tue, 13 Aug 2024 12:09:35 +0300
Subject: [PATCH 1/3] [lit] Fix substitutions containing backslashes

Substitutions can be added in a couple of different ways; they can
be added via the calling Python scripts by adding entries to the
config.substitutions list, or via DEFINE lines in the
test scripts themselves.

The substitution strings passed to Python's re classes are
interpreted so that backslashes expand to escape sequences, and
literal backslashes need to be escaped.
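
The behaviour described above can be illustrated with a minimal sketch.
The helper names (substitute_raw, substitute_literal) and the %{x}
pattern are made up for illustration and are not part of lit; the
rejection of unknown escapes assumes Python 3.7 or later.

```python
import re


def substitute_raw(pattern, value, line):
    # Hypothetical helper: the value goes to re.sub as-is, so backslash
    # sequences in it are interpreted as replacement-string escapes.
    return re.sub(pattern, value, line)


def substitute_literal(pattern, value, line):
    # Hypothetical helper: double every backslash first, so that re.sub
    # emits the value verbatim.
    return re.sub(pattern, value.replace("\\", "\\\\"), line)


pattern = re.escape("%{x}")
line = "echo %{x}"

# "\n" in the replacement string expands to a real newline character.
expanded = substitute_raw(pattern, r"a\nb", line)

# An unknown escape such as "\p" is rejected with re.error.
try:
    substitute_raw(pattern, r"C:\path", line)
    raw_path_failed = False
except re.error:
    raw_path_failed = True

# With the backslashes escaped, the path comes through unchanged.
literal = substitute_literal(pattern, r"C:\path", line)
```

Here expanded contains an actual newline, the unescaped path raises,
and literal is the line with the path inserted exactly as written.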

On Unix, the script-defined substitutions don't (usually, so
far) contain backslashes - but on Windows, they often do, due to
paths containing backslashes. This led to a Windows-specific
escaping of backslashes before doing Python re substitutions -
since 7c9eab8fef0ed79a5911d21eb97b6b0fa9d39f82. There's nothing
inherently Windows-specific about this though - any intended
literal backslashes in the substitution strings need to be escaped;
this is how the Python re API works.

The DEFINE lines were added later, and in order to cope with
backslashes, escaping of backslashes was added in the
SubstDirective class in TestRunner, applying to DEFINE lines in
the tests only.

The fact that the escaping right before passing to the Python re
API was done conditionally on Windows led to two inconsistencies:

- DEFINE lines in the tests that contain backslashes got double
  backslashes on Windows. (This was visible as a FIXME in
  llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt.)

- Script-provided substitutions containing backslashes did not
  work on Unix, but they did work on Windows.

By removing the escaping from SubstDirective and escaping it
unconditionally in the processLine function, before feeding the
substitutions to Python's re classes, we should have
consistent behaviour across platforms, and get rid of the FIXME
in the lit test.
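
As a rough model of the before and after behaviour (not the actual
TestRunner code), the sketch below uses a hypothetical process_line
function whose escape flag stands in for the formerly Windows-only
escaping step. The pattern and values mirror the ones in
value-escaped.txt and lit.cfg; everything else is an assumption made
for illustration.

```python
import re


def escape_backslashes(value):
    return value.replace("\\", "\\\\")


def process_line(line, substitutions, escape):
    # Stand-in for the substitution loop; `escape` models whether the
    # backslash escaping runs right before re.sub.
    for pattern, value in substitutions:
        if escape:
            value = escape_backslashes(value)
        line = re.sub(pattern, value, line)
    return line


pattern = re.escape("%{escape}")
line = "echo '%{escape}'"
value = r"\g<0>\n"

# Before: DEFINE pre-escaped the value, and Windows escaped it again,
# which left doubled backslashes in the output.
old_windows = process_line(line, [(pattern, escape_backslashes(value))], True)

# Before: on Unix only the DEFINE escaping applied, so the output was fine.
old_unix = process_line(line, [(pattern, escape_backslashes(value))], False)

# Before: a config-provided value with a backslash was not escaped on Unix.
try:
    process_line(line, [(pattern, r"value-with-\g")], False)
    old_unix_config_failed = False
except re.error:
    old_unix_config_failed = True

# After: values are stored as written and escaped exactly once.
new_define = process_line(line, [(pattern, value)], True)
new_config = process_line(line, [(pattern, r"value-with-\g")], True)
```

With a single escaping step applied unconditionally, both the DEFINE
value and the config-provided value come out verbatim in this model.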

This fixes issues with substitutions containing backslashes on
Unix platforms, as encountered in PR #86649.
---
 llvm/utils/lit/lit/TestRunner.py                          | 8 +++-----
 .../lit/tests/Inputs/shtest-define/value-escaped.txt      | 7 ++-----
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/llvm/utils/lit/lit/TestRunner.py b/llvm/utils/lit/lit/TestRunner.py
index da7fa86fd39173..7f312ad0768f0e 100644
--- a/llvm/utils/lit/lit/TestRunner.py
+++ b/llvm/utils/lit/lit/TestRunner.py
@@ -1587,7 +1587,6 @@ def adjust_substitutions(self, substitutions):
         assert (
             not self.needs_continuation()
         ), "expected directive continuations to be parsed before applying"
-        value_repl = self.value.replace("\\", "\\\\")
         existing = [i for i, subst in enumerate(substitutions) if self.name in subst[0]]
         existing_res = "".join(
             "\nExisting pattern: " + substitutions[i][0] for i in existing
@@ -1600,7 +1599,7 @@ def adjust_substitutions(self, substitutions):
                     f"{self.get_location()}"
                     f"{existing_res}"
                 )
-            substitutions.insert(0, (self.name, value_repl))
+            substitutions.insert(0, (self.name, self.value))
             return
         if len(existing) > 1:
             raise ValueError(
@@ -1622,7 +1621,7 @@ def adjust_substitutions(self, substitutions):
                 f"Expected pattern: {self.name}"
                 f"{existing_res}"
             )
-        substitutions[existing[0]] = (self.name, value_repl)
+        substitutions[existing[0]] = (self.name, self.value)
 
 
 def applySubstitutions(script, substitutions, conditions={}, recursion_limit=None):
@@ -1738,8 +1737,7 @@ def processLine(ln):
         # Apply substitutions
         ln = substituteIfElse(escapePercents(ln))
         for a, b in substitutions:
-            if kIsWindows:
-                b = b.replace("\\", "\\\\")
+            b = b.replace("\\", "\\\\")
             # re.compile() has a built-in LRU cache with 512 entries. In some
             # test suites lit ends up thrashing that cache, which made e.g.
             # check-llvm run 50% slower.  Use an explicit, unbounded cache
diff --git a/llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt b/llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt
index 68cf35825e2a64..b3655f7fd9ab82 100644
--- a/llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt
+++ b/llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt
@@ -1,16 +1,13 @@
-# FIXME: The doubled backslashes occur under windows.  That's almost surely a
-# lit issue beyond DEFINE/REDEFINE.
-
 # Escape sequences that can appear in python re.sub replacement strings have no
 # special meaning in the value.
 
 # DEFINE: %{escape} = \g<0>\n
 # RUN: echo '%{escape}'
-# CHECK:# | {{\\?}}\g<0>{{\\?}}\n
+# CHECK:# | \g<0>\n
 
 # REDEFINE: %{escape} = \n                                                     \
 # REDEFINE:             \g<param>
 # RUN: echo '%{escape}'
-# CHECK:# | {{\\?}}\n {{\\?}}\g<param>
+# CHECK:# | \n \g<param>
 
 # CHECK: Passed: 1 {{\([0-9]*.[0-9]*%\)}}

From 07f4b7be9985473b8f0510f394ca343fe792f722 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20Storsj=C3=B6?= <martin at martin.st>
Date: Wed, 21 Aug 2024 00:33:26 +0300
Subject: [PATCH 2/3] Update docs, test a substitution with backslashes

---
 llvm/docs/TestingGuide.rst                                  | 5 +++--
 llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg           | 1 +
 llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt | 3 +++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/llvm/docs/TestingGuide.rst b/llvm/docs/TestingGuide.rst
index c35e58bc53b671..08617933519fdb 100644
--- a/llvm/docs/TestingGuide.rst
+++ b/llvm/docs/TestingGuide.rst
@@ -864,8 +864,9 @@ Additional substitutions can be defined as follows:
 - Lit configuration files (e.g., ``lit.cfg`` or ``lit.local.cfg``) can define
   substitutions for all tests in a test directory.  They do so by extending the
   substitution list, ``config.substitutions``.  Each item in the list is a tuple
-  consisting of a pattern and its replacement, which lit applies using python's
-  ``re.sub`` function.
+  consisting of a pattern and its replacement, which lit applies as plain text
+  (even if it contains sequences that python's ``re.sub`` considers to be
+  escape sequences).
 - To define substitutions within a single test file, lit supports the
   ``DEFINE:`` and ``REDEFINE:`` directives, described in detail below.  So that
   they have no effect on other test files, these directives modify a copy of the
diff --git a/llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg b/llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg
index a29755eb2b6007..88c956b52d7892 100644
--- a/llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg
+++ b/llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg
@@ -22,6 +22,7 @@ config.substitutions.insert(0, ("%{global:greeting}", ""))
 config.substitutions.insert(
     0, ("%{global:echo}", "echo GLOBAL: %{global:greeting} %{global:what}")
 )
+config.substitutions.insert(0, ("%{global:subst-with-escapes}", r"value-with-\g"))
 
 # The following substitution definitions are confusing and should be avoided.
 # We define them here so we can test that 'DEFINE:' and 'REDEFINE:' directives
diff --git a/llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt b/llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt
index b3655f7fd9ab82..92fe4c27664fac 100644
--- a/llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt
+++ b/llvm/utils/lit/tests/Inputs/shtest-define/value-escaped.txt
@@ -10,4 +10,7 @@
 # RUN: echo '%{escape}'
 # CHECK:# | \n \g<param>
 
+# RUN: echo '%{global:subst-with-escapes}'
+# CHECK:# | value-with-\g
+
 # CHECK: Passed: 1 {{\([0-9]*.[0-9]*%\)}}

From ef8de00acda0dd133e95560b13c582f447caa3de Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Martin=20Storsj=C3=B6?= <martin at martin.st>
Date: Wed, 21 Aug 2024 23:18:24 +0300
Subject: [PATCH 3/3] Update llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg

Co-authored-by: Joel E. Denny <jdenny.ornl at gmail.com>
---
 llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg b/llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg
index 88c956b52d7892..476240e744afbc 100644
--- a/llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg
+++ b/llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg
@@ -22,6 +22,9 @@ config.substitutions.insert(0, ("%{global:greeting}", ""))
 config.substitutions.insert(
     0, ("%{global:echo}", "echo GLOBAL: %{global:greeting} %{global:what}")
 )
+
+# This substitution includes an re.sub replacement string escape sequence, 
+# which lit should treat as plain text.
 config.substitutions.insert(0, ("%{global:subst-with-escapes}", r"value-with-\g"))
 
 # The following substitution definitions are confusing and should be avoided.


