[PATCH] D84233: [lit] Escape ANSI control character in xunit output

Joel E. Denny via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 24 11:04:48 PDT 2020


jdenny added a comment.

Thanks for working on this.  I've been wanting to see this fixed too.  I agree that lit is the culprit.

In D84233#2171536 <https://reviews.llvm.org/D84233#2171536>, @arichardson wrote:

> Maybe specifying version 1.1 for the XUnit output would make the Java parsers happy again, but escaping ANSI control characters might also be useful if you open the report XML file in a text editor.


I'm not sure which fix is best:

- The current patch: If I understand correctly, the current patch's escaping for XML 1.0 is not reversible: the escape sequences are indistinguishable from any existing occurrences of the two-character sequence `\x`.
- Drop control characters instead: That would make the output more readable.  It loses even more information, but that information's value is questionable.
- Use a reversible escape sequence: This doesn't seem worth it and might make simple output harder to read.
- XML 1.1: It looks like that would avoid the above issues (I think it's best to just drop nul bytes).  Does anyone know how widely supported XML 1.1 is today?  A quick google search turned up info from a decade ago saying it was not widely supported then.

If we cannot come to a conclusion soon, I think the current patch is an improvement, so we should commit it and then consider possible improvements.



================
Comment at: llvm/utils/lit/lit/reports.py:72
+def escape_control_chars(s):
+    escape_dict = {chr(c): '\\x' + hex(c)[2:] for c in range(32) if chr(c) not in ('\t', '\n', '\r')}
+    try:
----------------
Please add a comment here citing the relevant part of the XML 1.0 spec.


================
Comment at: llvm/utils/lit/lit/reports.py:136
+            # contains control characters like \x1b (e.g. if there is some
+            # -fcolor-diagnostics output).
+            output = escape_control_chars(output)
----------------
I had similar trouble with Gitlab CI around the end of last year.  Maybe the main point this comment should make is that the XML is invalid without this escaping, and then the comment can mention CI failures as a practical consequence.


================
Comment at: llvm/utils/lit/tests/shtest-format.py:35
+# CHECK-NEXT: a line with {{.*}}control characters{{.*}}.
+# characters{{[:cntrl:].+}}.
+# CHECK: --
----------------
Is this line intentional?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84233/new/

https://reviews.llvm.org/D84233





More information about the llvm-commits mailing list