[PATCH] D84233: [lit] Escape ANSI control character in xunit output

Alexander Richardson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 24 01:32:29 PDT 2020


arichardson added a comment.

In D84233#2170305 <https://reviews.llvm.org/D84233#2170305>, @yln wrote:

> @arichardson: can you double-check that this workaround is still needed?
>  Do we understand the semantics of CDATA blocks?  I was under the impression we use it here to avoid problems like this.
>
> Anyways, I am fine with this.  Adding Joel as a second reviewer to get his feedback before accepting.


I believe CDATA just avoids the need to escape XML special characters. However, characters 0x00-0x1F (with the exception of \t, \r and \n) are not valid anywhere in the document according to the XML spec (https://www.w3.org/TR/xml/#charsets):
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]  /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
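
To make the production above concrete, here is a minimal Python sketch that lists the characters in a string that fall outside the XML 1.0 Char range. The helper name find_invalid_xml10_chars is made up for illustration and is not part of lit.

```python
import re

# Negated character class built directly from the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml10_chars(text):
    """Return every character in text that XML 1.0 does not allow.

    Hypothetical helper for illustration only.
    """
    return _INVALID_XML10.findall(text)
```

For example, test output coloured with ANSI escape sequences contains ESC (0x1B), which this check would flag, while tabs and newlines pass.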

XML 1.1 seems to relax that and allows everything except NUL: https://www.w3.org/TR/xml11/#charsets

Maybe specifying version 1.1 for the XUnit output would make the Java parsers happy again, but escaping ANSI control characters might also be useful if you open the report XML file in a text editor.
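
As a rough illustration of the problem and of one possible escaping scheme, the sketch below wraps some output in a CDATA block and checks whether Python's bundled XML parser accepts it. The names escape_control_chars and parses, the \xNN replacement format and the <testcase>/<failure> layout are all assumptions made for this example; this is not necessarily what the patch does.

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 forbids: everything below 0x20
# except tab (0x09), newline (0x0A) and carriage return (0x0D).
_CONTROL = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def escape_control_chars(text):
    """Replace each forbidden control character with a visible \\xNN escape.

    Hypothetical helper; the replacement format is an assumption.
    """
    return _CONTROL.sub(lambda m: "\\x{:02x}".format(ord(m.group())), text)


def parses(output):
    """Return True if a report embedding output in CDATA is well-formed.

    Assumes output does not contain the CDATA terminator ']]>'.
    """
    doc = "<testcase><failure><![CDATA[{}]]></failure></testcase>".format(output)
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Output as it might look with coloured diagnostics enabled.
colored = "\x1b[31merror:\x1b[0m oops"
```

With this, parses(colored) should fail even though the text sits inside CDATA, while parses(escape_control_chars(colored)) succeeds and the escaped text stays readable in a text editor.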


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84233/new/

https://reviews.llvm.org/D84233




