[llvm] r373700 - [test] Remove locale dependency for mri-utf8.test

Thomas Preud'homme via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 4 00:13:47 PDT 2019


Author: thopre
Date: Fri Oct  4 00:13:46 2019
New Revision: 373700

URL: http://llvm.org/viewvc/llvm-project?rev=373700&view=rev
Log:
[test] Remove locale dependency for mri-utf8.test

Summary:
llvm-ar's mri-utf8.test relies on the en_US.UTF-8 locale being
installed for its last RUN line to work. If it is not installed, the
Unicode string gets encoded as ASCII, which fails because the pound sign
(U+00A3) lies outside the 7-bit ASCII range. This commit changes the
call to open to use a bytes literal holding the UTF-8 encoding of the
pound sign instead, thus bypassing the encoding step.
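
The failure and the workaround can be reproduced in a few lines of
Python 3. This is only an illustrative sketch; the variable names are
made up for the example and are not part of the test.

```python
# The pound sign is U+00A3, which is outside the 7-bit ASCII range.
pound = u'\u00a3'

# Encoding it as ASCII (what happens without a UTF-8 locale) fails.
try:
    pound.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding the filename as UTF-8 yields exactly the bytes that the new
# RUN line spells out by hand, so no encoding step is needed at all.
utf8_name = (pound + u'.txt').encode('utf-8')
bytes_name = b'\xC2\xA3.txt'
```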

Note that the echo to create the <pound sign>.txt file will work
regardless of the locale because both the shell and echo (in case
it's not a builtin of the shell concerned) only need to recognize ASCII
characters to operate. Indeed, the mri-utf8.test file (and in particular
the pound sign) is encoded in UTF-8, and UTF-8 guarantees that only ASCII
characters produce bytes that can be interpreted as ASCII characters
(i.e. bytes with the most significant bit clear).
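
As a quick check of that UTF-8 property, the following Python 3 sketch
encodes a line similar to the echo command (the exact line is an
assumption made for illustration) and separates the bytes by their most
significant bit.

```python
# A line resembling the echo RUN line; the exact text is illustrative.
text = u'echo contents > \u00a3.txt\n'
encoded = text.encode('utf-8')

# Bytes with the most significant bit clear (values below 0x80).
ascii_bytes = bytes(b for b in encoded if b < 0x80)

# The ASCII characters of the original text, encoded on their own.
ascii_chars = ''.join(c for c in text if ord(c) < 0x80).encode('ascii')

# Bytes with the most significant bit set: these can only come from
# the non-ASCII character, here the two-byte encoding of U+00A3.
high_bytes = bytes(b for b in encoded if b >= 0x80)
```

The low bytes are exactly the ASCII characters of the line, and the
high bytes are exactly the encoding of the pound sign, so a tool that
only looks for ASCII delimiters cannot be confused by the pound sign.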

So the process of extracting the filename from the line goes something
like this:
- find an ASCII chevron '>'
- find the beginning of the filename by skipping ASCII space-like
  characters
- find the ASCII newline character indicating the end of the redirection
  (there is no semicolon ';', closing curly bracket '}', parenthesis ')'
  or the like)
- create a file whose name is made of all the bytes between the beginning
  and end of the filename *without interpreting them*
Reviewers: gbreynoo, MaskRay, rupprecht, JamesNagurne, jfb

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68418

Modified:
    llvm/trunk/test/tools/llvm-ar/mri-utf8.test

Modified: llvm/trunk/test/tools/llvm-ar/mri-utf8.test
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/tools/llvm-ar/mri-utf8.test?rev=373700&r1=373699&r2=373700&view=diff
==============================================================================
--- llvm/trunk/test/tools/llvm-ar/mri-utf8.test (original)
+++ llvm/trunk/test/tools/llvm-ar/mri-utf8.test Fri Oct  4 00:13:46 2019
@@ -12,8 +12,4 @@ RUN: echo "SAVE" >> %t/script.mri
 RUN: llvm-ar -M < %t/script.mri
 RUN: cd %t/extracted && llvm-ar x %t/mri.ar
 
-# This works around problems launching processess that
-# include arguments with non-ascii characters.
-# Python on Linux defaults to ASCII encoding unless the
-# environment specifies otherwise, so it is explicitly set.
-RUN: env LANG=en_US.UTF-8 %python -c "assert open(u'\U000000A3.txt', 'rb').read() == b'contents\n'"
+RUN: %python -c "assert open(b'\xC2\xA3.txt', 'rb').read() == b'contents\n'"



