[PATCH] D68472: [test] Depend on C.UTF-8 dependency for mri-utf8.test

Thomas Preud'homme via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 4 10:19:53 PDT 2019


thopre created this revision.
thopre added reviewers: gbreynoo, MaskRay, rupprecht, JamesNagurne, jfb.
thopre added a project: LLVM.
Herald added a subscriber: dexonsmith.

llvm-ar's mri-utf8.test relies on the en_US.UTF-8 locale being
installed for its last RUN line to work. If it is not installed, the
unicode string gets encoded as ASCII, which fails because the pound
sign (U+00A3) is outside the 7-bit ASCII range. This commit changes the
test to rely on the C.UTF-8 locale instead, as it is more likely to be
present. For example, the Ubuntu 18.04 docker image does not come with
en_US.UTF-8 but does come with C.UTF-8.
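
The encoding failure described above can be reproduced in isolation.
The following is a minimal sketch, not part of the patch; the helper
name try_encode is made up for illustration. It also shows that the
u'\xA3' spelling used in the new RUN line denotes the same character
as the old u'\U000000A3'.

```python
# Illustrative sketch only; try_encode is a hypothetical helper.
name = u'\xA3.txt'  # pound sign followed by ".txt"

def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec rejects the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# ASCII cannot represent U+00A3, so encoding fails.
ascii_result = try_encode(name, 'ascii')
# UTF-8 represents U+00A3 as the two bytes 0xC2 0xA3.
utf8_result = try_encode(name, 'utf-8')

# Both escape spellings denote the same code point.
same_char = (u'\xA3' == u'\U000000A3')

print(ascii_result, utf8_result, same_char)
```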

Note that the echo to create the <pound sign>.txt file works because the
mri-utf8.test file is encoded in UTF-8 and the RUN line is interpreted by
lit and thus Python. In particular, the file name in the redirection is
stored in a unicode string variable, and a file with that name is created
on the file system according to the encoding the system uses (e.g. UTF-16
for Windows, and the locale in LANG for Linux). Likewise, the open on the
last RUN line uses the system's encoding to encode the filename before
asking the OS to open it. This solves the Windows buildbot breakage, which
was due to trying to read a UTF-8 encoded pound sign when the file created
was UTF-16 encoded, as per the explanation above.
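
To make the mismatch concrete, here is a small sketch (not part of the
patch) comparing the byte sequences that the same unicode file name
produces under UTF-8 and UTF-16. Little-endian UTF-16 is assumed for
the Windows case. The encoding Python actually uses for file names on
a given system can be inspected with sys.getfilesystemencoding().

```python
import sys

name = u'\xA3.txt'  # pound sign followed by ".txt"

# The same unicode string yields different bytes depending on the encoding.
utf8_bytes = name.encode('utf-8')       # 2 bytes for the pound sign, 1 per ASCII char
utf16_bytes = name.encode('utf-16-le')  # 2 bytes per character (assumed little-endian)

# A name encoded one way will not match a file created with the other.
names_match = (utf8_bytes == utf16_bytes)

# Encoding Python uses for file names on this system; the value depends on
# the platform and, on Linux, on the locale environment.
fs_encoding = sys.getfilesystemencoding()

print(utf8_bytes, utf16_bytes, names_match, fs_encoding)
```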


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D68472

Files:
  llvm/test/tools/llvm-ar/mri-utf8.test


Index: llvm/test/tools/llvm-ar/mri-utf8.test
===================================================================
--- llvm/test/tools/llvm-ar/mri-utf8.test
+++ llvm/test/tools/llvm-ar/mri-utf8.test
@@ -16,8 +16,11 @@
 # include arguments with non-ascii characters.
 # Python on Linux defaults to ASCII encoding unless the
 # environment specifies otherwise, so it is explicitly set.
+# The locale chosen is C.UTF-8 as being the most likely to
+# be available on a system, in particular on a minimal
+# system like a docker image.
 # The reliance the test has on this locale is not ideal,
 # however alternate solutions have been difficult due to 
 # behaviour differences with python 2 vs python 3,
-# and linux vs windows.
-RUN: env LANG=en_US.UTF-8 %python -c "assert open(u'\U000000A3.txt', 'rb').read() == b'contents\n'"
+# and linux vs windows. The C.UTF-8 locale is chosen
+RUN: env LANG=C.UTF-8 %python -c "assert open(u'\xA3.txt', 'rb').read() == b'contents\n'"


