[PATCH] D34464: lit: Make sure testnames are unicode strings
David L. Jones via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 29 00:22:53 PDT 2017
dlj added inline comments.
================
Comment at: utils/lit/lit/Test.py:232
+ # as str objects when they don't decode cleanly.
+ if sys.version_info < (3,0,0):
+ testpath = testpath.decode(sys.getfilesystemencoding(),
----------------
MatzeB wrote:
> chapuni wrote:
> > Could you make it work without checking the version?
> > (I haven't tested yet)
> No, I could not. It seems that on Python 2 you sometimes get `str` objects back from listdir(), while on Python 3 you always get a (unicode) string back. Calling decode on a string in Python 3 is not allowed.
Python 3's listdir() can return bytes as well, if a bytes object is passed as the path.
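A minimal Python 3 sketch of that behaviour: `os.listdir()` returns names of the same type as its argument. The helper `list_both_ways` and the file name `example.ll` are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def list_both_ways(directory):
    """Hypothetical helper: list one directory with a str path and a bytes path.

    On Python 3, os.listdir() mirrors the type of its argument:
    a str path yields str names, a bytes path yields bytes names.
    """
    str_names = os.listdir(directory)
    bytes_names = os.listdir(os.fsencode(directory))
    return str_names, bytes_names


with tempfile.TemporaryDirectory() as tmp:
    # 'example.ll' is an illustrative file name, not taken from the patch.
    open(os.path.join(tmp, "example.ll"), "w").close()
    str_names, bytes_names = list_both_ways(tmp)

print(str_names)    # ['example.ll']
print(bytes_names)  # [b'example.ll']
```

So a type check on the value, rather than on `sys.version_info`, is what tells you whether a decode is needed.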
The simplest path forward is probably something like this:
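    # Join the suite-relative path components with the directory separator.
    testpath = os.sep.join(self.path_in_suite)
    if isinstance(testpath, bytes):
        # Decode, mapping bytes invalid in the filesystem encoding to lone surrogates.
        testpath = testpath.decode(sys.getfilesystemencoding(), 'surrogateescape')
    return ...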
That should always give you a Unicode-like object (Py2 unicode or Py3 str). The surrogateescape behaviour matches how Py3 handles odd filenames:
https://github.com/python/cpython/blob/6f0eb93183519024cb360162bdd81b9faec97ba6/Python/bltinmodule.c#L35
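A self-contained Python 3 sketch of the same isinstance-based normalization. `to_text` and the path components are hypothetical, and the example joins with '/' and passes 'utf-8' explicitly only so the output is deterministic. Note that the 'surrogateescape' error handler exists only on Python 3 (PEP 383); Python 2's decode() does not recognize it.

```python
import sys


def to_text(name, encoding=None):
    """Hypothetical helper: normalize a path component to a str.

    bytes are decoded with the 'surrogateescape' error handler, so bytes that
    are invalid in `encoding` become lone surrogates (U+DC80..U+DCFF) instead
    of raising UnicodeDecodeError. str input is returned unchanged.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    if isinstance(name, bytes):
        return name.decode(encoding, 'surrogateescape')
    return name


# Made-up components; a lone 0xE9 byte is not valid UTF-8.
path_in_suite = (b'Transforms', b'caf\xe9.ll')
testpath = '/'.join(to_text(p, 'utf-8') for p in path_in_suite)
print(ascii(testpath))  # 'Transforms/caf\udce9.ll'

# The escape is lossless: encoding with the same handler restores the bytes.
assert testpath.encode('utf-8', 'surrogateescape') == b'Transforms/caf\xe9.ll'
```

The round trip is what makes surrogateescape attractive here: nothing about the original filename is lost, even though the result is an ordinary `str`.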
It's a subtle point, but filenames from Py3's listdir can contain Unicode values that are not UTF-8-encodable (for example, the lone surrogates that surrogateescape produces for undecodable bytes). So whatever prints the filename to the terminal needs to guard against UnicodeEncodeError, at a minimum for surrogates in the filename.
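One possible guard, sketched under the assumption that showing unencodable characters as backslash escapes is acceptable for display. `safe_print` is a hypothetical helper, not part of lit.

```python
import io
import sys


def safe_print(name, stream=None):
    """Hypothetical helper: write `name` to a text stream without raising
    UnicodeEncodeError.

    If the stream's encoding cannot represent some character (for example a
    lone surrogate produced by 'surrogateescape'), that character is written
    as a backslash escape instead.
    """
    if stream is None:
        stream = sys.stdout
    # io.StringIO reports encoding=None, so fall back to UTF-8 for the check.
    encoding = getattr(stream, 'encoding', None) or 'utf-8'
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        name = name.encode(encoding, 'backslashreplace').decode(encoding)
    stream.write(name + '\n')


buf = io.StringIO()
safe_print('caf\udce9.ll', buf)  # lone surrogate: cannot be encoded as UTF-8
safe_print('plain.ll', buf)
print(repr(buf.getvalue()))  # 'caf\\udce9.ll\nplain.ll\n'
```

Escaping keeps the output readable and still hints at the offending byte; replacing with U+FFFD would be an equally valid choice if exact bytes don't matter.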
Repository:
rL LLVM
https://reviews.llvm.org/D34464