[PATCH] D34464: lit: Make sure testnames are unicode strings

David L. Jones via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 29 00:22:53 PDT 2017


dlj added inline comments.


================
Comment at: utils/lit/lit/Test.py:232
+        # as str objects when they don't decode cleanly.
+        if sys.version_info < (3,0,0):
+            testpath = testpath.decode(sys.getfilesystemencoding(),
----------------
MatzeB wrote:
> chapuni wrote:
> > Could you make it work without checking version?
> > (I haven't tested yet)
> No, I could not. It seems that on Python 2 you sometimes get `str` objects back from listdir(), while on Python 3 you always get a (Unicode) string back. Calling decode() on a string in Python 3 fails, because Python 3's `str` has no decode() method.
Python3's listdir can return bytes as well, if a bytes object was passed as the path.
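On Python 3 the element type simply follows the argument type. Here is a minimal standard-library sketch; the temporary directory and the file name are arbitrary:

```python
import os
import tempfile

# Sketch (Python 3): os.listdir() returns names of the same type as its argument.
with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file so the directory has an entry to list.
    open(os.path.join(tmp, "example.txt"), "w").close()

    as_text = os.listdir(tmp)                # str path   -> list of str
    as_bytes = os.listdir(os.fsencode(tmp))  # bytes path -> list of bytes

print(type(as_text[0]).__name__, type(as_bytes[0]).__name__)  # prints "str bytes"
```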

The simplest path forward is probably something like this:

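  testpath = os.sep.join(self.path_in_suite)  # os.sep, not os.pathsep (the PATH-list separator)
  if isinstance(testpath, bytes):  # Py2 str, or Py3 bytes from a bytes listdir()
    # Note: 'surrogateescape' (PEP 383) exists only on Python 3; Python 2
    # raises LookupError for it when it hits an undecodable byte.
    testpath = testpath.decode(sys.getfilesystemencoding(), 'surrogateescape')
  return ...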

That should always give you a text object (Py2 `unicode` or Py3 `str`). The 'surrogateescape' error handler matches how Py3 itself decodes filenames that are not valid in the filesystem encoding:
https://github.com/python/cpython/blob/6f0eb93183519024cb360162bdd81b9faec97ba6/Python/bltinmodule.c#L35
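As a sketch of the same idea as a standalone function: `to_text` is a hypothetical name, and the `'replace'` fallback for interpreters without a 'surrogateescape' handler (Python 2) is an assumption on my part, not something lit does. It avoids an explicit version check by catching the LookupError instead:

```python
import sys


def to_text(path):
    """Return `path` as text (Py2 unicode / Py3 str) without a version check.

    Hypothetical helper illustrating the isinstance(..., bytes) approach.
    """
    if isinstance(path, bytes):  # Py2 str, or Py3 bytes (e.g. from a bytes listdir())
        encoding = sys.getfilesystemencoding()
        try:
            # PEP 383: each undecodable byte b (always >= 0x80) becomes the
            # lone surrogate U+DC00 + b, as Py3's os.listdir() does for str paths.
            return path.decode(encoding, "surrogateescape")
        except LookupError:
            # Assumption: Python 2 has no 'surrogateescape' handler, so fall
            # back to the lossy but non-raising 'replace' handler there.
            return path.decode(encoding, "replace")
    return path


# Usage (Python 3): undecodable bytes survive a round trip through text.
name = to_text(b"caf\xe9")  # a lone 0xE9 byte is not valid UTF-8
print(repr(name))           # 'caf\udce9' when the filesystem encoding is UTF-8
print(name.encode(sys.getfilesystemencoding(), "surrogateescape"))  # b'caf\xe9'
```

The round trip is the point of choosing 'surrogateescape' over 'replace': the original bytes can be recovered exactly, so the decoded name can still be used to open the file.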

It's a subtle point, but because Py3's listdir decodes names this way, the filenames can contain lone surrogates, which cannot be encoded as UTF-8 (or in any other standard encoding). So in either case, whatever prints the filename to the terminal needs to guard against a UnicodeEncodeError (at a minimum, for surrogates in the filename).
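One possible shape for that guard, as a sketch only: the helper name `print_test_name` and the choice to fall back to backslash escapes are assumptions for illustration, not lit's actual output code. The usage writes to an in-memory UTF-8 stream with strict errors to stand in for a terminal:

```python
import io
import sys


def print_test_name(name, stream=None):
    """Write `name` to `stream`, escaping characters the stream cannot encode.

    Hypothetical helper: names decoded with 'surrogateescape' may contain
    lone surrogates, which a strict UTF-8 stream refuses to encode.
    """
    stream = stream if stream is not None else sys.stdout
    try:
        stream.write(name + "\n")
    except UnicodeEncodeError:
        # Re-encode with backslash escapes using the stream's own encoding
        # (assume UTF-8 if the stream does not declare one).
        encoding = getattr(stream, "encoding", None) or "utf-8"
        safe = name.encode(encoding, "backslashreplace").decode(encoding)
        stream.write(safe + "\n")


# Usage: a strict UTF-8 text stream, similar to what a terminal would be.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="utf-8", newline="\n")
print_test_name("caf\udce9", out)  # lone surrogate from a surrogateescape decode
out.flush()
```

After the flush, the stream holds the escaped ASCII text `caf\udce9` rather than raising midway through a test run.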


Repository:
  rL LLVM

https://reviews.llvm.org/D34464
