[PATCH] D56754: Add Support for Creating and Deleting Unicode Files and Directories in Lit

Jason Mittertreiner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 24 10:57:54 PST 2019


jmittert marked an inline comment as done.
jmittert added inline comments.


================
Comment at: utils/lit/lit/util.py:444
+    # convert it to ascii
+    return text if isinstance(text, bytes) else text.encode('utf-8')
----------------
serge-sans-paille wrote:
> jmittert wrote:
> > serge-sans-paille wrote:
> > > Bad news: I've been discussing with a Python Core Dev and he acknowledge this function to be ultra dangerous. We should find a way to have everything right without this tricky call.
> > > 
> > > My take would be to first have the whole script work in unicode mode (i.e. in Python3) and then do the minimal work to have it work on Python2. Does that makes sense to you?
> > Let me check that I understand correctly. Right now, lit uses strings on python3 which are unicode aware, and strings/bytes on python2 which are not unicode aware which causes issues when utf8 characters are passed to os.path on Windows.
> > 
> > What about alternatively adding a to_unicode a la the to_string and to_bytes in util.py which would return str on python3, and unicode on python2?
> > 
> > Then adding something like
> > ```
> > s = some_string
> > if windows:
> >     s = to_unicode(s)
> > os.path.some_func(s)
> > ```
> > before each call to os.path.some_func?
> Your first statement is correct. However I don't understand why there's anything windows specific in there. Let think about it this way: in Python2, strings where used to represent two different kind of objects : raw bytes, and unicode string. They are different types in Python3. So my advice would be:
> 
> 1. have the code work in Python3, inserting bytes to string conversion whenever needed. These conversion should not be platform specific.
> 
> 2. try to run the same code under Python2 and see where it fails.
> 
> Does that look like a reasonable path to you?
Hmm, I was under the impression that that was exactly what I've done here.

Lit works as of now with python3 on both Windows and Unix. The "convertToLocalEncoding" is essentially the fix up for when I ran it in python2 and saw where it failed. In order for the os.path functions to handle unicode properly on python 2, they need to take a `unicode` string on Windows. Otherwise, the os.path functions create/open a file with the literal utf-8 bytes which causes a broken filename for every other application that expects utf-16 on Windows. On Unix, utf-8 means things work fine in python2 as is. Passing a `unicode` type to os.path on unix/python2 gives an error about not being able to decode to ascii,  but just passing bytes works. 

Right now, with python 2 lit passes utf-8 bytes around (the python2 string/byte type) which works for Unix, but needs converting on Windows. What I have right now shouldn't result in any functional changes except for python2 with Windows.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56754/new/

https://reviews.llvm.org/D56754





More information about the llvm-commits mailing list