[PATCH] D34793: [lit] Fix some convoluted logic around Unicode encoding, and de-duplicate across modules that used it.

David L. Jones via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 28 23:11:19 PDT 2017


dlj marked an inline comment as done.
dlj added a comment.

In https://reviews.llvm.org/D34793#794883, @chapuni wrote:

> @dlj Great, thanks!
>
> Seems it also fixes https://reviews.llvm.org/D34464.


Interesting... to_string now has to fall back to str(bytes) in Python 3 when the input is invalid. In that case, the resulting string looks like the output of repr() (it carries the b'...' wrapper and escape sequences), which is not what one would want for a filename.
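
To illustrate the fallback, here is a minimal sketch. The function to_string_sketch is a hypothetical stand-in, not the actual lit helper, and it assumes that "invalid input" means bytes that do not decode as UTF-8:

```python
def to_string_sketch(b):
    """Hypothetical stand-in for a to_string helper (not lit's real code)."""
    if isinstance(b, str):
        return b
    try:
        # Assumption: the helper tries UTF-8 first.
        return b.decode('utf-8')
    except UnicodeDecodeError:
        # Python 3: str() on a bytes object returns the same text as repr().
        return str(b)


valid = to_string_sketch(b'caf\xc3\xa9.c')   # well-formed UTF-8
invalid = to_string_sketch(b'caf\xe9.c')     # Latin-1 byte, not valid UTF-8

print(valid)
print(invalid)
```

The second result keeps the b'...' wrapper and the \xe9 escape, so it no longer names the original file.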

It's not clear to me why Python's behaviour of treating *filenames* as Unicode is actually the right choice.

Strictly speaking, I think the only well-defined filename encoding that covers all platforms targeted by Clang is the one defined by the POSIX spec:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282

(But of course, our supported OSes do support broader character sets.)
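
As a rough sketch of what that restriction would mean in practice, assuming the linked definition is the POSIX Portable Filename Character Set (A-Z, a-z, 0-9, period, underscore, hyphen-minus); the helper name is_portable_filename is hypothetical:

```python
import string

# Assumption: the POSIX Portable Filename Character Set.
PORTABLE_FILENAME_CHARS = frozenset(
    string.ascii_letters + string.digits + "._-"
)


def is_portable_filename(name):
    """Return True if every character of a non-empty name is in the set."""
    return bool(name) and all(ch in PORTABLE_FILENAME_CHARS for ch in name)


print(is_portable_filename("foo_bar-1.c"))
print(is_portable_filename("caf\u00e9.c"))
```

Names that pass this check are plain ASCII, so they encode to the same bytes under UTF-8 and under any other ASCII-compatible encoding, which sidesteps the decoding question entirely.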

I'll think more about what to_string should do, but I'll also leave a comment on the other review thread.


Repository:
  rL LLVM

https://reviews.llvm.org/D34793




