[PATCH] D34464: lit: Make sure testnames are unicode strings
Matthias Braun via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 22 11:19:01 PDT 2017
MatzeB added a comment.
In https://reviews.llvm.org/D34464#787900, @chapuni wrote:
> I have reproduced the issue, and confirmed this works.
>
> Yet another option; Could we make ProgressBar accept unicode?
> The difference is; sys.stdout.write() doesn't accept unicode (by default) but print() accepts.
I'm still trying to understand the underlying issue and how exactly Python works here.
The thing is, `sys.stdout.write(u'Hällö\n')` works perfectly fine because Python knows how to interpret the unicode object. As far as I understand it, the filename here is a sequence of bytes, and Python refuses to simply interpret that byte string as `utf-8`. A byte string on its own can still be printed to stdout perfectly fine. What breaks in the ProgressBar is that we try to concatenate a unicode string with a byte string. At that point Python needs to know the encoding of the bytes, or it can't create a unicode string for the result.
This illustrates it I think:
import sys
bla = 'H\xc3\xa4ll\xc3\xb6\n'
# All of these work
sys.stdout.write(u"I say: ")
sys.stdout.write(bla)
sys.stdout.write(u"I say: " + bla.decode('utf-8'))
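# This fails in Python 2: the implicit conversion of bla to unicode uses
# the ASCII codec and raises UnicodeDecodeError on the non-ASCII bytes
sys.stdout.write(u"I say: " + bla)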
If you want to solve the problem in ProgressBar, then we have to make sure we do not concatenate strings too much and use separate write calls instead. However, I prefer the solution here, where we decode the filename to a unicode object upfront.
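A rough sketch of what decoding upfront could look like. This is only an illustration, not the code in the patch: `to_unicode` and `raw_name` are made-up names, and UTF-8 is an assumption about how the filename bytes are encoded.

```python
def to_unicode(name, encoding="utf-8"):
    """Return name as a text string, decoding byte strings upfront.

    Hypothetical helper for illustration only; the encoding default is
    an assumption, not something lit guarantees.
    """
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name


# A filename as it might arrive from the file system: raw UTF-8 bytes.
raw_name = b"H\xc3\xa4ll\xc3\xb6"

# Once the name is decoded, concatenating it with a unicode string is safe.
message = u"I say: " + to_unicode(raw_name)
```

Doing the conversion once, where the test name is created, means later code such as the ProgressBar can concatenate freely without knowing where the name came from.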
Repository:
rL LLVM
https://reviews.llvm.org/D34464