[PATCH] D34464: lit: Make sure testnames are unicode strings

Thu Jun 22 11:19:01 PDT 2017

MatzeB added a comment.

In https://reviews.llvm.org/D34464#787900, @chapuni wrote:

> I have reproduced the issue, and confirmed this works.
>
> Yet another option: could we make ProgressBar accept unicode?
>  The difference is that sys.stdout.write() doesn't accept unicode (by default), but print() does.

I'm still trying to fully understand the underlying issue and how exactly Python behaves here.

The thing is, `sys.stdout.write(u'Hällö\n')` works perfectly fine (at least when stdout is a terminal with a UTF-8 locale) because Python knows how to interpret a unicode object. As far as I understand it, the problem here is that the filename is a sequence of bytes, and Python refuses to simply interpret that byte string as `utf-8`. A stream of bytes on its own can still be printed perfectly fine to stdout.

What breaks in ProgressBar is that we try to concatenate a unicode string with a byte string. At that point Python has to decode the bytes to build a unicode result. Since nobody tells it the encoding, Python 2 falls back to its default `ascii` codec and raises a UnicodeDecodeError on the first non-ASCII byte.

I think this illustrates it:

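  import sys

  # Python 2: bla is a byte string (str) containing UTF-8 encoded text
  bla = 'H\xc3\xa4ll\xc3\xb6\n'

  # All of these work (in a terminal with a UTF-8 locale)
  sys.stdout.write(u"I say: ")                        # unicode, ASCII only
  sys.stdout.write(bla)                               # raw bytes, passed through as-is
  sys.stdout.write(u"I say: " + bla.decode('utf-8'))  # unicode + unicode

  # This fails with UnicodeDecodeError: to build the unicode result,
  # Python 2 implicitly decodes bla with the default 'ascii' codec
  sys.stdout.write(u"I say: " + bla)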
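For comparison, here is a minimal sketch that reproduces the failing concatenation without writing to stdout and runs under both Python 2.7 and Python 3. The helper name `concat_fails` is hypothetical, and the sketch assumes the test name arrives as UTF-8 encoded bytes. Python 2 reports the problem as a UnicodeDecodeError from the implicit ASCII decode, while Python 3 refuses to mix str and bytes at all and raises TypeError.

```python
# Runs under Python 2.7 and Python 3; nothing non-ASCII is written to stdout.
raw_name = b'H\xc3\xa4ll\xc3\xb6'  # UTF-8 encoded bytes (a plain str in Python 2)
prefix = u'I say: '


def concat_fails(prefix, raw):
    """Hypothetical helper: return True if prefix + raw raises."""
    try:
        prefix + raw
    except (TypeError, UnicodeDecodeError):
        # Python 3: TypeError, str and bytes cannot be concatenated.
        # Python 2: UnicodeDecodeError, implicit decode with the 'ascii' codec.
        return True
    return False


print(concat_fails(prefix, raw_name))                  # True
print(concat_fails(prefix, raw_name.decode('utf-8')))  # False
```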

If we want to solve the problem in ProgressBar instead, we have to avoid concatenating strings and use separate write calls. However, I prefer the solution in this patch, which decodes the filename to a unicode object up front.
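A minimal sketch of the decode-up-front approach, assuming test names are UTF-8 encoded. `to_unicode` is a hypothetical helper for illustration, not lit's actual code. It decodes bytes once, at the point where the name enters the program, so every later concatenation is text plus text and needs no implicit decode.

```python
import io


def to_unicode(name, encoding='utf-8'):
    """Hypothetical helper: decode a byte-string name to text once, up front.

    Text input is returned unchanged, so applying it twice is harmless.
    """
    if isinstance(name, bytes):  # bytes is an alias of str in Python 2
        return name.decode(encoding)
    return name


# Decode at the boundary where the test name enters the program...
test_name = to_unicode(b'H\xc3\xa4ll\xc3\xb6')

# ...so later string building is text + text only.
out = io.StringIO()  # io.StringIO accepts only text, in Python 2 and 3
out.write(u'Testing: ' + test_name + u'\n')
print(out.getvalue() == u'Testing: H\u00e4ll\u00f6\n')  # True
```

Because text input passes through unchanged, the helper can be applied wherever names enter without worrying about decoding twice. Names that are not valid UTF-8 would still raise UnicodeDecodeError here; passing `errors='replace'` to `decode` is one way to trade strictness for robustness.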

Repository:
  rL LLVM

https://reviews.llvm.org/D34464