[PATCH] D68472: [test] Depend on C.UTF-8 dependency for mri-utf8.test

Mon Oct 7 02:17:43 PDT 2019

thopre marked an inline comment as done.
thopre added a comment.

In D68472#1695743 <https://reviews.llvm.org/D68472#1695743>, @hubert.reinterpretcast wrote:

> In D68472#1695156 <https://reviews.llvm.org/D68472#1695156>, @thopre wrote:
>
> > I believe that's because it is a builtin locale (much like the C locale). There wasn't one in /usr/share/locale on the Ubuntu docker image I've tested this but it did work while trying with en_US.UTF-8 did not.
>
>
> I mean that using `"C.UTF-8"` with `setlocale` gets me a null pointer, and using `"en_US.UTF-8"` gets me a string with the following:
>
>   #include <locale.h>
>   extern int printf(const char *, ...);
>   void trylocale(const char *locale) {
>     const char *ret = setlocale(LC_ALL, locale);
>     printf("setlocale(\"%s\") returned \"%s\".\n", locale, ret ? ret : "(null)");
>   }
>   int main(void) {
>     trylocale("C.UTF-8");
>     trylocale("en_US.UTF-8");
>   }
>
>
> On AIX:
>
>   setlocale("C.UTF-8") returned "(null)".
>   setlocale("en_US.UTF-8") returned "en_US.UTF-8 en_US.UTF-8 en_US.UTF-8 en_US.UTF-8 en_US.UTF-8 en_US.UTF-8".
>
>
> On RHEL 7:
>
>   setlocale("C.UTF-8") returned "(null)".
>   setlocale("en_US.UTF-8") returned "en_US.UTF-8".
>

Mmh so much for C.UTF-8 then. Good thing Fangrui Song came up with a better idea.

================
Comment at: llvm/test/tools/llvm-ar/mri-utf8.test:26
+# and linux vs windows. The C.UTF-8 locale is chosen
+RUN: env LANG=C.UTF-8 %python -c "assert open(u'\xA3.txt', 'rb').read() == b'contents\n'"
----------------
hubert.reinterpretcast wrote:
> MaskRay wrote:
> > Just delete the comments and avoid python.
> > 
> > ```
> > RUN: FileCheck --input-file £.txt --match-full-lines
> > CHECK: contents
> > ```
> As it is, the file contains nothing aside from this last RUN line and its associated comment block that indicates that U+00A3 is the intended interpretation of the bytes `\xC2\xA3`. Note: There is no BOM in the file.
> 
> In addition to making the intent clear, I believe that the current approach has more of an ability to detect cases where the instances of `\xC2\xA3` in the file are misinterpreted.
> 
> That said, if the file redirection to create the file works, then `FileCheck` can be invoked with use of file redirection:
> ```
> RUN: FileCheck <£.txt --match-full-lines %s
> ```
> 
Is UTF-8 encoding really the desired behavior or just non ascii? I know the test is named mri-utf8 but the first comment says "Test non-ascii archive members". Besides as I mentioned in the patch description Windows encodes it in UTF-16 so UTF-8 is already not possible there.

I do like the approach of using FileCheck with an input redirection. It is consistent with the echo line above so if one works the other one will as well. I feel ashamed I didn't think of that good old FileCheck. I'll revise the patch accordingly.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68472/new/

https://reviews.llvm.org/D68472