[PATCH] D68472: [test] Depend on C.UTF-8 dependency for mri-utf8.test
Thomas Preud'homme via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 7 02:17:43 PDT 2019
thopre marked an inline comment as done.
thopre added a comment.
In D68472#1695743 <https://reviews.llvm.org/D68472#1695743>, @hubert.reinterpretcast wrote:
> In D68472#1695156 <https://reviews.llvm.org/D68472#1695156>, @thopre wrote:
>
> > I believe that's because it is a builtin locale (much like the C locale). There wasn't one in /usr/share/locale on the Ubuntu docker image I've tested this but it did work while trying with en_US.UTF-8 did not.
>
>
> I mean that using `"C.UTF-8"` with `setlocale` gets me a null pointer, and using `"en_US.UTF-8"` gets me a string with the following:
>
> #include <locale.h>
> extern int printf(const char *, ...);
> void trylocale(const char *locale) {
> const char *ret = setlocale(LC_ALL, locale);
> printf("setlocale(\"%s\") returned \"%s\".\n", locale, ret ? ret : "(null)");
> }
> int main(void) {
> trylocale("C.UTF-8");
> trylocale("en_US.UTF-8");
> }
>
>
> On AIX:
>
> setlocale("C.UTF-8") returned "(null)".
> setlocale("en_US.UTF-8") returned "en_US.UTF-8 en_US.UTF-8 en_US.UTF-8 en_US.UTF-8 en_US.UTF-8 en_US.UTF-8".
>
>
> On RHEL 7:
>
> setlocale("C.UTF-8") returned "(null)".
> setlocale("en_US.UTF-8") returned "en_US.UTF-8".
>
Mmh so much for C.UTF-8 then. Good thing Fangrui Song came up with a better idea.
================
Comment at: llvm/test/tools/llvm-ar/mri-utf8.test:26
+# and linux vs windows. The C.UTF-8 locale is chosen
+RUN: env LANG=C.UTF-8 %python -c "assert open(u'\xA3.txt', 'rb').read() == b'contents\n'"
----------------
hubert.reinterpretcast wrote:
> MaskRay wrote:
> > Just delete the comments and avoid python.
> >
> > ```
> > RUN: FileCheck --input-file £.txt --match-full-lines
> > CHECK: contents
> > ```
> As it is, the file contains nothing aside from this last RUN line and its associated comment block that indicates that U+00A3 is the intended interpretation of the bytes `\xC2\xA3`. Note: There is no BOM in the file.
>
> In addition to making the intent clear, I believe that the current approach has more of an ability to detect cases where the instances of `\xC2\xA3` in the file are misinterpreted.
>
> That said, if the file redirection to create the file works, then `FileCheck` can be invoked with use of file redirection:
> ```
> RUN: FileCheck <£.txt --match-full-lines %s
> ```
>
Is UTF-8 encoding really the desired behavior or just non ascii? I know the test is named mri-utf8 but the first comment says "Test non-ascii archive members". Besides as I mentioned in the patch description Windows encodes it in UTF-16 so UTF-8 is already not possible there.
I do like the approach of using FileCheck with an input redirection. It is consistent with the echo line above so if one works the other one will as well. I feel ashamed I didn't think of that good old FileCheck. I'll revise the patch accordingly.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D68472/new/
https://reviews.llvm.org/D68472
More information about the llvm-commits
mailing list