[PATCH] D68472: [test] Depend on C.UTF-8 dependency for mri-utf8.test

Hubert Tong via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 7 06:47:46 PDT 2019


hubert.reinterpretcast added inline comments.


================
Comment at: llvm/test/tools/llvm-ar/mri-nonascii.test:6
+
+RUN: echo "contents" > %t/£.txt
+
----------------
I am not particularly thrilled with having a file containing non-ASCII characters that are ambiguous with regards to their interpretation. Is this `£`, `Β£`, or something else? Is there an objection to adding a BOM?



================
Comment at: llvm/test/tools/llvm-ar/mri-nonascii.test:15
+
+# Use input redirection to work around problems launching processess that
+# include arguments with non-ascii characters.
----------------
Minor nit: s/processess/processes/;


================
Comment at: llvm/test/tools/llvm-ar/mri-utf8.test:26
+# and linux vs windows. The C.UTF-8 locale is chosen
+RUN: env LANG=C.UTF-8 %python -c "assert open(u'\xA3.txt', 'rb').read() == b'contents\n'"
----------------
thopre wrote:
> hubert.reinterpretcast wrote:
> > MaskRay wrote:
> > > Just delete the comments and avoid python.
> > > 
> > > ```
> > > RUN: FileCheck --input-file £.txt --match-full-lines
> > > CHECK: contents
> > > ```
> > As it is, the file contains nothing aside from this last RUN line and its associated comment block that indicates that U+00A3 is the intended interpretation of the bytes `\xC2\xA3`. Note: There is no BOM in the file.
> > 
> > In addition to making the intent clear, I believe that the current approach has more of an ability to detect cases where the instances of `\xC2\xA3` in the file are misinterpreted.
> > 
> > That said, if the file redirection to create the file works, then `FileCheck` can be invoked with use of file redirection:
> > ```
> > RUN: FileCheck <£.txt --match-full-lines %s
> > ```
> > 
> Is UTF-8 encoding really the desired behavior or just non ascii? I know the test is named mri-utf8 but the first comment says "Test non-ascii archive members". Besides as I mentioned in the patch description Windows encodes it in UTF-16 so UTF-8 is already not possible there.
> 
> I do like the approach of using FileCheck with an input redirection. It is consistent with the echo line above so if one works the other one will as well. I feel ashamed I didn't think of that good old FileCheck. I'll revise the patch accordingly.
The description in the Windows case indicates that the file that ends up on the filesystem is named, in terms of what a user might see in a directory listing, `£.txt` (as opposed to something else).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68472/new/

https://reviews.llvm.org/D68472





More information about the llvm-commits mailing list