[PATCH] Get all localization tests passing on linux!

Thu Aug 14 20:43:45 PDT 2014

I think I may have misled you when I said we should #ifdef the differences between glibc and Mac. If there are legitimate differences, we should #ifdef them. If glibc is wrong (it looks like it often is), we should just XFAIL the test and file a bug against glibc (or does that data come from an OS package?).

I usually commented on only one test of a given type, but the comments apply to all similar tests.

================
Comment at: test/localization/locale.categories/category.ctype/locale.ctype.byname/narrow_1.pass.cpp:11
@@ -10,3 +10,3 @@
 // REQUIRES: locale.en_US.UTF-8
-// REQUIRES: locale.fr_CA.UTF-8
+// REQUIRES: locale.fr_CA.ISO8859-1
 
----------------
Doh! My mistake.

================
Comment at: test/localization/locale.categories/category.ctype/locale.ctype.byname/tolower_1.pass.cpp:38
@@ -37,3 +37,3 @@
             assert(f.tolower('1') == '1');
-            assert(f.tolower('\xDA') == '\xFA');
+            //assert(f.tolower('\xDA') == '\xFA');
             assert(f.tolower('\xFA') == '\xFA');
----------------
I think you meant to #ifdef this one as you did in the next test.

Either way, this doesn't look right to me. Do we know why this is happening? Is this a legitimate difference between Mac and glibc, or is glibc broken? If it's just a difference, then we should still check against the expected glibc behavior. If Linux is broken, we should XFAIL the test rather than skipping the broken part.

================
Comment at: test/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp:11
@@ -10,2 +10,3 @@
 // XFAIL: apple-darwin
+// XFAIL: linux
 
----------------
We're XFAILing these as a TODO, right? I wish LIT had a way to distinguish between intended failures and known failures.

================
Comment at: test/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_zh_CN.pass.cpp:311
@@ +310,3 @@
+#       else
+            std::string v = "-CNY 0.01";
+#       endif
----------------
I think glibc's locale data is wrong here. We should mark these tests as XFAIL rather than hiding the failure. A bug will need to be filed against glibc (or wherever glibc gets its locale data from).

================
Comment at: test/localization/locale.categories/category.monetary/locale.moneypunct.byname/decimal_point.pass.cpp:133
@@ +132,3 @@
+#   else
+        assert(f.decimal_point() == L'.');
+#   endif
----------------
The test is right; glibc's locale data is wrong. Russia uses ',' as the decimal separator (according to Wikipedia). This test (and those like it) should be XFAILed.
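
For the bug report, something like the following could show what the installed locale data actually reports. This is only a sketch under a few assumptions: `query_decimal_point` is a hypothetical helper name (nothing in the test suite), it queries the narrow `moneypunct<char>` facet rather than the wide one this test uses, and the locale name you pass in (for example `ru_RU.UTF-8`) has to be installed on the machine.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of libc++ or its tests): looks up the
// monetary decimal point that the named locale reports on this system.
// Returns false if the locale cannot be constructed; otherwise stores
// the character in `out` and returns true.
bool query_decimal_point(const std::string& locale_name, char& out) {
    try {
        std::locale loc(locale_name.c_str());
        out = std::use_facet<std::moneypunct<char> >(loc).decimal_point();
        return true;
    } catch (const std::runtime_error&) {
        // std::locale's constructor throws when the name is unknown.
        return false;
    }
}
```

Printing the result for the Russian locale on both Mac and Linux would make the discrepancy concrete without relying on the test's assertion.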

================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_date.pass.cpp:63
@@ +62,3 @@
+#   else
+        const char in[] = "10/06/2009";
+#   endif
----------------
Looks like more bad locale data to me.

================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_date.pass.cpp:79
@@ +78,3 @@
+#   else
+        const char in[] = "10" "\x2E" "06" "\x2E" "2009";
+#   endif
----------------
What?

================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_date.pass.cpp:92
@@ +91,3 @@
+    // When I use python to decode the unicode characters in /usr/share/i18n/zh_CN
+    // It throws an exception. I think there is a bug in the GLIBC locale.
+#if !defined(__GLIBC__)
----------------
Looks like it wouldn't be the first time :)

================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_date_wide.pass.cpp:86
@@ -81,1 +85,3 @@
     }
+    // I just can't get this to pass on linux.
+#if !defined(__GLIBC__)
----------------
Then let's XFAIL it for now.

================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_one.pass.cpp:47
@@ -45,1 +46,3 @@
+    // This test is not portible on linux.
+#if !defined(__GLIBC__)
     {
----------------
There should be an #else case that tests the Linux behavior.
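
Roughly this shape, as a sketch only: `expected_input` is a hypothetical helper, and both string literals are placeholders, not the values the real test should use.

```cpp
#include <string>

// Hypothetical helper: returns the input that this platform's locale data
// is expected to accept. Both literals below are placeholders.
std::string expected_input() {
#if !defined(__GLIBC__)
    return "<input in the format the non-glibc locale data expects>";
#else
    return "<input in the format glibc's locale data expects>";
#endif
}
```

That way the glibc build still asserts something instead of compiling the whole block away.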

================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_one.pass.cpp:70
@@ +69,3 @@
+#   else
+        const char in[] = "11:55:59 PM";
+#   endif
----------------
Pretty sure this is more bad locale data.

================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_one.pass.cpp:117
@@ -104,2 +116,2 @@
                           "\xD0\xBE\xD1\x82\xD0\xB0"
                           ", 31 "
----------------
Phabricator: "Context not available"
Me: Damn you, Phabricator!

*checks the source*

Well, that didn't make it any clearer. I presume that mess of hex is encoding Cyrillic characters. Are these tests like the other Russian ones where the separators are wrong? If so, XFAIL, not #ifdef.
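
A quick way to check that presumption is to decode the bytes by hand. Here is a minimal sketch, assuming the bytes are well-formed UTF-8; `decode_utf8` is a throwaway helper, not an existing API, and it does no validation of malformed input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Throwaway helper: decodes well-formed UTF-8 into Unicode code points.
// Assumes valid input; malformed sequences are not detected.
std::vector<std::uint32_t> decode_utf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)             { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x6) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0xE) { cp = lead & 0x0F; extra = 2; }
        else                         { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 0; k < extra && i < bytes.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

Fed the bytes `"\xD0\xBE\xD1\x82\xD0\xB0"` from the hunk above, it yields U+043E, U+0442, U+0430, all of which fall in the Cyrillic block (U+0400 to U+04FF).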

================
Comment at: test/localization/locale.categories/category.time/locale.time.put.byname/put1.pass.cpp:73
@@ -72,1 +72,3 @@
+        // w/ GLIBC fr_FR.UTF-8 day abreviations can end with a '.'
         assert((ex == "Today is Samedi which is abbreviated Sam.")||
+               (ex == "Today is samedi which is abbreviated sam." )||
----------------
What's with the case differences? I see the test already allowed for both capitalizations before this patch, but why?

================
Comment at: test/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp:60
@@ +59,3 @@
+#       else
+            assert(np.thousands_sep() == ' ');
+#       endif
----------------
Looks like more glibc wrongness.

http://reviews.llvm.org/D4861