[PATCH] Get all localization tests passing on linux!

Sun Aug 17 17:01:41 PDT 2014

Hi Dan,

I've addressed most of your comments. Thanks for the input. 
I'll post a revised patch closely after.

================
Comment at: test/localization/locale.categories/category.ctype/locale.ctype.byname/tolower_1.pass.cpp:38
@@ -37,3 +37,3 @@
             assert(f.tolower('1') == '1');
-            assert(f.tolower('\xDA') == '\xFA');
+            //assert(f.tolower('\xDA') == '\xFA');
             assert(f.tolower('\xFA') == '\xFA');
----------------
Dan Albert wrote:
> I think you meant to #ifdef this one as you did in the next test.
> 
> Either way, this doesn't look right to me. Do we know why this is happening? Is this a legitimate difference between Mac and glibc, or is glibc broken? If it's just a difference, then we should still check against the expected glibc behavior. If Linux is broken, we should XFAIL the test rather than skipping the broken part.
FreeBSD also fails on these tests as well for the same reason and I suspect darwin does as well. This needs more investigation.

================
Comment at: test/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_ru_RU.pass.cpp:11
@@ -10,2 +10,3 @@
 // XFAIL: apple-darwin
+// XFAIL: linux
 
----------------
Dan Albert wrote:
> We're XFAILing these as a TODO, right? I wish LIT had a way to distinguish between intended failures and known failures.
Yep. I'll add a comment saying its TODO and explain the likely cause (translation of U00A0).

================
Comment at: test/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_zh_CN.pass.cpp:311
@@ +310,3 @@
+#       else
+            std::string v = "-CNY 0.01";
+#       endif
----------------
Dan Albert wrote:
> I think glibc's locale data is wrong here. We should mark these tests as XFAIL rather than hiding the failure. A bug will need to be filed against glibc (or wherever glibc gets their locale data from).
GLIBC's locale definition doesn't seem that unreasonable. I'll look more into it but I don't know if we can say its test data is for sure wrong.
I'm going to hold off on changing this.

================
Comment at: test/localization/locale.categories/category.monetary/locale.moneypunct.byname/decimal_point.pass.cpp:133
@@ +132,3 @@
+#   else
+        assert(f.decimal_point() == L'.');
+#   endif
----------------
Dan Albert wrote:
> The test is right, glibc's locale data is wrong. Russia uses ',' for a decimal separator (according to wikipedia). This test (and those like it) should be XFAIL.
Yep. That seems wrong. I'll back out the GLIBC changes and just mark it as XFAIL. I'll also file a bug with GLIBC.

================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_date.pass.cpp:63
@@ +62,3 @@
+#   else
+        const char in[] = "10/06/2009";
+#   endif
----------------
Dan Albert wrote:
> Looks like more bad locale data to me.
I think this might be bad locale data on apples part as opposed to GLIBCs. I'm basing that off
http://www.localeplanet.com/icu/fr-FR/ 
http://en.wikipedia.org/wiki/Date_format_by_country




================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_date.pass.cpp:79
@@ +78,3 @@
+#   else
+        const char in[] = "10" "\x2E" "06" "\x2E" "2009";
+#   endif
----------------
Dan Albert wrote:
> What?
GLIBC's locale data says that U002E is used as the separator. 
http://www.fileformat.info/info/unicode/char/2e/index.htm 

I don't think that seems unreasonable. More investigation needed though.

================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_date_wide.pass.cpp:86
@@ -81,1 +85,3 @@
     }
+    // I just can't get this to pass on linux.
+#if !defined(__GLIBC__)
----------------
Dan Albert wrote:
> Then let's XFAIL it for now.
Ok. Done.

================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_one.pass.cpp:47
@@ -45,1 +46,3 @@
+    // This test is not portible on linux.
+#if !defined(__GLIBC__)
     {
----------------
Dan Albert wrote:
> There should be an #else case that tests the Linux behavior.
Not quite sure how to write one. The output depends on the timezone your in.

================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_one.pass.cpp:70
@@ +69,3 @@
+#   else
+        const char in[] = "11:55:59 PM";
+#   endif
----------------
Dan Albert wrote:
> Pretty sure this is more bad locale data.
Probably, but I think GLIBC is correct. Having 12 hour time for en_US seems to make more sense than 24 hour.

================
Comment at: test/localization/locale.categories/category.time/locale.time.get.byname/get_one.pass.cpp:117
@@ -104,2 +116,2 @@
                           "\xD0\xBE\xD1\x82\xD0\xB0"
                           ", 31 "
----------------
Dan Albert wrote:
> Phabricator: "Context not available"
> Me: Damn you, Phabricator!
> 
> *checks the source*
> 
> Well, that didn't make it any clearer. I presume that mess of hex is encoding cyrillic characters. Are these tests like the other Russian ones where the separators are wrong? If so, XFAIL, not #ifdef.
Done. Changed backed out. I suspect the difference is between upper and lower case cyrillic characters. However more investigation is needed.

================
Comment at: test/localization/locale.categories/category.time/locale.time.put.byname/put1.pass.cpp:73
@@ -72,1 +72,3 @@
+        // w/ GLIBC fr_FR.UTF-8 day abreviations can end with a '.'
         assert((ex == "Today is Samedi which is abbreviated Sam.")||
+               (ex == "Today is samedi which is abbreviated sam." )||
----------------
Dan Albert wrote:
> What's with the case differences? I see that was actually already the case, but why?
Not sure. glibc uses lower case. I've seperated the glibc change and only check the lower case version.

================
Comment at: test/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp:60
@@ +59,3 @@
+#       else
+            assert(np.thousands_sep() == ' ');
+#       endif
----------------
Dan Albert wrote:
> Looks like more glibc wrongness.
Perhaps. Some of the sources I've found online suggest that ' ' is valid.
http://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html
I'll leave this in for now.

http://reviews.llvm.org/D4861