[cfe-commits] r173622 - PR15067: Don't assert when a UCN appears in a C90 file.

Wed Jan 30 08:36:25 PST 2013

Ah...good point. Richard pointed out to me that we probably shouldn't be using the library functions anyway, since sometimes they're locale-sensitive and sometimes they aren't. We can replace them all with LLVM ASCII-only versions that take a char instead of int.

Do you want to seek out all the uses or should I?

On Jan 29, 2013, at 0:27 , "Benyei, Guy" <guy.benyei at intel.com> wrote:

> Right, but you did change the failing tests, and added Unicode characters (FixIt/fixit-unicode.c, CodeGen\ucn-identifiers.c). 
> Anyhow, the failures occur in debug build only:
> 
>  CRT assert: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c(56) : Assertion failed: (unsigned)(c + 1) <= 256
> 
> Apparently this assertion is supposed to check that the value passed to the character classification functions is an actual character. However, since in clang and LLVM there are several places where a signed char is passed to these function - the negative char values are simply passed as negative int values, and the c-style cast above results in some really big numbers, that fail the assertion.
> 
> Thanks
>   Guy
> 
> 
> -----Original Message-----
> From: Jordan Rose [mailto:jordan_rose at apple.com] 
> Sent: Monday, January 28, 2013 19:03
> To: Benyei, Guy; NAKAMURA Takumi
> Cc: cfe-commits at cs.uiuc.edu
> Subject: Re: [cfe-commits] r173622 - PR15067: Don't assert when a UCN appears in a C90 file.
> 
> Hm, I didn't change any of these, and they've been working fine on Takumi's bot for a while now. Takumi, do you have an opinion on these fixes?
> 
> Jordan
> 
> 
> On Jan 28, 2013, at 5:45 , "Benyei, Guy" <guy.benyei at intel.com> wrote:
> 
>> Hi Jordan,
>> Working on Windows/Visual Studio 2010, I get asserts due to the usage of signed chars representing Unicode characters in some library functions expecting int arguments (specifically: isdigit and isalnum). Ultimately, this makes Visual Studio crash while running check-clang.
>> A possible solution would be to cast these char values to unsigned char:
>> 
>> Index: LiteralSupport.cpp
>> ===================================================================
>> --- LiteralSupport.cpp	(revision 173676)
>> +++ LiteralSupport.cpp	(working copy)
>> @@ -458,7 +458,7 @@
>>  // and FP constants (specifically, the 'pp-number' regex), and assumes that
>>  // the byte at "*end" is both valid and not part of the regex.  Because of
>>  // this, it doesn't have to check for 'overscan' in various places.
>> -  assert(!isalnum(*ThisTokEnd) && *ThisTokEnd != '.' && *ThisTokEnd 
>> != '_' &&
>> +  assert(!isalnum((unsigned char)*ThisTokEnd) && *ThisTokEnd != '.' 
>> + && *ThisTokEnd != '_' &&
>>         "Lexer didn't maximally munch?");
>> 
>>  s = DigitsBegin = ThisTokBegin;
>> 
>> 
>> Index: AsmWriter.cpp
>> ===================================================================
>> --- AsmWriter.cpp	(revision 173676)
>> +++ AsmWriter.cpp	(working copy)
>> @@ -117,7 +117,7 @@
>>  }
>> 
>>  // Scan the name to see if it needs quotes first.
>> -  bool NeedsQuotes = isdigit(Name[0]);
>> +  bool NeedsQuotes = isdigit((unsigned char)Name[0]);
>>  if (!NeedsQuotes) {
>>    for (unsigned i = 0, e = Name.size(); i != e; ++i) {
>>      // By making this unsigned, the value passed in to isalnum will 
>> always be
>> 
>> Anyhow, these are only the specific places the current LITs fail on - I'm really not sure if there are others.
>> 
>> Thanks
>>   Guy
>> 
>> -----Original Message-----
>> From: cfe-commits-bounces at cs.uiuc.edu 
>> [mailto:cfe-commits-bounces at cs.uiuc.edu] On Behalf Of Jordan Rose
>> Sent: Sunday, January 27, 2013 22:12
>> To: cfe-commits at cs.uiuc.edu
>> Subject: [cfe-commits] r173622 - PR15067: Don't assert when a UCN appears in a C90 file.
>> 
>> Author: jrose
>> Date: Sun Jan 27 14:12:04 2013
>> New Revision: 173622
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=173622&view=rev
>> Log:
>> PR15067: Don't assert when a UCN appears in a C90 file.
>> 
>> Unfortunately, we can't accept the UCN as an extension because we're required to treat it as two tokens for preprocessing purposes.
>> 
>> Modified:
>>   cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td
>>   cfe/trunk/lib/Lex/Lexer.cpp
>>   cfe/trunk/lib/Lex/LiteralSupport.cpp
>>   cfe/trunk/test/Lexer/c90.c
>> 
>> Modified: cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td
>> URL: 
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/Diag
>> nosticLexKinds.td?rev=173622&r1=173621&r2=173622&view=diff
>> ======================================================================
>> ========
>> --- cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td (original)
>> +++ cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td Sun Jan 27
>> +++ 14:12:04 2013
>> @@ -127,6 +127,11 @@ def warn_cxx98_compat_literal_ucn_escape
>> def warn_cxx98_compat_literal_ucn_control_character : Warning<
>>  "universal character name referring to a control character "
>>  "is incompatible with C++98">, InGroup<CXX98Compat>, DefaultIgnore;
>> +def warn_ucn_not_valid_in_c89 : Warning<
>> +  "universal character names are only valid in C99 or C++; "
>> +  "treating as '\\' followed by identifier">, InGroup<Unicode>; def 
>> +warn_ucn_not_valid_in_c89_literal : ExtWarn<
>> +  "universal character names are only valid in C99 or C++">, 
>> +InGroup<Unicode>;
>> 
>> 
>> // Literal
>> @@ -165,8 +170,6 @@ def ext_string_too_long : Extension<"str
>>  "support">, InGroup<OverlengthStrings>;  def err_character_too_large : Error<
>>  "character too large for enclosing character literal type">; -def 
>> warn_ucn_not_valid_in_c89 : ExtWarn<
>> -  "unicode escape sequences are only valid in C99 or C++">, InGroup<Unicode>;  def warn_cxx98_compat_unicode_literal : Warning<
>>  "unicode literals are incompatible with C++98">,
>>  InGroup<CXX98Compat>, DefaultIgnore;
>> 
>> Modified: cfe/trunk/lib/Lex/Lexer.cpp
>> URL: 
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Lex/Lexer.cpp?rev=17
>> 3622&r1=173621&r2=173622&view=diff
>> ======================================================================
>> ========
>> --- cfe/trunk/lib/Lex/Lexer.cpp (original)
>> +++ cfe/trunk/lib/Lex/Lexer.cpp Sun Jan 27 14:12:04 2013
>> @@ -2698,8 +2698,6 @@ bool Lexer::isCodeCompletionPoint(const
>> 
>> uint32_t Lexer::tryReadUCN(const char *&StartPtr, const char *SlashLoc,
>>                           Token *Result) {
>> -  assert(LangOpts.CPlusPlus || LangOpts.C99);
>> -
>>  unsigned CharSize;
>>  char Kind = getCharAndSize(StartPtr, CharSize);
>> 
>> @@ -2711,6 +2709,11 @@ uint32_t Lexer::tryReadUCN(const char *&
>>  else
>>    return 0;
>> 
>> +  if (!LangOpts.CPlusPlus && !LangOpts.C99) {
>> +    Diag(SlashLoc, diag::warn_ucn_not_valid_in_c89);
>> +    return 0;
>> +  }
>> +
>>  const char *CurPtr = StartPtr + CharSize;
>>  const char *KindLoc = &CurPtr[-1];
>> 
>> @@ -2737,7 +2740,7 @@ uint32_t Lexer::tryReadUCN(const char *&
>>          }
>>        }
>>      }
>> -      
>> +
>>      return 0;
>>    }
>> 
>> 
>> Modified: cfe/trunk/lib/Lex/LiteralSupport.cpp
>> URL: 
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Lex/LiteralSupport.c
>> pp?rev=173622&r1=173621&r2=173622&view=diff
>> ======================================================================
>> ========
>> --- cfe/trunk/lib/Lex/LiteralSupport.cpp (original)
>> +++ cfe/trunk/lib/Lex/LiteralSupport.cpp Sun Jan 27 14:12:04 2013
>> @@ -277,7 +277,7 @@ static bool ProcessUCNEscape(const char
>> 
>>  if (!Features.CPlusPlus && !Features.C99 && Diags)
>>    Diag(Diags, Features, Loc, ThisTokBegin, UcnBegin, ThisTokBuf,
>> -         diag::warn_ucn_not_valid_in_c89);
>> +         diag::warn_ucn_not_valid_in_c89_literal);
>> 
>>  return true;
>> }
>> 
>> Modified: cfe/trunk/test/Lexer/c90.c
>> URL: 
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Lexer/c90.c?rev=173
>> 622&r1=173621&r2=173622&view=diff 
>> ======================================================================
>> ========
>> --- cfe/trunk/test/Lexer/c90.c (original)
>> +++ cfe/trunk/test/Lexer/c90.c Sun Jan 27 14:12:04 2013
>> @@ -29,8 +29,8 @@ void test2() {
>> }
>> 
>> void test3() {
>> -  (void)L"\u1234";  // expected-error {{unicode escape sequences are 
>> only valid in C99 or C++}}
>> -  (void)L'\u1234';  // expected-error {{unicode escape sequences are 
>> only valid in C99 or C++}}
>> +  (void)L"\u1234";  // expected-error {{universal character names are 
>> + only valid in C99 or C++}}  (void)L'\u1234';  // expected-error 
>> + {{universal character names are only valid in C99 or C++}}
>> }
>> 
>> #define PREFIX(x) foo ## x
>> @@ -39,3 +39,5 @@ int test4() {
>>  int *p = &PREFIX(0p+1);
>>  return p[-1];
>> }
>> +
>> +#define MY_UCN \u00FC // expected-warning {{universal character names 
>> +are only valid in C99 or C++; treating as '\' followed by 
>> +identifier}}
>> 
>> 
>> _______________________________________________
>> cfe-commits mailing list
>> cfe-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>> 
>> This e-mail and any attachments may contain confidential material for 
>> the sole use of the intended recipient(s). Any review or distribution 
>> by others is strictly prohibited. If you are not the intended 
>> recipient, please contact the sender and delete all copies.
>> 
> 
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
> 
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>