[cfe-commits] r173622 - PR15067: Don't assert when a UCN appears in a C90 file.

Jordan Rose jordan_rose at apple.com
Fri Feb 8 15:47:44 PST 2013


This went in in r174765-9 in Clang. Are there LLVM-side failures too, or is this good enough?

Jordan


On Feb 5, 2013, at 9:26 , Jordan Rose <jordan_rose at apple.com> wrote:

> [+Joerg, +Dmitri, -Takumi]
> 
> We actually had a discussion about this on IRC. The place where this information currently lives is Clang's Lexer.cpp, which is not at all ideal for reuse. However, some systems actually link Clang against an LLVM dylib (Debian is the only one that comes to mind), and Lexer's use of the CharInfo table is very performance-critical. We decided that for that reason it wouldn't be safe to move the table out of Clang.
> 
> Since you hadn't responded yet, I began pulling it out to a Clang support file in Basic: http://llvm-reviews.chandlerc.com/D363. I don't think this will have disastrous performance implications, and it makes those available to the rest of Clang...but not LLVM. I'm not sure what we're going to do about that, but as long as we keep Unicode out of IR for now we'll be fine...right?
> 
> FWIW, isdigit and isxdigit are safe to use because they are guaranteed to be locale-independent.
> 
> Jordan
> 
> 
> On Feb 5, 2013, at 3:46 , "Benyei, Guy" <guy.benyei at intel.com> wrote:
> 
>> I can take care of this change.
>> Do we already have LLVM ASCII-only versions for the character classification functions somewhere, or should I create them? I tried to look for these functions in LLVM support, but couldn't find them.
>> 
>> Thanks
>>  Guy
>> 
>> -----Original Message-----
>> From: Jordan Rose [mailto:jordan_rose at apple.com] 
>> Sent: Wednesday, January 30, 2013 18:36
>> To: Benyei, Guy
>> Cc: NAKAMURA Takumi; cfe-commits at cs.uiuc.edu
>> Subject: Re: [cfe-commits] r173622 - PR15067: Don't assert when a UCN appears in a C90 file.
>> 
>> Ah...good point. Richard pointed out to me that we probably shouldn't be using the library functions anyway, since sometimes they're locale-sensitive and sometimes they aren't. We can replace them all with LLVM ASCII-only versions that take a char instead of int.
>> 
>> Do you want to seek out all the uses or should I?
>> 
>> 
>> On Jan 29, 2013, at 0:27 , "Benyei, Guy" <guy.benyei at intel.com> wrote:
>> 
>>> Right, but you did change the failing tests, and added Unicode characters (FixIt/fixit-unicode.c, CodeGen\ucn-identifiers.c). 
>>> Anyhow, the failures occur in debug build only:
>>> 
>>> CRT assert: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c(56) : 
>>> Assertion failed: (unsigned)(c + 1) <= 256
>>> 
>>> Apparently this assertion is supposed to check that the value passed to the character classification functions is an actual character. However, since in clang and LLVM there are several places where a signed char is passed to these function - the negative char values are simply passed as negative int values, and the c-style cast above results in some really big numbers, that fail the assertion.
>>> 
>>> Thanks
>>> Guy
>>> 
>>> 
>>> -----Original Message-----
>>> From: Jordan Rose [mailto:jordan_rose at apple.com]
>>> Sent: Monday, January 28, 2013 19:03
>>> To: Benyei, Guy; NAKAMURA Takumi
>>> Cc: cfe-commits at cs.uiuc.edu
>>> Subject: Re: [cfe-commits] r173622 - PR15067: Don't assert when a UCN appears in a C90 file.
>>> 
>>> Hm, I didn't change any of these, and they've been working fine on Takumi's bot for a while now. Takumi, do you have an opinion on these fixes?
>>> 
>>> Jordan
>>> 
>>> 
>>> On Jan 28, 2013, at 5:45 , "Benyei, Guy" <guy.benyei at intel.com> wrote:
>>> 
>>>> Hi Jordan,
>>>> Working on Windows/Visual Studio 2010, I get asserts due to the usage of signed chars representing Unicode characters in some library functions expecting int arguments (specifically: isdigit and isalnum). Ultimately, this makes Visual Studio crash while running check-clang.
>>>> A possible solution would be to cast these char values to unsigned char:
>>>> 
>>>> Index: LiteralSupport.cpp
>>>> ===================================================================
>>>> --- LiteralSupport.cpp	(revision 173676)
>>>> +++ LiteralSupport.cpp	(working copy)
>>>> @@ -458,7 +458,7 @@
>>>> // and FP constants (specifically, the 'pp-number' regex), and 
>>>> assumes that  // the byte at "*end" is both valid and not part of the 
>>>> regex.  Because of  // this, it doesn't have to check for 'overscan' in various places.
>>>> -  assert(!isalnum(*ThisTokEnd) && *ThisTokEnd != '.' && *ThisTokEnd 
>>>> != '_' &&
>>>> +  assert(!isalnum((unsigned char)*ThisTokEnd) && *ThisTokEnd != '.' 
>>>> + && *ThisTokEnd != '_' &&
>>>>       "Lexer didn't maximally munch?");
>>>> 
>>>> s = DigitsBegin = ThisTokBegin;
>>>> 
>>>> 
>>>> Index: AsmWriter.cpp
>>>> ===================================================================
>>>> --- AsmWriter.cpp	(revision 173676)
>>>> +++ AsmWriter.cpp	(working copy)
>>>> @@ -117,7 +117,7 @@
>>>> }
>>>> 
>>>> // Scan the name to see if it needs quotes first.
>>>> -  bool NeedsQuotes = isdigit(Name[0]);
>>>> +  bool NeedsQuotes = isdigit((unsigned char)Name[0]);
>>>> if (!NeedsQuotes) {
>>>>  for (unsigned i = 0, e = Name.size(); i != e; ++i) {
>>>>    // By making this unsigned, the value passed in to isalnum will 
>>>> always be
>>>> 
>>>> Anyhow, these are only the specific places the current LITs fail on - I'm really not sure if there are others.
>>>> 
>>>> Thanks
>>>> Guy
>>>> 
>>>> -----Original Message-----
>>>> From: cfe-commits-bounces at cs.uiuc.edu 
>>>> [mailto:cfe-commits-bounces at cs.uiuc.edu] On Behalf Of Jordan Rose
>>>> Sent: Sunday, January 27, 2013 22:12
>>>> To: cfe-commits at cs.uiuc.edu
>>>> Subject: [cfe-commits] r173622 - PR15067: Don't assert when a UCN appears in a C90 file.
>>>> 
>>>> Author: jrose
>>>> Date: Sun Jan 27 14:12:04 2013
>>>> New Revision: 173622
>>>> 
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=173622&view=rev
>>>> Log:
>>>> PR15067: Don't assert when a UCN appears in a C90 file.
>>>> 
>>>> Unfortunately, we can't accept the UCN as an extension because we're required to treat it as two tokens for preprocessing purposes.
>>>> 
>>>> Modified:
>>>> cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td
>>>> cfe/trunk/lib/Lex/Lexer.cpp
>>>> cfe/trunk/lib/Lex/LiteralSupport.cpp
>>>> cfe/trunk/test/Lexer/c90.c
>>>> 
>>>> Modified: cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td
>>>> URL: 
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/Dia
>>>> g nosticLexKinds.td?rev=173622&r1=173621&r2=173622&view=diff
>>>> =====================================================================
>>>> =
>>>> ========
>>>> --- cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td (original)
>>>> +++ cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td Sun Jan 27
>>>> +++ 14:12:04 2013
>>>> @@ -127,6 +127,11 @@ def warn_cxx98_compat_literal_ucn_escape
>>>> def warn_cxx98_compat_literal_ucn_control_character : Warning<  
>>>> "universal character name referring to a control character "
>>>> "is incompatible with C++98">, InGroup<CXX98Compat>, DefaultIgnore;
>>>> +def warn_ucn_not_valid_in_c89 : Warning<
>>>> +  "universal character names are only valid in C99 or C++; "
>>>> +  "treating as '\\' followed by identifier">, InGroup<Unicode>; def 
>>>> +warn_ucn_not_valid_in_c89_literal : ExtWarn<
>>>> +  "universal character names are only valid in C99 or C++">, 
>>>> +InGroup<Unicode>;
>>>> 
>>>> 
>>>> // Literal
>>>> @@ -165,8 +170,6 @@ def ext_string_too_long : Extension<"str  
>>>> "support">, InGroup<OverlengthStrings>;  def err_character_too_large 
>>>> : Error<  "character too large for enclosing character literal 
>>>> type">; -def
>>>> warn_ucn_not_valid_in_c89 : ExtWarn<
>>>> -  "unicode escape sequences are only valid in C99 or C++">, 
>>>> InGroup<Unicode>;  def warn_cxx98_compat_unicode_literal : Warning<  
>>>> "unicode literals are incompatible with C++98">,  
>>>> InGroup<CXX98Compat>, DefaultIgnore;
>>>> 
>>>> Modified: cfe/trunk/lib/Lex/Lexer.cpp
>>>> URL: 
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Lex/Lexer.cpp?rev=1
>>>> 7
>>>> 3622&r1=173621&r2=173622&view=diff
>>>> =====================================================================
>>>> =
>>>> ========
>>>> --- cfe/trunk/lib/Lex/Lexer.cpp (original)
>>>> +++ cfe/trunk/lib/Lex/Lexer.cpp Sun Jan 27 14:12:04 2013
>>>> @@ -2698,8 +2698,6 @@ bool Lexer::isCodeCompletionPoint(const
>>>> 
>>>> uint32_t Lexer::tryReadUCN(const char *&StartPtr, const char *SlashLoc,
>>>>                         Token *Result) {
>>>> -  assert(LangOpts.CPlusPlus || LangOpts.C99);
>>>> -
>>>> unsigned CharSize;
>>>> char Kind = getCharAndSize(StartPtr, CharSize);
>>>> 
>>>> @@ -2711,6 +2709,11 @@ uint32_t Lexer::tryReadUCN(const char *&  else
>>>>  return 0;
>>>> 
>>>> +  if (!LangOpts.CPlusPlus && !LangOpts.C99) {
>>>> +    Diag(SlashLoc, diag::warn_ucn_not_valid_in_c89);
>>>> +    return 0;
>>>> +  }
>>>> +
>>>> const char *CurPtr = StartPtr + CharSize;  const char *KindLoc = 
>>>> &CurPtr[-1];
>>>> 
>>>> @@ -2737,7 +2740,7 @@ uint32_t Lexer::tryReadUCN(const char *&
>>>>        }
>>>>      }
>>>>    }
>>>> -      
>>>> +
>>>>    return 0;
>>>>  }
>>>> 
>>>> 
>>>> Modified: cfe/trunk/lib/Lex/LiteralSupport.cpp
>>>> URL: 
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Lex/LiteralSupport.
>>>> c pp?rev=173622&r1=173621&r2=173622&view=diff
>>>> =====================================================================
>>>> =
>>>> ========
>>>> --- cfe/trunk/lib/Lex/LiteralSupport.cpp (original)
>>>> +++ cfe/trunk/lib/Lex/LiteralSupport.cpp Sun Jan 27 14:12:04 2013
>>>> @@ -277,7 +277,7 @@ static bool ProcessUCNEscape(const char
>>>> 
>>>> if (!Features.CPlusPlus && !Features.C99 && Diags)
>>>>  Diag(Diags, Features, Loc, ThisTokBegin, UcnBegin, ThisTokBuf,
>>>> -         diag::warn_ucn_not_valid_in_c89);
>>>> +         diag::warn_ucn_not_valid_in_c89_literal);
>>>> 
>>>> return true;
>>>> }
>>>> 
>>>> Modified: cfe/trunk/test/Lexer/c90.c
>>>> URL: 
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Lexer/c90.c?rev=17
>>>> 3
>>>> 622&r1=173621&r2=173622&view=diff
>>>> =====================================================================
>>>> =
>>>> ========
>>>> --- cfe/trunk/test/Lexer/c90.c (original)
>>>> +++ cfe/trunk/test/Lexer/c90.c Sun Jan 27 14:12:04 2013
>>>> @@ -29,8 +29,8 @@ void test2() {
>>>> }
>>>> 
>>>> void test3() {
>>>> -  (void)L"\u1234";  // expected-error {{unicode escape sequences are 
>>>> only valid in C99 or C++}}
>>>> -  (void)L'\u1234';  // expected-error {{unicode escape sequences are 
>>>> only valid in C99 or C++}}
>>>> +  (void)L"\u1234";  // expected-error {{universal character names 
>>>> + are only valid in C99 or C++}}  (void)L'\u1234';  // expected-error 
>>>> + {{universal character names are only valid in C99 or C++}}
>>>> }
>>>> 
>>>> #define PREFIX(x) foo ## x
>>>> @@ -39,3 +39,5 @@ int test4() {
>>>> int *p = &PREFIX(0p+1);
>>>> return p[-1];
>>>> }
>>>> +
>>>> +#define MY_UCN \u00FC // expected-warning {{universal character 
>>>> +names are only valid in C99 or C++; treating as '\' followed by 
>>>> +identifier}}
>>>> 
>>>> 
>>>> _______________________________________________
>>>> cfe-commits mailing list
>>>> cfe-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>>> ---------------------------------------------------------------------
>>>> Intel Israel (74) Limited
>>>> 
>>>> This e-mail and any attachments may contain confidential material for 
>>>> the sole use of the intended recipient(s). Any review or distribution 
>>>> by others is strictly prohibited. If you are not the intended 
>>>> recipient, please contact the sender and delete all copies.
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> Intel Israel (74) Limited
>>> 
>>> This e-mail and any attachments may contain confidential material for 
>>> the sole use of the intended recipient(s). Any review or distribution 
>>> by others is strictly prohibited. If you are not the intended 
>>> recipient, please contact the sender and delete all copies.
>>> 
>> 
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>> 
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
>> 
> 




More information about the cfe-commits mailing list