[llvm-bugs] [Bug 39586] New: Unicode no-break space is treated in an inconsistent way

via llvm-bugs llvm-bugs at lists.llvm.org
Thu Nov 8 01:04:18 PST 2018


https://bugs.llvm.org/show_bug.cgi?id=39586

            Bug ID: 39586
           Summary: Unicode no-break space is treated in an inconsistent
                    way
           Product: clang
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Frontend
          Assignee: unassignedclangbugs at nondot.org
          Reporter: vincent-llvm at vinc17.net
                CC: llvm-bugs at lists.llvm.org, richard-llvm at metafoo.co.uk

As a follow-up to bug 39585 (which is actually a Debian packaging bug), consider
the following program:

 int a;

#if FOO
#endif

int main (void)
{
  return 0;
}

where the space before "int a;" and the space between "#if" and "FOO" are
no-break spaces (U+00A0).
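
Since no-break spaces are easily lost when the program is copied from a mail or
a web page, the test file can also be generated. The following is a minimal
sketch, assuming that the file is encoded in UTF-8 (this report does not state
the encoding), so that U+00A0 is the byte pair 0xC2 0xA0. The names NBSP,
repro_source and write_repro are hypothetical and only serve this example.

```c
#include <stdio.h>

/* Assumption: tst.c is UTF-8, so U+00A0 is encoded as the bytes C2 A0. */
#define NBSP "\xC2\xA0"

/* Contents of the test program, with a no-break space before "int a;"
   and between "#if" and "FOO".  NBSP is kept in its own string literal
   so that the hex escape cannot absorb a following hex digit. */
static const char repro_source[] =
    NBSP "int a;\n"
    "\n"
    "#if" NBSP "FOO\n"
    "#endif\n"
    "\n"
    "int main (void)\n"
    "{\n"
    "  return 0;\n"
    "}\n";

/* Write the test program to PATH; return 0 on success, -1 on failure. */
int write_repro(const char *path)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t len = sizeof repro_source - 1;  /* exclude the trailing NUL */
    size_t written = fwrite(repro_source, 1, len, f);
    if (fclose(f) != 0 || written != len)
        return -1;
    return 0;
}
```

Calling write_repro("tst.c") from a small driver program should give a file
suitable for the commands below.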

Under Debian/unstable:

$ clang-8 tst.c
tst.c:1:1: warning: treating Unicode character as whitespace
      [-Wunicode-whitespace]
 int a;
^
tst.c:3:4: warning: treating Unicode character as whitespace
      [-Wunicode-whitespace]
#if FOO
   ^
2 warnings generated.

But with the -E option:

$ clang-8 -E tst.c
tst.c:3:4: error: invalid token at start of a preprocessor expression
#if FOO
   ^
# 1 "tst.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 349 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "tst.c" 2
 int a;




int main (void)
{
  return 0;
}
1 error generated.

The first no-break space is probably treated as whitespace, as it is without the
-E option, but the second one, in the #if directive, is rejected as an invalid
token. This is inconsistent.

Previous clang versions behave in the same way.
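
To check where the no-break spaces are in a source file, independently of the
compiler, one can scan for the byte pair 0xC2 0xA0. The sketch below again
assumes UTF-8 input and counts columns in bytes, starting at 1; it is only an
illustration and says nothing about how clang's lexer works. The function
names find_nbsp and report_nbsp are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: record the 1-based line and column (counted in
   bytes) of each UTF-8 encoded U+00A0 (bytes C2 A0) in SRC.  At most MAX
   positions are stored in LINES/COLS; the total number found is returned. */
size_t find_nbsp(const char *src, size_t len,
                 unsigned lines[], unsigned cols[], size_t max)
{
    size_t found = 0;
    unsigned line = 1, col = 1;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c == 0xC2 && i + 1 < len && (unsigned char)src[i + 1] == 0xA0) {
            if (found < max) {
                lines[found] = line;
                cols[found] = col;
            }
            found++;
        }
        if (c == '\n') {
            line++;
            col = 1;
        } else {
            col++;
        }
    }
    return found;
}

/* Print the positions in the "file:line:column" style used above. */
void report_nbsp(const char *name, const char *src, size_t len)
{
    unsigned lines[16], cols[16];
    size_t n = find_nbsp(src, len, lines, cols, 16);
    for (size_t i = 0; i < n && i < 16; i++)
        printf("%s:%u:%u: U+00A0 NO-BREAK SPACE\n", name, lines[i], cols[i]);
}
```

For the test program of this report, the positions found are 1:1 and 3:4, i.e.
the same locations as in the warnings above.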
