[PATCH] D79810: [FileCheck] Fix isalpha/isalnum calls

Joel E. Denny via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 12 15:05:08 PDT 2020


jdenny created this revision.
jdenny added reviewers: jhenderson, thopre, probinson.
Herald added subscribers: atanasyan, hiraditya, arichardson, sdardis.
Herald added a project: LLVM.

D79276 <https://reviews.llvm.org/D79276> caused the following builder to fail:

  http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/23489

Specifically, FileCheck dumped stack in the following tests:

  LLVM :: MC/Mips/micromips-jump-pc-region.s
  LLVM :: MC/Mips/mips-jump-pc-region.s

Those tests contain characters encoded as 160 but that render (at 
least for me in vim) like a single space (32).  Those characters
appear between the `#` and `RUN:` on several lines, and D79276 <https://reviews.llvm.org/D79276> causes
FileCheck to process those lines differently: `RUN:` is a comment
directive.  As a result, D79276 <https://reviews.llvm.org/D79276> causes FileCheck to start calling is
`isalnum` on those characters.

The problem is that FileCheck calls `isalnum` on type `char` without
casting to `unsigned char` first, so it sign-extends 160 beyond what
`unsigned char` or `EOF` can represent.  C says that has undefined
behavior.  This problem is general to FileCheck's prefix parsing and 
so exists independently of D79276 <https://reviews.llvm.org/D79276>.  This patch changes all such calls
to explicitly cast to `unsigned char`, and it adds tests for cases
that are representative with or without D79276 <https://reviews.llvm.org/D79276>.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D79810

Files:
  llvm/lib/Support/FileCheck.cpp
  llvm/test/FileCheck/bad-char.txt


Index: llvm/test/FileCheck/bad-char.txt
===================================================================
--- /dev/null
+++ llvm/test/FileCheck/bad-char.txt
@@ -0,0 +1,36 @@
+# This file contains characters that render as spaces (at least for me in vim)
+# but are encoded as 160.  Each is indicated with a "^" on the following line.
+# FileCheck used to call functions like isalnum on each without casting to
+# unsigned char first, so it sign-extended beyond what unsigned char or EOF can
+# represent.  C says that has undefined behavior, and it has caused stack dumps
+# under Windows.
+
+ BEFORE-PREFIX:
+^
+AFTER-PREFIX :
+            ^
+BEFORE-VAR: [[ VAR:]]
+              ^
+AFTER-VAR: [[VAR :]]
+                ^
+
+RUN: %ProtectFileCheckOutput \
+RUN: not FileCheck -check-prefix=BEFORE-PREFIX %s < /dev/null 2>&1 | \
+RUN:   FileCheck -check-prefix=ERR-EMPTY-CHECK %s
+
+RUN: %ProtectFileCheckOutput \
+RUN: not FileCheck -check-prefix=AFTER-PREFIX %s < /dev/null 2>&1 | \
+RUN:   FileCheck -check-prefix=ERR-NO-CHECK %s
+
+RUN: %ProtectFileCheckOutput \
+RUN: not FileCheck -check-prefix=BEFORE-VAR %s < /dev/null 2>&1 | \
+RUN:   FileCheck -check-prefix=ERR-BAD-VAR %s
+
+RUN: %ProtectFileCheckOutput \
+RUN: not FileCheck -check-prefix=AFTER-VAR %s < /dev/null 2>&1 | \
+RUN:   FileCheck -check-prefix=ERR-BAD-STRING-VAR %s
+
+ERR-EMPTY-CHECK: error: found empty check string
+ERR-NO-CHECK: error: no check strings found
+ERR-BAD-VAR: error: invalid variable name
+ERR-BAD-STRING-VAR: error: invalid name in string variable definition
Index: llvm/lib/Support/FileCheck.cpp
===================================================================
--- llvm/lib/Support/FileCheck.cpp
+++ llvm/lib/Support/FileCheck.cpp
@@ -151,7 +151,9 @@
   return Regex::escape(*VarVal);
 }
 
-bool Pattern::isValidVarNameStart(char C) { return C == '_' || isalpha(C); }
+bool Pattern::isValidVarNameStart(char C) {
+  return C == '_' || isalpha((unsigned char)C);
+}
 
 Expected<Pattern::VariableProperties>
 Pattern::parseVariable(StringRef &Str, const SourceMgr &SM) {
@@ -171,7 +173,7 @@
       return ErrorDiagnostic::get(SM, Str, "invalid variable name");
 
     // Variable names are composed of alphanumeric characters and underscores.
-    if (Str[I] != '_' && !isalnum(Str[I]))
+    if (Str[I] != '_' && !isalnum((unsigned char)Str[I]))
       break;
     ParsedOneChar = true;
   }
@@ -1079,7 +1081,7 @@
 }
 
 static bool IsPartOfWord(char c) {
-  return (isalnum(c) || c == '-' || c == '_');
+  return (isalnum((unsigned char)c) || c == '-' || c == '_');
 }
 
 Check::FileCheckType &Check::FileCheckType::setCount(int C) {


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D79810.263522.patch
Type: text/x-patch
Size: 2634 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200512/3160e972/attachment-0001.bin>


More information about the llvm-commits mailing list