[llvm] 3cf7efe - [TableGen] Allow mnemonics with uppercase letters to be matched

Nicolas Guillemot via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 14 14:48:09 PDT 2020


Author: Nicolas Guillemot
Date: 2020-08-14T14:47:52-07:00
New Revision: 3cf7efec986da0e2e8812f83eb7507512475687d

URL: https://github.com/llvm/llvm-project/commit/3cf7efec986da0e2e8812f83eb7507512475687d
DIFF: https://github.com/llvm/llvm-project/commit/3cf7efec986da0e2e8812f83eb7507512475687d.diff

LOG: [TableGen] Allow mnemonics with uppercase letters to be matched

The assembly parser "canonicalizes" the mnemonics it processes at an
early stage by making them lowercase, presumably so that assembly can
be case-insensitive. However, an instruction declared with a mnemonic
containing uppercase letters was never matched, because the lookup
tables generated by AsmMatcherEmitter did not lower() their entries.
This made it difficult to have instructions that are printed with a
mnemonic containing uppercase letters, since they could not be parsed.

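To fix this problem, this patch adds a few calls to lower() so that the
lookup tables generated by AsmMatcherEmitter store lowercase mnemonics,
and sorts the matchables with compare_lower() so that the table order
is case-insensitive as well. This allows instruction mnemonics with
uppercase letters to be parsed.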
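
The failure mode and the fix described above can be illustrated with a
minimal standalone sketch. It is not the generated matcher code: toLower,
buildTable and matches are hypothetical helpers, and the sorted
std::vector searched with std::binary_search is only an assumed stand-in
for the generated mnemonic tables. The parser side always lowercases its
input, so a lookup can only succeed if the table was lowered too.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: ASCII-lowercase a string, standing in for the
// canonicalization the parser applies to every mnemonic it reads.
std::string toLower(std::string S) {
  std::transform(S.begin(), S.end(), S.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });
  return S;
}

// Hypothetical stand-in for the emitter: build a sorted mnemonic table.
// With Lower == false the declared spelling is stored as-is (the old
// behavior); with Lower == true each entry is lowered first (the fix).
std::vector<std::string> buildTable(const std::vector<std::string> &Mnemonics,
                                    bool Lower) {
  std::vector<std::string> Table;
  for (const std::string &M : Mnemonics)
    Table.push_back(Lower ? toLower(M) : M);
  std::sort(Table.begin(), Table.end());
  return Table;
}

// Hypothetical stand-in for the parser: canonicalize the input, then do
// a case-sensitive binary search in the table.
bool matches(const std::vector<std::string> &Table, const std::string &Input) {
  return std::binary_search(Table.begin(), Table.end(), toLower(Input));
}
```

With the mnemonics "BInst" and "aInst" from the test below, the unlowered
table never matches either instruction, because the search key is always
lowercase while the stored entries are not. Lowering the entries before
sorting makes both the lookup and the table order case-insensitive.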

Differential Revision: https://reviews.llvm.org/D85858

Added: 
    llvm/test/TableGen/MixedCasedMnemonic.td

Modified: 
    llvm/utils/TableGen/AsmMatcherEmitter.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/test/TableGen/MixedCasedMnemonic.td b/llvm/test/TableGen/MixedCasedMnemonic.td
new file mode 100644
index 000000000000..b1604716e2b9
--- /dev/null
+++ b/llvm/test/TableGen/MixedCasedMnemonic.td
@@ -0,0 +1,55 @@
+// RUN: llvm-tblgen -gen-asm-matcher -I %p/../../include %s | FileCheck %s --check-prefix=MATCHER
+// RUN: llvm-tblgen -gen-asm-writer -I %p/../../include %s | FileCheck %s --check-prefix=WRITER
+
+// Check that an instruction that uses mixed upper/lower case in its mnemonic
+// is printed as-is, and is parsed in its "canonicalized" lowercase form.
+
+include "llvm/Target/Target.td"
+
+def ArchInstrInfo : InstrInfo { }
+
+def Arch : Target {
+  let InstructionSet = ArchInstrInfo;
+}
+
+def Reg : Register<"reg">;
+def RegClass : RegisterClass<"foo", [i32], 0, (add Reg)>;
+
+// Define instructions that demonstrate case-insensitivity.
+// In case-sensitive ASCII order, "BInst" < "aInst".
+// In case-insensitive order, "aInst" < "BInst".
+// If the matcher really treats the mnemonics in a case-insensitive way,
+// then we should see "aInst" appearing before "BInst", despite the
+// fact that "BInst" would appear before "aInst" in ASCIIbetical order.
+def AlphabeticallySecondInst : Instruction {
+  let Size = 2;
+  let OutOperandList = (outs);
+  let InOperandList = (ins);
+  let AsmString = "BInst";
+}
+
+def AlphabeticallyFirstInst : Instruction {
+  let Size = 2;
+  let OutOperandList = (outs);
+  let InOperandList = (ins);
+  let AsmString = "aInst";
+}
+
+// Check that the matcher lower()s the mnemonics it matches.
+// MATCHER: static const char *const MnemonicTable =
+// MATCHER-NEXT: "\005ainst\005binst";
+
+// Check that aInst appears before BInst in the match table.
+// This shows that the mnemonics are sorted in a case-insensitive way,
+// since otherwise "B" would be less than "a" by ASCII order.
+// MATCHER:      static const MatchEntry MatchTable0[] = {
+// MATCHER-NEXT:     /* aInst */, ::AlphabeticallyFirstInst
+// MATCHER-NEXT:     /* BInst */, ::AlphabeticallySecondInst
+// MATCHER-NEXT: };
+
+// Check that the writer preserves the case of the mnemonics.
+// WRITER:      static const char AsmStrs[] = {
+// WRITER:        "BInst\0"
+// WRITER-NEXT:   "aInst\0"
+// WRITER-NEXT: };
+

diff  --git a/llvm/utils/TableGen/AsmMatcherEmitter.cpp b/llvm/utils/TableGen/AsmMatcherEmitter.cpp
index 3d63059dcb8b..5b722b20e048 100644
--- a/llvm/utils/TableGen/AsmMatcherEmitter.cpp
+++ b/llvm/utils/TableGen/AsmMatcherEmitter.cpp
@@ -612,7 +612,7 @@ struct MatchableInfo {
   /// operator< - Compare two matchables.
   bool operator<(const MatchableInfo &RHS) const {
     // The primary comparator is the instruction mnemonic.
-    if (int Cmp = Mnemonic.compare(RHS.Mnemonic))
+    if (int Cmp = Mnemonic.compare_lower(RHS.Mnemonic))
       return Cmp == -1;
 
     if (AsmOperands.size() != RHS.AsmOperands.size())
@@ -2880,7 +2880,7 @@ static void emitCustomOperandParsing(raw_ostream &OS, CodeGenTarget &Target,
     OS << "  { ";
 
     // Store a pascal-style length byte in the mnemonic.
-    std::string LenMnemonic = char(II.Mnemonic.size()) + II.Mnemonic.str();
+    std::string LenMnemonic = char(II.Mnemonic.size()) + II.Mnemonic.lower();
     OS << StringTable.GetOrAddStringOffset(LenMnemonic, false)
        << " /* " << II.Mnemonic << " */, ";
 
@@ -3324,7 +3324,7 @@ void AsmMatcherEmitter::run(raw_ostream &OS) {
     HasDeprecation |= MI->HasDeprecation;
 
     // Store a pascal-style length byte in the mnemonic.
-    std::string LenMnemonic = char(MI->Mnemonic.size()) + MI->Mnemonic.str();
+    std::string LenMnemonic = char(MI->Mnemonic.size()) + MI->Mnemonic.lower();
     MaxMnemonicIndex = std::max(MaxMnemonicIndex,
                         StringTable.GetOrAddStringOffset(LenMnemonic, false));
   }
@@ -3438,7 +3438,8 @@ void AsmMatcherEmitter::run(raw_ostream &OS) {
         continue;
 
       // Store a pascal-style length byte in the mnemonic.
-      std::string LenMnemonic = char(MI->Mnemonic.size()) + MI->Mnemonic.str();
+      std::string LenMnemonic =
+          char(MI->Mnemonic.size()) + MI->Mnemonic.lower();
       OS << "  { " << StringTable.GetOrAddStringOffset(LenMnemonic, false)
          << " /* " << MI->Mnemonic << " */, "
          << Target.getInstNamespace() << "::"


        

