[PATCH] D13592: [MC] Fixed parsing of macro arguments where expressions with spaces are present.

Tue Dec 8 03:07:39 PST 2015

dsanders requested changes to this revision.
dsanders added a comment.
This revision now requires changes to proceed.

> Fixed an issue for mips with an instruction such as 'sdc1 $f1, 272 +8(a0)' which

>  has a space between '272' and '+'. The parser would then parse '272' and '+8' as

>  two arguments instead of a single expression resulting in one too many arguments

>  in the pseudo instruction.


This isn't quite right since 'sdc1 $f1 272 + 8(a0)' would parse successfully by itself. This change is about parsing the arguments to a user-defined macro and dividing the input tokens into tokens for each of the three arguments of the macro (insn, reg, and src). At the moment, we find three argument separators (because we don't handle whitespace separated arguments correctly) leaving us with four arguments to the EX user-defined macro. This macro only accepts three arguments so we emit an error.


================
Comment at: lib/MC/MCParser/AsmParser.cpp:2110
@@ -2103,3 +2109,3 @@
           const char *NextChar = getTok().getEndLoc().getPointer();
-          if (*NextChar == ' ')
+          if (*NextChar == ' ' || isdigit(*NextChar))
             AddTokens = 2;
----------------
I think these two arguments do the wrong thing here:
  4 +sym
  4 +(1)
*NextChar will be neither a space or a digit so the loop will terminate at the 'AddTokens == 0 && SpaceEaten' check having only consumed '4 +'.

I think that dropping the AddTokens variable in favour of adding the tokens and continuing should fix this:
  if (isOperator(Lexer.getKind())) {
    MA.push_back(getTok());
    Lex();

    // Whitespace after an operator can be ignored.
    if (Lexer.is(AsmToken::Space))
      Lex();

    continue;
  }

================
Comment at: test/MC/Mips/macro-sdc1.s:1
@@ +1,2 @@
+# RUN: llvm-mc %s -triple=mipsel-unknown-linux -show-encoding -mcpu=mips32r2 | \
+# RUN:   FileCheck %s
----------------
(about filename): By our naming conventions, this would be a test for a macro called 'sdc1' but the test is really about argument parsing and the times it's correct to treat whitespace as an argument separator. Something like user-macro-argument-separation.s would be clearer.

================
Comment at: test/MC/Mips/macro-sdc1.s:9
@@ +8,3 @@
+  .macro  EX insn, reg, src
+.ex\@: \insn \reg, \src
+  .endm
----------------
Indentation. Likewise for the comments below

================
Comment at: test/MC/Mips/macro-sdc1.s:12-22
@@ +11,12 @@
+
+  EX sdc1 $f0, 272+8($a0)
+# CHECK: sdc1    $f0, 280($4)
+
+  EX sdc1 $f0, 272 +8($a0)
+# CHECK: sdc1    $f0, 280($4)
+
+  EX sdc1 $f0, 272+ 8($a0)
+# CHECK: sdc1    $f0, 280($4)
+
+  EX sdc1 $f0, 272 + 8($a0)
+# CHECK: sdc1    $f0, 280($4)
----------------
I think we need a few extra cases here. At the moment we don't test prefix operators, symbols, expressions in parentheses, and combinations of them. I've tried a few of these on GAS and there were a couple surprising behaviours.

Here's the test case I ran through GAS:
  .extern sym
  # imm and rs are deliberately swapped to test whitespace separated arguments.
  .macro EX2 insn, rd, imm, rs
  .ex\@: \insn \rd, \rs, \imm
  .endm

  EX2 addiu $2, 1 $3
  EX2 addiu $2, ~1 $3
  EX2 addiu $2, ~ 1 $3
  EX2 addiu $2, 1+1 $3
  EX2 addiu $2, 1+ 1 $3
  EX2 addiu $2, 1 +1 $3
  EX2 addiu $2, 1 + 1 $3
  EX2 addiu $2, 1+~1 $3
  EX2 addiu $2, 1+~ 1 $3
  EX2 addiu $2, 1+ ~1 $3
  EX2 addiu $2, 1 +~1 $3
  EX2 addiu $2, 1 +~ 1 $3
  EX2 addiu $2, 1 + ~1 $3
  EX2 addiu $2, 1 + ~ 1 $3
  # Each of the next four produce variations of '1+(1)$3' as a single argument.
  EX2 addiu $2, 1+(1) $3
  EX2 addiu $2, 1 +(1) $3
  EX2 addiu $2, 1+ (1) $3
  EX2 addiu $2, 1 + (1) $3
  EX2 addiu $2, 1+(1)+1 $3
  EX2 addiu $2, 1 +(1)+1 $3
  EX2 addiu $2, 1+ (1)+1 $3
  EX2 addiu $2, 1 + (1)+1 $3
  nop
  EX2 addiu $2, sym $3
  EX2 addiu $2, -sym $3
  EX2 addiu $2, - sym $3
  EX2 addiu $2, 1+sym $3
  EX2 addiu $2, 1+ sym $3
  EX2 addiu $2, 1 +sym $3
  EX2 addiu $2, 1 + sym $3
  EX2 addiu $2, 1+~sym $3
  EX2 addiu $2, 1+~ sym $3
  EX2 addiu $2, 1+ ~sym $3
  EX2 addiu $2, 1 +~sym $3
  EX2 addiu $2, 1 +~ sym $3
  EX2 addiu $2, 1 + ~sym $3
  EX2 addiu $2, 1 + ~ sym $3
  # Each of the next four produce variations of '1+(sym)$3' as a single argument.
  EX2 addiu $2, 1+(sym) $3
  EX2 addiu $2, 1 +(sym) $3
  EX2 addiu $2, 1+ (sym) $3
  EX2 addiu $2, 1 + (sym) $3
  EX2 addiu $2, 1+(1)+sym $3
  EX2 addiu $2, 1 +(1)+sym $3
  EX2 addiu $2, 1+ (1)+sym $3
  EX2 addiu $2, 1 + (1)+sym $3

Removing the commas also produces some surprising arguments such as '$2~1'.


http://reviews.llvm.org/D13592