[PATCH] D22112: Disambiguate a constant with both 0B prefix and H suffix.

Yunzhong Gao via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 15 20:07:39 PDT 2016


ygao added inline comments.

================
Comment at: lib/MC/MCParser/AsmLexer.cpp:261
@@ +260,3 @@
+  // MASM-flavor hexadecimal integer: [0-9][0-9a-fA-F]*[hH]
+  if (IsParsingMSInlineAsm && isdigit(CurPtr[-1])) {
+    const char *FirstNonBinary = (CurPtr[-1] != '0' && CurPtr[-1] != '1') ?
----------------
colinl wrote:
> Will CurPtr ever be at the beginning of the buffer, making index -1 invalid?
I don't think that is possible.

The only call site of LexDigit() is in AsmLexer::LexToken() in the same file.
LexToken() calls getNextChar() to advance CurPtr before calling LexDigit(), so
it is known that CurPtr[-1] will be [0-9].
```
  AsmToken AsmLexer::LexToken() {
    TokStart = CurPtr;
    int CurChar = getNextChar(); // basically "CurChar = *CurPtr++;"
    ...
    switch (CurChar) {
      ...
      case [0-9]:
        return LexDigit(); // inside LexDigit(), CurPtr[-1] is "CurChar" here
      ...
    }
  }
```
The original code in this function, several lines below, also checks "if (CurPtr[-1] != '0' ...)".
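
To make the argument above concrete, here is a minimal standalone sketch of the pointer bookkeeping. MiniLexer and its members are hypothetical stand-ins, not the real AsmLexer; only the "advance first, then dispatch" order is modeled, and the digit scanning is deliberately simplified.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for AsmLexer; only the CurPtr handling is modeled.
struct MiniLexer {
  const char *BufStart;
  const char *CurPtr;

  explicit MiniLexer(const char *Buf) : BufStart(Buf), CurPtr(Buf) {}

  // Mirrors "CurChar = *CurPtr++": consume one character and return it.
  int getNextChar() { return static_cast<unsigned char>(*CurPtr++); }

  // Only reached after getNextChar() has consumed a digit, so CurPtr is at
  // least BufStart + 1 and CurPtr[-1] is that digit.
  std::string lexDigit() {
    assert(CurPtr > BufStart && "CurPtr[-1] would be out of bounds");
    assert(std::isdigit(static_cast<unsigned char>(CurPtr[-1])));
    const char *Start = CurPtr - 1;
    // Simplified scan: take the run of hex digits that follows.
    while (std::isxdigit(static_cast<unsigned char>(*CurPtr)))
      ++CurPtr;
    return std::string(Start, CurPtr);
  }

  // Returns the digit run at the current position, or "" if the next
  // character is not a digit.
  std::string lexToken() {
    int CurChar = getNextChar();
    if (std::isdigit(CurChar))
      return lexDigit();
    return std::string();
  }
};
```

Even when the digit is the very first character of the buffer, getNextChar() has already moved CurPtr to BufStart + 1 by the time lexDigit() runs, so the index -1 is in bounds.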

================
Comment at: tools/clang/test/CodeGenCXX/ms-inline-asm-return.cpp:88
@@ -87,3 +87,3 @@
 // CHECK-LABEL: define i64 @f_s8()
-// CHECK: %[[r:[^ ]*]] = call i64 asm sideeffect inteldialect "mov eax, $$0x01010101\0A\09mov edx, $$0x01010101", "=A,~{eax},{{.*}}"
+// CHECK: %[[r:[^ ]*]] = call i64 asm sideeffect inteldialect "mov eax, $$16843009\0A\09mov edx, $$16843009", "=A,~{eax},{{.*}}"
 // CHECK: store i64 %[[r]], i64* %{{.*}}
----------------
colinl wrote:
> What made these switch from printing hex to decimal?
I was curious about that myself...

X86AsmParser::ParseIntelOperand() makes this decision by comparing the following
two sizes (search for the comment "rewrite the complex expression as a single immediate" to
locate the code):
1. size of the token, which is "next token position - current token position".
   e.g., given "0b0101U", the size would be 7.
2. size of the string passed as the first argument to the constructor of intToken(). It is the "Result"
   string used in several places of the LexDigit() function. e.g., given "0b0101U", the
   length of the "Result" string would be 6; the "U" suffix is not counted.

If these two sizes are equal, the original expression is printed; otherwise the
expression is rewritten as a decimal integer. In this case, "01010101h" will get rewritten with or
without my changes, because the two sizes are 9 vs 8. On the other hand, my
changes disallow the "0x" prefix for MS-Intel inline assembly.
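
The size comparison described above can be sketched as a small standalone function. printImmediate and its parameters are hypothetical names for illustration; this models only the rule as stated in this mail, not the actual X86AsmParser code, and it assumes the "Result" string is never longer than the token text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper modeling the rule described above.
//   TokenText: the token exactly as written in the source
//   ResultLen: length of the "Result" string (suffix not counted)
//   Value:     the parsed integer value
std::string printImmediate(const std::string &TokenText, std::size_t ResultLen,
                           std::uint64_t Value) {
  // Assumption: "Result" covers at most the whole token.
  assert(ResultLen <= TokenText.size());
  // Equal sizes: the original spelling is kept.
  if (TokenText.size() == ResultLen)
    return TokenText;
  // Otherwise the expression is rewritten as a decimal integer.
  return std::to_string(Value);
}
```

For "01010101h" the sizes are 9 vs 8, so printImmediate("01010101h", 8, 0x01010101) yields "16843009", which is the value that appears in the updated CHECK line.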


https://reviews.llvm.org/D22112




