[PATCH] D98276: [AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser - Part 1

Wed May 12 08:00:28 PDT 2021

anirudhp added inline comments.

================
Comment at: llvm/lib/MC/MCParser/AsmParser.cpp:6223
+  assert(!(ShouldParseAsMachineInstruction && ShouldParseAsLabel) &&
+         "Cannot parse first token  as both a label and an instruction");
+
----------------
uweigand wrote:
> Having two variables that are always the complement of one another seems odd, in particular because ShouldParseAsLabel doesn't appear to be used anywhere?
I guess I could probably remove the `assert` itself. The if else block determines that either one of the flags would be set but not both, so I don't see the point in the assert. There's no other path that would would set both of them to true. If a flag is set incorrectly, the parsing logic for it should catch it.

>ShouldParseAsLabel doesn't appear to be used anywhere?

Currently, no. But since these patches are staggered, the next sequence of patches once this lands, will add in support for parsing in labels. But I agree with your overall comment, that one variable should do the trick. Two are unnecessary. I will remove the `ShouldParseAsLabel` flag.

================
Comment at: llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp:1403
+      Parser.Lex();
+
+      // If there is nothing after the space, then there is nothing to emit
----------------
uweigand wrote:
> This still looks odd to me.  What seems to be going on is that the LexUntilEndOfStatement lexes everything including the initial space.  The explicit Lex then lexes the EndOfStatement marker.
> 
> I believe that would be wrong: the EndOfStatement is already lexed below (last line of this function).  And the comment just immediately following here seems to imply that the "Lex" was intended to lex that initial *space*.  But for that to be true, wouldn't the Lex have to be done *before* the LexUntilEndOfStatement?
>This still looks odd to me. What seems to be going on is that the LexUntilEndOfStatement lexes everything including the initial space. The explicit Lex then lexes the EndOfStatement marker.

Hmm, I don't think this is exactly right. The Lexer's `SkipSpace` attribute is set to false in AsmParser.cpp. Thus, spaces aren't "ignored" as part of regular tokens, and you have an actual AsmToken::Space.

For example, When we have something like this:
"      abc    def ghij \n"
The sequence of AsmTokens seen are: Space, Identifier, Space, Identifier, Space, Identifier, Space, EndOfStatement"

Relevant implementation in AsmLexer.cpp (the LexToken() function). 
```
  case '\t':
    IsAtStartOfStatement = OldIsAtStartOfStatement;
    while (*CurPtr == ' ' || *CurPtr == '\t')
      CurPtr++;
    if (SkipSpace)
      return LexToken(); // Ignore whitespace.
    else
      return AsmToken(AsmToken::Space, StringRef(TokStart, CurPtr - TokStart));
```

The `CurPtr` field always jumps over the space, but if `SkipSpace` is set to false, we don't lex the "next" token, we just return an `AsmToken::Space` where the onus is on the user to do with it what they want. If its set to true, we just recursively call LexToken() again to return the next possible non-space AsmToken.

If you take a look at the `LexUntilEndOfStatement` function:
```
StringRef AsmLexer::LexUntilEndOfStatement() {
  TokStart = CurPtr;

  while (!isAtStartOfComment(CurPtr) &&     // Start of line comment.
         !isAtStatementSeparator(CurPtr) && // End of statement marker.
         *CurPtr != '\n' && *CurPtr != '\r' && CurPtr != CurBuf.end()) {
    ++CurPtr;
  }
  return StringRef(TokStart, CurPtr-TokStart);
}
```

`CurPtr` already points to the first non-space character, and like you mentioned it goes on until the "end of statement" is reached. After that, a StringRef is returned.

So, when you come into the relevant block in the code:
```
    if (isParsingHLASM() && getTok().is(AsmToken::Space)) {
      // We've confirmed that there is a Remark field.
      StringRef Remark(getLexer().LexUntilEndOfStatement());
      Parser.Lex();
....
....
....
```

We've confirmed that there is a leading space(s), and we want to basically get everything after a space until the end of the statement, to emit as a comment. Once we've done that however, the current AsmToken is still an `AsmToken::Space`, so when we call Lex, we're still "lexing" the `AsmToken::Space`, and returning the next token to be lexed which is now a "\n", because we've effectively moved the `CurPtr` to the end of statement (in the previous call). The point to note here is that the `LexUntilEndOfStatement` function doesn't return an `AsmToken`, it simply returns a `StringRef` from `CurPtr` until it encounters the first `\n` (or the carriage return or the end of the string), while advancing `CurPtr`. So when we exit the block, we still have a `AsmToken::EndOfStatement` to be checked.

The other way of looking at it is possibly the following (this should be the same behaviour, but I haven't tried it yet, so some pseudo code).
```
    if (isParsingHLASM() && getTok().is(AsmToken::Space)) {
         // Lex the leading spaces first

         // In a loop, lex all tokens until you reach the end of statement token
         // aggregating all the "returned" values into a string.
         // i.e. while (getTok().isNot(AsmToken::EndOfStatement) {
         //     RemarkString += getTok().getString()
         //     Parser.Lex()
         // }

         // Emit aggregated string as a comment 
    }
```

Apologies for the very long comment. Please let me know if I've missed something / misunderstood something.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98276/new/

https://reviews.llvm.org/D98276