[PATCH] D99277: [AsmParser][SystemZ][z/OS] Add in support to accept "#" as part of an Identifier token

Tue Mar 30 11:01:21 PDT 2021

anirudhp added inline comments.

================
Comment at: llvm/include/llvm/MC/MCParser/MCAsmLexer.h:151

+  void setAllowHashInIdentifier(bool V) { AllowHashInIdentifier = V; }
+
----------------
nickdesaulniers wrote:
> The only caller is a test?
> 
> I think if you add an `llvm/test/MC/SystemZ` style test, you'd find this code probably isn't working as expected.
> 
> It's curious to me why this differs from how `AllowAtInIdentifier` is set in the constructor for `AsmLexer`.
>The only caller is a test?

For now, yes. There is another patch in waiting (see https://reviews.llvm.org/D98276). Ideally, the call to the setter would be set in the constructor for the `HLASMAsmParser` class)

>I think if you add an llvm/test/MC/SystemZ style test, you'd find this code probably isn't working as expected.

A bit of background: 
I probably should have mentioned this in the description, but these changes would be used by the z/OS target for which we cannot currently generate code yet (work for this is ongoing). Hence, the non-existence of a lit test (which would make things very easy for these types of changes). 

Since the LLVM community prefers to have a test for any functional code changes, I came up with a unit test in the interim, since this would help "alleviate" some of the wait time (i.e. while we wait for the corresponding ObjectFile/Streamer support for the z/OS target and other patches that might be in review)

To answer your question:
Yes, a test in llvm/test/MC/SystemZ would not work as "intended" because the call to the setter isn't made anywhere. To be more specific, for the s390x-ibm-linux triple, the behaviour would be "unexpected". For an s390x-none-zos triple, the test wouldn't even run because we don't have relevant ObjectFile/Streamer support yet (as mentioned above)

>It's curious to me why this differs from how AllowAtInIdentifier is set in the constructor for AsmLexer.

It doesn't differ in terms of the behaviour that it is exposing. I don't really see an issue with moving it to a constructor and providing with similar behaviour (i.e. is MAI's CommentString isn't # set it to true), but since a default value of `false` was provided at the time of declaration, I didn't see the need for it.

If it is preferred to move to the constructor, I don't mind doing so. 

================
Comment at: llvm/lib/MC/MCParser/AsmLexer.cpp:733
+          isIdentifierChar(CurChar, MAI.doesAllowAtInName(),
+                           AllowHashInIdentifier))
         return LexIdentifier();
----------------
nickdesaulniers wrote:
> do we need a similar method to `MCAsmInfo::doesAllowAtInName`?  I wonder why `MCAsmInfo` and `MCAsmLexer` have their own fields for that? Seems like perhaps a (pre-existing) mistake.
I do think you're right, I don't see a reason why for the MS-case, especially in the lexing phase, that we should be using the `MCAsmInfo::doesAllowAtInName` field. I did spend some time thinking about this. To me, it does look like a "pre-existing mistake" in this particular case, but I will admit I am not too familiar with the MS asm semantics. 

However, in terms of the behaviour of the two fields: 

>From the comment in `MCAsmInfo.h`, it was to "allow @ in a symbol name" and this is used in AsmParser.cpp. You could have a symbol name such as "abc at variant", where "variant" could possibly be a valid symbol variant (`MCSymbolRefExpr::VariantKind`). If this attribute is set to true, its assumed there is no special symbol variant for it to lookup.

`AllowAtInName` in MCAsmLexer.h seems exclusively for lexing purposes. Where you want to include "accepting" @ when you see it as part of an `AsmToken::Identifier` 

================
Comment at: llvm/unittests/MC/SystemZ/SystemZAsmLexerTest.cpp:98
-    // Lex initially to get the string.
-    Parser->getLexer().Lex();
   }
----------------
nickdesaulniers wrote:
> it's not clear to me how any of the changes to llvm/lib/ affect these changes to llvm/unittests/ in regards to explicitly calling `Lex`.
My apologies. Could you expand a bit on your comment?

Afaik, we need the initial call to Lex to start the Lexer (to start getting the tokens one by one for later processing and checking). I could technically move it into the `lexAndCheckTokens` function before I run the loop, if that's what you mean?

But the way it was initially setup, I wanted to ensure the flag was set before the Lexer was run, and not during the checking phase, since at that point, the tokens are already Lexed, and we're merely checking, and I couldn't do that if it was encapsulated into a function that was run prior to the call to the setter.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99277/new/

https://reviews.llvm.org/D99277