[llvm] f076da6 - [AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser - Part 1

Wed May 19 08:05:35 PDT 2021

Author: Anirudh Prasad
Date: 2021-05-19T11:05:30-04:00
New Revision: f076da66b9d3c721611de36e43ac1da1607d1abf

URL: https://github.com/llvm/llvm-project/commit/f076da66b9d3c721611de36e43ac1da1607d1abf
DIFF: https://github.com/llvm/llvm-project/commit/f076da66b9d3c721611de36e43ac1da1607d1abf.diff

LOG: [AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser - Part 1

- This patch (is one in a series of patches) which introduces HLASM Parser support (for the first parameter of inline asm statements) to LLVM ([[ https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html | main RFC here ]])
- This patch in particular introduces HLASM Parser support for Z machine instructions.
- The approach taken here was to subclass `AsmParser`, and make various functions and variables as "protected" wherever appropriate.
- The `HLASMAsmParser` class overrides the `parseStatement` function. Two new private functions `parseAsHLASMLabel` and `parseAsMachineInstruction` are introduced as well.

The general syntax is laid out as follows (more information available in [[ https://www.ibm.com/support/knowledgecenter/SSENW6_1.6.0/com.ibm.hlasm.v1r6.asm/asmr1023.pdf | HLASM V1R6 Language Reference Manual ]] - Chapter 2 - Instruction Statement Format):

```
<TokA><spaces.*><TokB><spaces.*><TokC><spaces.*><TokD>
```

1. TokA is referred to as the Name Entry. This token is optional
2. TokB is referred to as the Operation Entry. This token is mandatory.
3. TokC is referred to as the Operand Entry. This token is mandatory
4. TokD is referred to as the Remarks Entry. This token is optional

- If TokA is provided, then we either parse TokA as a possible comment or as a label (Name Entry), Tok B as the Operation Entry and so on.
- If TokA is not provided (i.e. we have one or more spaces and then the first token), then we will parse the first token (i.e TokB) as a possible Z machine instruction, TokC as the operands to the Z machine instruction and TokD as a possible Remark field
- TokC (Operand Entry), no spaces are allowed between OperandEntries. If a space occurs it is classified as an error.
- TokD if provided is taken as is, and emitted as a comment.

The following additional approach was examined, but not taken:

- Adding custom private only functions to base AsmParser class, and only invoking them for z/OS. While this would eliminate the need for another child class, these private functions would be of non-use to every other target. Similarly, adding any pure virtual functions to the base MCAsmParser class and overriding them in AsmParser would also have the same disadvantage.

Testing:

- This patch doesn't have tests added with it, for the sole reason that MCStreamer Support and Object File support hasn't been added for the z/OS target (yet). Hence, it's not possible generate code outright for the z/OS target. They are in the process of being committed / process of being worked on.

- Any comments / feedback on how to combat this "lack of testing" due to other missing required features is appreciated.

Reviewed By: Kai, uweigand

Differential Revision: https://reviews.llvm.org/D98276

Added: 
    

Modified: 
    llvm/include/llvm/MC/MCAsmInfo.h
    llvm/lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
    llvm/lib/MC/MCParser/AsmParser.cpp
    llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
    llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCAsmInfo.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/include/llvm/MC/MCAsmInfo.h b/llvm/include/llvm/MC/MCAsmInfo.h
index b7c075cb3451..dcc39f0a77cf 100644

--- a/llvm/include/llvm/MC/MCAsmInfo.h
+++ b/llvm/include/llvm/MC/MCAsmInfo.h
@@ -146,6 +146,10 @@ class MCAsmInfo {
   /// Default is true.
   bool AllowAdditionalComments = true;
 
+  /// Should we emit the '\t' as the starting indentation marker for GNU inline
+  /// asm statements. Defaults to true.
+  bool EmitGNUAsmStartIndentationMarker = true;
+
   /// This is appended to emitted labels.  Defaults to ":"
   const char *LabelSuffix;
 
@@ -624,6 +628,9 @@ class MCAsmInfo {
     return RestrictCommentStringToStartOfStatement;
   }
   bool shouldAllowAdditionalComments() const { return AllowAdditionalComments; }
+  bool getEmitGNUAsmStartIndentationMarker() const {
+    return EmitGNUAsmStartIndentationMarker;
+  }
   const char *getLabelSuffix() const { return LabelSuffix; }
 
   bool useAssignmentForEHBegin() const { return UseAssignmentForEHBegin; }

diff  --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
index c494510044bd..b49e7f6c700c 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
@@ -270,14 +270,16 @@ static void EmitMSInlineAsmStr(const char *AsmStr, const MachineInstr *MI,
 }
 
 static void EmitGCCInlineAsmStr(const char *AsmStr, const MachineInstr *MI,
-                                MachineModuleInfo *MMI, int AsmPrinterVariant,
+                                MachineModuleInfo *MMI, const MCAsmInfo *MAI,
                                 AsmPrinter *AP, unsigned LocCookie,
                                 raw_ostream &OS) {
   int CurVariant = -1;            // The number of the {.|.|.} region we are in.
   const char *LastEmitted = AsmStr; // One past the last character emitted.
   unsigned NumOperands = MI->getNumOperands();
+  int AsmPrinterVariant = MAI->getAssemblerDialect();
 
-  OS << '\t';
+  if (MAI->getEmitGNUAsmStartIndentationMarker())
+    OS << '\t';
 
   while (*LastEmitted) {
     switch (*LastEmitted) {
@@ -499,11 +501,9 @@ void AsmPrinter::emitInlineAsm(const MachineInstr *MI) const {
   SmallString<256> StringData;
   raw_svector_ostream OS(StringData);
 
-  // The variant of the current asmprinter.
-  int AsmPrinterVariant = MAI->getAssemblerDialect();
   AsmPrinter *AP = const_cast<AsmPrinter*>(this);
   if (MI->getInlineAsmDialect() == InlineAsm::AD_ATT)
-    EmitGCCInlineAsmStr(AsmStr, MI, MMI, AsmPrinterVariant, AP, LocCookie, OS);
+    EmitGCCInlineAsmStr(AsmStr, MI, MMI, MAI, AP, LocCookie, OS);
   else
     EmitMSInlineAsmStr(AsmStr, MI, MMI, AP, LocCookie, OS);
 

diff  --git a/llvm/lib/MC/MCParser/AsmParser.cpp b/llvm/lib/MC/MCParser/AsmParser.cpp
index 77cf92ce4268..fdde878a194b 100644
--- a/llvm/lib/MC/MCParser/AsmParser.cpp
+++ b/llvm/lib/MC/MCParser/AsmParser.cpp
@@ -186,6 +186,17 @@ class AsmParser : public MCAsmParser {
   // Is alt macro mode enabled.
   bool AltMacroMode = false;
 
+protected:
+  virtual bool parseStatement(ParseStatementInfo &Info,
+                              MCAsmParserSemaCallback *SI);
+
+  /// This routine uses the target specific ParseInstruction function to
+  /// parse an instruction into Operands, and then call the target specific
+  /// MatchAndEmit function to match and emit the instruction.
+  bool parseAndMatchAndEmitTargetInstruction(ParseStatementInfo &Info,
+                                             StringRef IDVal, AsmToken ID,
+                                             SMLoc IDLoc);
+
 public:
   AsmParser(SourceMgr &SM, MCContext &Ctx, MCStreamer &Out,
             const MCAsmInfo &MAI, unsigned CB);
@@ -273,8 +284,6 @@ class AsmParser : public MCAsmParser {
   /// }
 
 private:
-  bool parseStatement(ParseStatementInfo &Info,
-                      MCAsmParserSemaCallback *SI);
   bool parseCurlyBlockScope(SmallVectorImpl<AsmRewrite>& AsmStrRewrites);
   bool parseCppHashLineFilenameComment(SMLoc L, bool SaveLocInfo = true);
 
@@ -701,6 +710,36 @@ class AsmParser : public MCAsmParser {
   void initializeCVDefRangeTypeMap();
 };
 
+class HLASMAsmParser final : public AsmParser {
+private:
+  MCAsmLexer &Lexer;
+  MCStreamer &Out;
+
+  void lexLeadingSpaces() {
+    while (Lexer.is(AsmToken::Space))
+      Lexer.Lex();
+  }
+
+  bool parseAsHLASMLabel(ParseStatementInfo &Info, MCAsmParserSemaCallback *SI);
+  bool parseAsMachineInstruction(ParseStatementInfo &Info,
+                                 MCAsmParserSemaCallback *SI);
+
+public:
+  HLASMAsmParser(SourceMgr &SM, MCContext &Ctx, MCStreamer &Out,
+                 const MCAsmInfo &MAI, unsigned CB = 0)
+      : AsmParser(SM, Ctx, Out, MAI, CB), Lexer(getLexer()), Out(Out) {
+    Lexer.setSkipSpace(false);
+    Lexer.setAllowHashInIdentifier(true);
+    Lexer.setLexHLASMIntegers(true);
+    Lexer.setLexHLASMStrings(true);
+  }
+
+  ~HLASMAsmParser() { Lexer.setSkipSpace(true); }
+
+  bool parseStatement(ParseStatementInfo &Info,
+                      MCAsmParserSemaCallback *SI) override;
+};
+
 } // end anonymous namespace
 
 namespace llvm {
@@ -2257,6 +2296,13 @@ bool AsmParser::parseStatement(ParseStatementInfo &Info,
   if (checkForValidSection())
     return true;
 
+  return parseAndMatchAndEmitTargetInstruction(Info, IDVal, ID, IDLoc);
+}
+
+bool AsmParser::parseAndMatchAndEmitTargetInstruction(ParseStatementInfo &Info,
+                                                      StringRef IDVal,
+                                                      AsmToken ID,
+                                                      SMLoc IDLoc) {
   // Canonicalize the opcode to lower case.
   std::string OpcodeStr = IDVal.lower();
   ParseInstructionInfo IInfo(Info.AsmRewrites);
@@ -6124,6 +6170,76 @@ bool AsmParser::parseMSInlineAsm(
   return false;
 }
 
+bool HLASMAsmParser::parseAsMachineInstruction(ParseStatementInfo &Info,
+                                               MCAsmParserSemaCallback *SI) {
+  AsmToken OperationEntryTok = Lexer.getTok();
+  SMLoc OperationEntryLoc = OperationEntryTok.getLoc();
+  StringRef OperationEntryVal;
+
+  // If we see a new line or carriage return, emit the new line
+  // and lex it.
+  if (OperationEntryTok.is(AsmToken::EndOfStatement)) {
+    if (getTok().getString().front() == '\n' ||
+        getTok().getString().front() == '\r') {
+      Out.AddBlankLine();
+      Lex();
+      return false;
+    }
+  }
+
+  // Attempt to parse the first token as an Identifier
+  if (parseIdentifier(OperationEntryVal))
+    return Error(OperationEntryLoc, "unexpected token at start of statement");
+
+  // Once we've parsed the operation entry successfully, lex
+  // any spaces to get to the OperandEntries.
+  lexLeadingSpaces();
+
+  return parseAndMatchAndEmitTargetInstruction(
+      Info, OperationEntryVal, OperationEntryTok, OperationEntryLoc);
+}
+
+bool HLASMAsmParser::parseStatement(ParseStatementInfo &Info,
+                                    MCAsmParserSemaCallback *SI) {
+  assert(!hasPendingError() && "parseStatement started with pending error");
+
+  // Should the first token be interpreted as a machine instruction.
+  bool ShouldParseAsMachineInstruction = false;
+
+  // If a Name Entry exists, it should occur at the very
+  // start of the string. In this case, we should parse the
+  // first non-space token as a Label.
+  // If the Name entry is missing (i.e. there's some other
+  // token), then we attempt to parse the first non-space
+  // token as a Machine Instruction.
+  if (getTok().is(AsmToken::Space))
+    ShouldParseAsMachineInstruction = true;
+
+  // If we have an EndOfStatement (which includes the target's comment
+  // string) we can appropriately lex it early on)
+  if (Lexer.is(AsmToken::EndOfStatement)) {
+    // if this is a line comment we can drop it safely
+    if (getTok().getString().empty() || getTok().getString().front() == '\r' ||
+        getTok().getString().front() == '\n')
+      Out.AddBlankLine();
+    Lex();
+    return false;
+  }
+
+  // We have established how to parse the inline asm statement.
+  // Now we can safely lex any leading spaces to get to the
+  // first token.
+  lexLeadingSpaces();
+
+  if (ShouldParseAsMachineInstruction)
+    return parseAsMachineInstruction(Info, SI);
+
+  // Label parsing support isn't implemented completely (yet).
+  SMLoc Loc = getTok().getLoc();
+  eatToEndOfStatement();
+  return Error(Loc, "HLASM Label parsing support not yet implemented");
+}
+
 namespace llvm {
 namespace MCParserUtils {
 
@@ -6211,5 +6327,8 @@ bool parseAssignmentExpression(StringRef Name, bool allow_redef,
 MCAsmParser *llvm::createMCAsmParser(SourceMgr &SM, MCContext &C,
                                      MCStreamer &Out, const MCAsmInfo &MAI,
                                      unsigned CB) {
+  if (C.getTargetTriple().isOSzOS())
+    return new HLASMAsmParser(SM, C, Out, MAI, CB);
+
   return new AsmParser(SM, C, Out, MAI, CB);
 }

diff  --git a/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp b/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
index 435c68165c38..0de24245cfcc 100644
--- a/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
+++ b/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
@@ -1381,10 +1381,38 @@ bool SystemZAsmParser::ParseInstruction(ParseInstructionInfo &Info,
     // Read any subsequent operands.
     while (getLexer().is(AsmToken::Comma)) {
       Parser.Lex();
+
+      if (isParsingHLASM() && getLexer().is(AsmToken::Space))
+        return Error(
+            Parser.getTok().getLoc(),
+            "No space allowed between comma that separates operand entries");
+
       if (parseOperand(Operands, Name)) {
         return true;
       }
     }
+
+    // Under the HLASM variant, we could have the remark field
+    // The remark field occurs after the operation entries
+    // There is a space that separates the operation entries and the
+    // remark field.
+    if (isParsingHLASM() && getTok().is(AsmToken::Space)) {
+      // We've confirmed that there is a Remark field.
+      StringRef Remark(getLexer().LexUntilEndOfStatement());
+      Parser.Lex();
+
+      // If there is nothing after the space, then there is nothing to emit
+      // We could have a situation as this:
+      // "  \n"
+      // After lexing above, we will have
+      // "\n"
+      // This isn't an explicit remark field, so we don't have to output
+      // this as a comment.
+      if (Remark.size())
+        // Output the entire Remarks Field as a comment
+        getStreamer().AddComment(Remark);
+    }
+
     if (getLexer().isNot(AsmToken::EndOfStatement)) {
       SMLoc Loc = getLexer().getLoc();
       return Error(Loc, "unexpected token in argument list");

diff  --git a/llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCAsmInfo.cpp b/llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCAsmInfo.cpp
index e3d71f21f70c..221b6fdd59c8 100644
--- a/llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCAsmInfo.cpp
+++ b/llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCAsmInfo.cpp
@@ -29,6 +29,7 @@ SystemZMCAsmInfo::SystemZMCAsmInfo(const Triple &TT) {
   AllowHashAtStartOfIdentifier = (AssemblerDialect == AD_HLASM);
   DotIsPC = (AssemblerDialect == AD_ATT);
   StarIsPC = (AssemblerDialect == AD_HLASM);
+  EmitGNUAsmStartIndentationMarker = (AssemblerDialect == AD_ATT);
 
   ZeroDirective = "\t.space\t";
   Data64bitsDirective = "\t.quad\t";