[PATCH] D108602: [RISCV] Initial support .insn directive for the assembler.

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 2 11:29:32 PDT 2021


craig.topper added a comment.

In D108602#2979398 <https://reviews.llvm.org/D108602#2979398>, @asb wrote:

> I made a start on .insn support quite some time ago (and unfortunately got distracted with other things and didn't get it to the point it was worth sharing - apologies for that).
>
> I actually had an idea for doing it in a slightly different way vs SystemZ (that SystemZ could hopefully be moved towards as well). Essentially, if we parse `.insn` like an instruction and not like a directive, we can reuse most of the existing tablegen machinery for asm parsing.
>
> e.g. adding a new hook to AsmParser:
>
>   --- a/llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h
>   +++ b/llvm/include/llvm/MC/MCParser/MCTargetAsmParser.h
>   @@ -416,6 +416,13 @@ public:
>      /// \param DirectiveID - the identifier token of the directive.
>      virtual bool ParseDirective(AsmToken DirectiveID) = 0;
>    
>   +  /// Return true if the given IDVal should be parsed as a directive. Called
>   +  /// whenever a "normal" directive is encountered (i.e. not conditional
>   +  /// assembly directives). Targets may use this to override default parsing
>   +  /// behaviour, for instance to allow instruction-like directives such as
>   +  /// .insn to be parsed as an instruction.
>   +  virtual bool shouldParseAsDirective(StringRef IDVal) { return true; }
>   +
>      /// MatchAndEmitInstruction - Recognize a series of operands of a parsed
>      /// instruction as an actual MCInst and emit it to the specified MCStreamer.
>      /// This returns false on success and returns true on failure to match.
>   diff --git a/llvm/lib/MC/MCParser/AsmParser.cpp b/llvm/lib/MC/MCParser/AsmParser.cpp
>   index c05f26cbdda5..c3fd85f4a9e0 100644
>   --- a/llvm/lib/MC/MCParser/AsmParser.cpp
>   +++ b/llvm/lib/MC/MCParser/AsmParser.cpp
>   @@ -1889,7 +1889,8 @@ bool AsmParser::parseStatement(ParseStatementInfo &Info,
>      // Otherwise, we have a normal instruction or directive.
>    
>      // Directives start with "."
>   -  if (IDVal.startswith(".") && IDVal != ".") {
>   +  if (IDVal.startswith(".") && IDVal != "." &&
>   +      getTargetParser().shouldParseAsDirective(IDVal)) {
>        // There are several entities interested in parsing directives:
>        //
>        // 1. The target-specific assembly parser. Some directives are target
>
> Then implement that hook in RISCVAsmParser:
>
>   @@ -1756,6 +1808,17 @@ bool RISCVAsmParser::ParseInstruction(ParseInstructionInfo &Info,
>      // First operand is token for instruction
>      Operands.push_back(RISCVOperand::createToken(Name, NameLoc, isRV64()));
>    
>   +  // If parsing a .insn directive, then create a token containing the
>   +  // instruction format identifier.
>   +  if (Name == ".insn") {
>   +    SMLoc Loc = getLoc();
>   +    if (!getLexer().is(AsmToken::Identifier))
>   +      return Error(Loc, ".insn should be followed by an instruction format");
>   +    StringRef InstFormat = getParser().getTok().getIdentifier();
>   +    getParser().Lex();
>   +    Operands.push_back(RISCVOperand::createToken(InstFormat, Loc, isRV64()));
>   +  }
>   +
>      // If there are no more operands, then finish
>      if (getLexer().is(AsmToken::EndOfStatement))
>        return false;
>   @@ -1847,6 +1910,10 @@ bool RISCVAsmParser::ParseDirective(AsmToken DirectiveID) {
>      return true;
>    }
>    
>   +bool RISCVAsmParser::shouldParseAsDirective(StringRef IDVal) {
>   +  return IDVal != ".insn";
>   +}
>   +
>
> Then with some extra machinery for printing opcode values etc, you can make definitions in RISCVInstrInfo.td like:
>
>   +//===----------------------------------------------------------------------===//
>   +// .insn directive
>   +//===----------------------------------------------------------------------===//
>   +
>   +class OpcodeVal<string name, bits<7> enc> {
>   +  string Name = name;
>   +  bits<7> Encoding = enc;
>   +}
>   +
>   +def OpcodeValList : GenericTable {
>   +  let FilterClass = "OpcodeVal";
>   +  let Fields = [ "Name", "Encoding" ];
>   +
>   +  let PrimaryKey = [ "Encoding" ];
>   +  let PrimaryKeyName = "lookupOpcodeValByEncoding";
>   +}
>   +
>   +def lookupOpcodeValByName : SearchIndex {
>   +  let Table = OpcodeValList;
>   +  let Key = [ "Name" ];
>   +}
>   +
>   +// These names match those accepted by the GNU assembler's .insn directive.
>   +def : OpcodeVal<"C0", 0x0>;
>   +def : OpcodeVal<"C1", 0x1>;
>   +def : OpcodeVal<"C2", 0x2>;
>   +def : OpcodeVal<"LOAD", 0x03>;
>   +def : OpcodeVal<"LOAD_FP", 0x07>;
>   +def : OpcodeVal<"CUSTOM_0", 0x0b>;
>   +def : OpcodeVal<"MISC_MEM", 0x0f>;
>   +def : OpcodeVal<"OP_IMM", 0x13>;
>   +def : OpcodeVal<"AUIPC", 0x17>;
>   +def : OpcodeVal<"OP_IMM_32", 0x1b>;
>   +def : OpcodeVal<"STORE", 0x23>;
>   +def : OpcodeVal<"STORE_FP", 0x27>;
>   +def : OpcodeVal<"CUSTOM_1", 0x2b>;
>   +def : OpcodeVal<"AMO", 0x2f>;
>   +def : OpcodeVal<"OP", 0x33>;
>   +def : OpcodeVal<"LUI", 0x37>;
>   +def : OpcodeVal<"OP_32", 0x3b>;
>   +def : OpcodeVal<"MADD", 0x43>;
>   +def : OpcodeVal<"MSUB", 0x47>;
>   +def : OpcodeVal<"NMADD", 0x4f>;
>   +def : OpcodeVal<"NMSUB", 0x4b>;
>   +def : OpcodeVal<"OP_FP", 0x53>;
>   +def : OpcodeVal<"CUSTOM_2", 0x5b>;
>   +def : OpcodeVal<"BRANCH", 0x63>;
>   +def : OpcodeVal<"JALR", 0x67>;
>   +def : OpcodeVal<"JAL", 0x6f>;
>   +def : OpcodeVal<"SYSTEM", 0x73>;
>   +def : OpcodeVal<"CUSTOM_3", 0x7b>;
>   +
>   +// Note that for all directive definitions, the passed constants (e.g. funct3,
>   +// funct3, opcode) are all dummy values that are overridden by the .insn
>   +// definition.
>   +
>   +let hasSideEffects = 1, mayLoad = 1, mayStore = 1,
>   +    isAsmParserOnly = 1 in {
>   +
>   +def INSN_R : RVInstR<0, 0, OPC_LOAD, (outs GPR:$rd), (ins opcode_value:$opcode,
>   +                     uimm3:$funct3, uimm7:$funct7, GPR:$rs1, GPR:$rs2),
>   +                     ".insn r", "$opcode, $funct3, $funct7, $rd, $rs1, $rs2">,
>   +             Sched<[]> {
>   +  bits<7> funct7;
>   +  bits<3> funct3;
>   +  bits<7> opcode;
>   +
>   +  let Inst{31-25} = funct7;
>   +  let Inst{14-12} = funct3;
>   +  let Inst{6-0} = opcode;
>   +}
>   +
>   +def INSN_I : RVInstI<0, OPC_LOAD, (outs GPR:$rd), (ins opcode_value:$opcode,
>   +                     uimm3:$funct3, GPR:$rs1, simm12:$imm12), ".insn i",
>   +                     "$opcode, $funct3, $rd, $rs1, $imm12">,
>   +      Sched<[]> {
>   +  bits<3> funct3;
>   +  bits<7> opcode;
>   +
>   +  let Inst{14-12} = funct3;
>   +  let Inst{6-0} = opcode;
>   +}
>
> I'm afraid the branch were I was playing with this got a bit messy so it's hard to extract a full patch, but hopefully that's enough to outline the idea (and I can spend more time pulling bits out if it's useful).

I started looking at this and found one issue immediately.

We ended up with this entry in the custom operand parser

  { 0 /* .insn */, 8 /* 3 */, MCK_SImm21Lsb0JAL, AMFBS_None },

This should be based on the format. Not just ".insn". I think this would cause the operand to fail parsing for any of the formats, but J.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108602/new/

https://reviews.llvm.org/D108602



More information about the llvm-commits mailing list