[llvm-commits] [llvm] r78378 - in /llvm/trunk: lib/Target/X86/AsmParser/X86AsmParser.cpp test/MC/AsmParser/x86_instructions.s utils/TableGen/AsmMatcherEmitter.cpp
Chris Lattner
clattner at apple.com
Fri Aug 7 10:34:57 PDT 2009
On Aug 7, 2009, at 1:26 AM, Daniel Dunbar wrote:
> URL: http://llvm.org/viewvc/llvm-project?rev=78378&view=rev
> Log:
> llvm-mc/AsmMatcher: Move to a slightly more sane matching design.
> - Still not very sane, but a least its not 60k lines on X86. :)
Looks like a great step to me! :)
> 2. Separate the matching of operands to an instruction from the
> construction of
> the MCInst. In theory this can be done during matching, but since
> the number
> of variations is small I think it makes sense to decompose the
> problems.
We found this to be a pretty big code size win for dagisel stuff.
> 4. The comment at the top of AsmMatcherEmitter.cpp explains the
> approach I'm
> moving towards for handling ambiguous instructions. The high-bit
> is to infer
> a partial ordering of the operand classes (and force the user to
> specify one
> if we can't) and use that to resolve ambiguities.
Ok.
Should test/MC/AsmParser/x86_instructions.s use filecheck?
> +++ llvm/trunk/lib/Target/X86/AsmParser/X86AsmParser.cpp Fri Aug 7
> 03:26:05 2009
>
> + bool isToken(const StringRef &Str) const {
> + return Kind == Token && Str == getToken();
> + }
A common case in the generated code is isToken("foo"). Maybe it would
make sense to introduce a new isTokenStr templated method so that
strlen isn't called?
> +bool X86ATTAsmParser::ParseInstruction(const StringRef &Name,
> MCInst &Inst) {
> - SmallVector<X86Operand, 3> Operands;
> + SmallVector<X86Operand, 4> Operands;
Feel free to go crazy and bump this up to 8 or something. :)
> +++ llvm/trunk/utils/TableGen/AsmMatcherEmitter.cpp Fri Aug 7
> 03:26:05 2009
> @@ -10,17 +10,87 @@
> // This tablegen backend emits a target specifier matcher for
> converting parsed
> // assembly operands in the MCInst structures.
> //
> +// The input to the target specific matcher is a list of literal
> tokens and
> +// operands. The target specific parser should generally eliminate
> any syntax
> +// which is not relevant for matching; for example, comma tokens
> should have
> +// already been consumed and eliminated by the parser. Most
> instructions will
> +// end up with a single literal token (the instruction name) and
> some number of
> +// operands.
> +//
> +// Some example inputs, for X86:
> +// 'addl' (immediate ...) (register ...)
> +// 'add' (immediate ...) (memory ...)
> +// 'call' '*' %epc
this makes a lot of sense to me. X86ATTAsmParser::ParseOperand will
have to change a bit though to handle the '*' as its own X86Operand.
> +// - It may depend on the subtarget or the assembler context.
> Instructions
> +// which are invalid for the current mode, but otherwise
> unambiguous (e.g.,
> +// an SSE instruction in a file being assembled for i486) should
> be accepted
> +// and rejected by the assembler front end. However, if the
> proper encoding
> +// for an instruction is dependent on the assembler context then
> the matcher
> +// is responsible for selecting the correct machine instruction
> for the
> +// current mode.
Do we really need to support "pretend i386 doesn't have sse
instructions"? Do you expect a lack of this to cause a problem in
practice?
> +// The matching is divided into two distinct phases:
> +//
> +// 1. Classification: Each operand is mapped to the unique set
> which (a)
> +// contains it, and (b) is the largest such subset for which a
> single
> +// instruction could match all members.
> +//
> +// For register classes, we can generate these subgroups
> automatically. For
> +// arbitrary operands, we expect the user to define the
> classes and their
> +// relations to one another (for example, 8-bit signed
> immediates as a
> +// subset of 32-bit immediates).
> +//
> +// By partitioning the operands in this way, we guarantee that
> for any
> +// tuple of classes, any single instruction must match either
> all or none
> +// of the sets of operands which could classify to that tuple.
> +//
> +// In addition, the subset relation amongst classes induces a
> partial order
> +// on such tuples, which we use to resolve ambiguities.
> +//
> +// FIXME: What do we do if a crazy case shows up where this is
> the wrong
> +// resolution?
Allow a c++ hack to fix it? :) Another option is to allow explicit
"priority" setting on an instruction pattern. We do this with
"addedcomplexity" for isel matching when the normal heuristic isn't
what we want.
> +namespace {
> + cl::opt<std::string>
> + MatchOneInstr("match-one-instr", cl::desc("Match only the named
> instruction"),
> + cl::init(""));
> +}
Please just use 'static'.
> /// FlattenVariants - Flatten an .td file assembly string by
> selecting the
> /// variant at index \arg N.
> static std::string FlattenVariants(const std::string &AsmString,
Can this be shared with the asmwriter emitter?
> /// TokenizeAsmString - Tokenize a simplified assembly string.
> +static void TokenizeAsmString(const StringRef &AsmString,
> SmallVectorImpl<StringRef> &Tokens) {
as you know, this is X86 specific. When we bring up the next target,
I wonder if this info could be moved to the .td files, so that the
X86AsmLexer and the tblgen parser could be driven by the same point of
truth.
> +static bool IsAssemblerInstruction(const StringRef &Name,
> + const CodeGenInstruction &CGI,
> + const SmallVectorImpl<StringRef>
> &Tokens) {
> + // Ignore psuedo ops.
> + //
> + // FIXME: This is a hack.
> + if (const RecordVal *Form = CGI.TheDef->getValue("Form"))
> + if (Form->getValue()->getAsString() == "Pseudo")
> + return false;
> +
> + // Ignore "PHI" node.
> + //
> + // FIXME: This is also a hack.
> + if (Name == "PHI")
> + return false;
We should probably add a target-independent "isPseudoOp" flag to
the .td files to handle both of these cases, along with things like
INLINEASM, EH_LABEL, etc.
> +
> + // Ignore instructions with no .s string.
> + //
> + // FIXME: What are these?
> + if (CGI.AsmString.empty())
> + return false;
pseudoops :)
> + // FIXME: Hack; ignore any instructions with a newline in them.
> + if (std::find(CGI.AsmString.begin(),
> + CGI.AsmString.end(), '\n') != CGI.AsmString.end())
> + return false;
Seems reasonable for now. :)
> + // Ignore instructions with attributes, these are always fake
> instructions for
> + // simplifying codegen.
> + //
> + // FIXME: Is this true?
> + //
> + // Also, we ignore instructions which reference the operand
> multiple times;
> + // this implies a constraint we would not currently honor. These
> are
> + // currently always fake instructions for simplifying codegen.
> + //
> + // FIXME: Encode this assumption in the .td, so we can error out
> here.
> + std::set<std::string> OperandNames;
> + for (unsigned i = 1, e = Tokens.size(); i < e; ++i) {
> + if (Tokens[i][0] == '$' &&
> + std::find(Tokens[i].begin(),
> + Tokens[i].end(), ':') != Tokens[i].end()) {
> + DEBUG({
> + errs() << "warning: '" << Name << "': "
> + << "ignoring instruction; operand with attribute '"
> + << Tokens[i] << "', \n";
> + });
For warnings, how about using:
SrcMgr.PrintMessage(CGI.TheDef->getLoc(), "whatever", "warning");
It would be nice to add a "PrintWarning" helper like we have
"PrintError".
> +namespace {
>
> +struct OperandListLess {
> + bool operator()(const
> + std::pair<const CodeGenInstruction::OperandInfo*,
> unsigned> &
probably worth using a typedef for this crazy long name to avoid
wrapping so much :)
> +void AsmMatcherEmitter::run(raw_ostream &OS) {
> + CodeGenTarget Target;
> + const std::vector<CodeGenRegister> &Registers =
> Target.getRegisters();
> + Record *AsmParser = Target.getAsmParser();
> + std::string ClassName = AsmParser-
> >getValueAsString("AsmParserClassName");
> +
> + std::string Namespace = Registers[0].TheDef-
> >getValueAsString("Namespace");
>
> - // Generate the top level match function.
> + EmitSourceFileHeader("Assembly Matcher Source Fragment", OS);
> +
> + // Emit the function to match a register name to number.
>
> OS << "bool " << Target.getName() << ClassName
> - << "::MatchInstruction(const StringRef &Name, "
> + << "::MatchRegisterName(const StringRef &Name, unsigned
> &RegNo) {\n";
> +
> + // FIXME: TableGen should have a fast string matcher generator.
> + for (unsigned i = 0, e = Registers.size(); i != e; ++i) {
> + const CodeGenRegister &Reg = Registers[i];
> + if (Reg.TheDef->getValueAsString("AsmName").empty())
> + continue;
> +
> + OS << " if (Name == \""
> + << Reg.TheDef->getValueAsString("AsmName") << "\")\n"
> + << " return RegNo=" << i + 1 << ", false;\n";
> + }
> + OS << " return true;\n";
> + OS << "}\n\n";
Please split these out into "EmitRegisterNameMatcher()",
"EmitFooMatcher()" type of functions, just for structure.
Overall, this is great work Daniel!
-Chris
More information about the llvm-commits
mailing list