[llvm-commits] [llvm] r78378 - in /llvm/trunk: lib/Target/X86/AsmParser/X86AsmParser.cpp test/MC/AsmParser/x86_instructions.s utils/TableGen/AsmMatcherEmitter.cpp

Fri Aug 7 13:33:33 PDT 2009

On Fri, Aug 7, 2009 at 10:34 AM, Chris Lattner<clattner at apple.com> wrote:
> On Aug 7, 2009, at 1:26 AM, Daniel Dunbar wrote:
>> 2. Separate the matching of operands to an instruction from the
>> construction of
>>    the MCInst. In theory this can be done during matching, but since
>> the number
>>    of variations is small I think it makes sense to decompose the
>> problems.
>
> We found this to be a pretty big code size win for dagisel stuff.

Right, that was my thought as well. Any choice should perform well
enough once matching is fixed, so it makes sense to optimize for
simplicity & code size.

> Should test/MC/AsmParser/x86_instructions.s use filecheck?

No, it uses patterns in a few cases, at least until MCStreamer prints
out .s files.

>> +++ llvm/trunk/lib/Target/X86/AsmParser/X86AsmParser.cpp Fri Aug  7
>> 03:26:05 2009
>>
>> +  bool isToken(const StringRef &Str) const {
>> +    return Kind == Token && Str == getToken();
>> +  }
>
> A common case in the generated code is isToken("foo").  Maybe it would
> make sense to introduce a new isTokenStr templated method so that
> strlen isn't called?

This problem should magically go away once matching is done.

>> +bool X86ATTAsmParser::ParseInstruction(const StringRef &Name,
>> MCInst &Inst) {
>> -  SmallVector<X86Operand, 3> Operands;
>> +  SmallVector<X86Operand, 4> Operands;
>
> Feel free to go crazy and bump this up to 8 or something. :)

Sure.

>> +// Some example inputs, for X86:
>> +//   'addl' (immediate ...) (register ...)
>> +//   'add' (immediate ...) (memory ...)
>> +//   'call' '*' %epc
>
> this makes a lot of sense to me.  X86ATTAsmParser::ParseOperand will
> have to change a bit though to handle the '*' as its own X86Operand.

Right, that should be trivial though.

>> +//  - It may depend on the subtarget or the assembler context.
>> Instructions
>> +//    which are invalid for the current mode, but otherwise
>> unambiguous (e.g.,
>> +//    an SSE instruction in a file being assembled for i486) should
>> be accepted
>> +//    and rejected by the assembler front end. However, if the
>> proper encoding
>> +//    for an instruction is dependent on the assembler context then
>> the matcher
>> +//    is responsible for selecting the correct machine instruction
>> for the
>> +//    current mode.
>
> Do we really need to support "pretend i386 doesn't have sse
> instructions"?  Do you expect a lack of this to cause a problem in
> practice?

Well, for SSE the right approach is to match the instruction, but then
reject it in the front end. However, this approach doesn't work if the
correct selection depends on the assembler mode. This almost certainly
does occur in some cases, for example x86-32 vs. x86-64.

>> +// The matching is divided into two distinct phases:
>> +//
>> +//   1. Classification: Each operand is mapped to the unique set
>> which (a)
>> +//      contains it, and (b) is the largest such subset for which a
>> single
>> +//      instruction could match all members.
>> +//
>> +//      For register classes, we can generate these subgroups
>> automatically. For
>> +//      arbitrary operands, we expect the user to define the
>> classes and their
>> +//      relations to one another (for example, 8-bit signed
>> immediates as a
>> +//      subset of 32-bit immediates).
>> +//
>> +//      By partitioning the operands in this way, we guarantee that
>> for any
>> +//      tuple of classes, any single instruction must match either
>> all or none
>> +//      of the sets of operands which could classify to that tuple.
>> +//
>> +//      In addition, the subset relation amongst classes induces a
>> partial order
>> +//      on such tuples, which we use to resolve ambiguities.
>> +//
>
>> +//      FIXME: What do we do if a crazy case shows up where this is
>> the wrong
>> +//      resolution?
>
> Allow a c++ hack to fix it? :)  Another option is to allow explicit
> "priority" setting on an instruction pattern.  We do this with
> "addedcomplexity" for isel matching when the normal heuristic isn't
> what we want.

Yes, that is more or less what I am thinking. The .td file can
probably introduce additional operand classes and specify the ordering
on them in most cases where it needs to resolve an ambiguity. We'll
see what the actual problems are once more stuff starts working...

>> +namespace {
>> +  cl::opt<std::string>
>> +  MatchOneInstr("match-one-instr", cl::desc("Match only the named
>> instruction"),
>> +              cl::init(""));
>> +}
>
> Please just use 'static'.

Done.

>> /// FlattenVariants - Flatten an .td file assembly string by
>> selecting the
>> /// variant at index \arg N.
>> static std::string FlattenVariants(const std::string &AsmString,
>
> Can this be shared with the asmwriter emitter?

Probably. For some reason I just wanted to reimplement it. :)

>> /// TokenizeAsmString - Tokenize a simplified assembly string.
>> +static void TokenizeAsmString(const StringRef &AsmString,
>>                               SmallVectorImpl<StringRef> &Tokens) {
>
> as you know, this is X86 specific.  When we bring up the next target,
> I wonder if this info could be moved to the .td files, so that the
> X86AsmLexer and the tblgen parser could be driven by the same point of
> truth.

This actually isn't X86 specific, it happens to tokenize ARM and PPC
as well, relatively correctly I believe. For now I'm not worried about
it, but I agree that we should eventually move the representation in
the .td file to be more explicit.

>> +static bool IsAssemblerInstruction(const StringRef &Name,
>> +                                   const CodeGenInstruction &CGI,
>> +                                   const SmallVectorImpl<StringRef>
>> &Tokens) {
>> +  // Ignore psuedo ops.
>> +  //
>> +  // FIXME: This is a hack.
>> +  if (const RecordVal *Form = CGI.TheDef->getValue("Form"))
>> +    if (Form->getValue()->getAsString() == "Pseudo")
>> +      return false;
>> +
>> +  // Ignore "PHI" node.
>> +  //
>> +  // FIXME: This is also a hack.
>> +  if (Name == "PHI")
>> +    return false;
>
> We should probably add a target-independent "isPseudoOp" flag to
> the .td files to handle both of these cases, along with things like
> INLINEASM, EH_LABEL, etc.

Yes. A number of the FIXMEs in this file are things which I think
should eventually be resolved by making the .td file a bit richer.

> For warnings, how about using:
>
> SrcMgr.PrintMessage(CGI.TheDef->getLoc(), "whatever", "warning");
>
> It would be nice to add a "PrintWarning" helper like we have
> "PrintError".

Well, for now this are hidden under debug because I expect to get
warnings for things. Once the ambiguity resolution is in place and
other FIXMEs are resolved most of these instances turn into errors.
IMHO we don't really want .td files to spit out warnings during a
build, so it makes a little bit of sense to only have errors.

>> +namespace {
>>
>> +struct OperandListLess {
>> +  bool operator()(const
>> +                  std::pair<const CodeGenInstruction::OperandInfo*,
>> unsigned> &
>
> probably worth using a typedef for this crazy long name to avoid
> wrapping so much :)

I meant to remove this class, actually. Fixed.

>
>> +void AsmMatcherEmitter::run(raw_ostream &OS) {
>> +  CodeGenTarget Target;
>> +  const std::vector<CodeGenRegister> &Registers =
>> Target.getRegisters();
>> +  Record *AsmParser = Target.getAsmParser();
>> +  std::string ClassName = AsmParser-
>> >getValueAsString("AsmParserClassName");
>> +
>> +  std::string Namespace = Registers[0].TheDef-
>> >getValueAsString("Namespace");
>>
>> -  // Generate the top level match function.
>> +  EmitSourceFileHeader("Assembly Matcher Source Fragment", OS);
>> +
>> +  // Emit the function to match a register name to number.
>>
>>   OS << "bool " << Target.getName() << ClassName
>> -     << "::MatchInstruction(const StringRef &Name, "
>> +     << "::MatchRegisterName(const StringRef &Name, unsigned
>> &RegNo) {\n";
>> +
>> +  // FIXME: TableGen should have a fast string matcher generator.
>> +  for (unsigned i = 0, e = Registers.size(); i != e; ++i) {
>> +    const CodeGenRegister &Reg = Registers[i];
>> +    if (Reg.TheDef->getValueAsString("AsmName").empty())
>> +      continue;
>> +
>> +    OS << "  if (Name == \""
>> +       << Reg.TheDef->getValueAsString("AsmName") << "\")\n"
>> +       << "    return RegNo=" << i + 1 << ", false;\n";
>> +  }
>> +  OS << "  return true;\n";
>> +  OS << "}\n\n";
>
> Please split these out into "EmitRegisterNameMatcher()",
> "EmitFooMatcher()" type of functions, just for structure.

Will do.

 - Daniel