[PATCH] D14257: [AsmParser] Generalize matching for grammars without mnemonic-lead statements.

Mon Jan 4 16:10:31 PST 2016

I'm wondering whether its a good idea to revert the changes that added the
new ways to break tokens. Clearly many targets had to adapt to that since
you disconnected MnemonicContainsDot. There's at least one out of tree
target that was using that flag. Reverting will create churn on these out
of tree targets. I don't know that I have particularly big concerns with
that change so we may be able to leave that in. Chandler, thoughts?

As for the change the matching tables. I think the big issue here is that
you're trying to generalize a design that fundamentally only scales
reasonably well if it can rely on the binary searching for the initial
mnemonic to reduce the linear scan search space.

Does the Hexagon assembly language have a defined grammar that would lend
itself to a different kind of parser? Presumably for most instructions the
mnemonic is in the line somewhere? Can we come up with a way to locate this
mnemonic in the line and key the matching to that?

On Mon, Jan 4, 2016 at 3:41 PM, Colin LeMahieu <colinl at codeaurora.org>
wrote:

> colinl added a comment.
>
> Hey everyone, I have a revert up for review which undoes the changes in
> this three series of patches.  http://reviews.llvm.org/D15874
>
> In the mean time here's an expanded rationale for the changes that were
> made.
>
> A couple regressions:
> size_t to unsigned and string::find to StringRef::find due to merge
> issues.  Even though rebasing before committing was performed, the noise
> due to trying to maintain 80 column formatting both in source and it
> generated code complicated the merge and these two issues were missed.  My
> apologies, I'll try to be more careful.
>
> MatchTable size:
> The match table is an N x M array of match entries where N is the number
> of instructions and M is the number of operands for the instruction with
> the largest number of operands in the entire instruction set.  Instructions
> that use fewer operands than the longest instruction are value-initialized
> as unspecified array elements.  The issue is if the number of operands for
> the longest instruction changes, the entire array is resized to N x (M +
> 1).  By adding the initial mnemonic to the match table we hit this case
> which increased the binary size of the tools themselves.
>
> r256669 was created to address this issue
> An alternative I would have suggested in a code review for this patch
> would be to remove Mnemonic from OperandMatchEntry and rewrite the binary
> search that uses LessOpcodeOperand to use OperandMatchEntry.Classes[0]
> This should address the binary size issue without introducing the
> HasMnemonicFirst flag and still allowing instructions without a leading
> mnemonic.
>
> matchTokenString function size
> This function converts known strings to an enum value via a nested switch
> table.  By adding mnemonics to the generated match set we're indeed
> increasing the size of this function.  If we remove
> OperandMatchEntry.Mnemonic and use Classes[0] we'd be making use of these
> code paths, otherwise it's true we could optimize this and only add strings
> that were actually needed for target that have HasMnemonicFirst=0.
>
> Linear search
> Right now the match loop narrows down the match table by the first entry,
> which is assumed to be a string, and does a linear search within these
> entries to find a match.  When expanding to a generalized search the first
> impulse is to try and perform a binary search on matches though there are
> some issues.
>
> Some backends don't allow some combinations of operand fitting tests to be
> performed, they have asserts in some branches that aren't hit with a linear
> search but are hit with a binary search.  See
> ARMAsmParser::checkTargetMatchPredicate which has some asserts when certain
> checks don't pass despite the fact that the matcher loop is designed to
> survive match predicate failures with "if ((MatchResult =
> checkTargetMatchPredicate(Inst)) != Match_Success)"
>
> Some backends test specific error messages out of the assembler, see
> noneon-diagnostics.s.  Since the matcher loop tracks the last instruction
> tested and prints diagnostics for this instruction sometimes, these tests
> would need to be changed.
>
> The fundamental issue with binary searching the match entries has to do
> with the fact that we can't generate a total order for all operand fits.
> Consider the register classes
> ClassA{reg1, reg2}, ClassB{reg1, reg3}, ClassC{reg2, reg3}.  There's no
> way to sort these register classes such that valid entries are in a
> contiguous region which makes binary searching not work.  I think the ARM
> backend had register classes in this category that tripped up the matcher.
> Also consider the immediate classes s8, s16, u8, u16 when trying to match
> -1 and 257.  There's no way to sort these immediate classes such that
> there's a contiguous match region.
>
> If someone can propose an efficient way to do this searching I'd be
> interested, it'd be even more interesting if it didn't involve rewriting
> all the assembler backends.
>
>
> Repository:
>   rL LLVM
>
> http://reviews.llvm.org/D14257
>
>
>
>

-- 
~Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160104/6a8e45ed/attachment.html>