[llvm-commits] [RFC] Towards an assembler parser for PowerPC

Mon Jan 21 05:30:42 PST 2013

Uli, lots of terrific work here!

On Fri, 2013-01-18 at 17:53 +0100, Ulrich Weigand wrote:

> - diff-llvm-asm-branch:  This patch cleans up the other branch patterns.
> Note that at this point, support for branch conditions is still missing; I
> need to investigate how to best model this.  (One way might be for the
> compiler to always generate the primary memnoics BC, and support the large
> variety of extended mnemonics only via the alias mechanism in the
> parser ...   Thoughts?)   For now, I've removed support for conditions from
> the "blr" pattern, where they would have never been used by current code
> anyway.  This allows the parser to at least understand a plain function
> return.

So current code never generates a bclr?  That's kind of sad, as there
are generally lots of opportunities to do so.  We'll need this
eventually, but it sounds like the alias approach is promising.

> 
> - diff-llvm-asm-preinc:  This is really a continuation of the previous
> patch, but broken out due to its size.  All the "store-with-update"
> patterns are currently encoded using two separate operands for address base
> and offset, instead of a single memory (memri/memrr) operand.  This
> confuses the assembler parser, since it tries to parse memory operands in
> the same way for all instructions.  Unfortunately, due to the way
> instruction selection works for store-with-update, we must present
> selection-DAG nodes with two operands for those patterns, and there doesn't
> appear to be a nice way to then combine them back into a single complex
> operand at the instruction level.  The ARM back-end works around this
> problem by first selecting pseudo instructions with two operands, and then
> using custom inserters to transform them into the single-operand versions.
> This patch introduces the same work-around for PowerPC.  (Any suggestions
> to implement this in a nicer way are welcome!)

Just a curiosity question:  Why isn't there a similar problem with
load-update forms?

I agree with Hal that it would be nice if this could be somehow fixed in
the parser common code, just to avoid all the extra patterns in the .td
files for at least two targets, but what you have here is clean and
understandable and is certainly sufficient if that's not easy to do.
> 
> - diff-llvm-asm-hacks:  After the above clean-ups, there still remain a
> small number of patterns in the back-end that throw off the parser.
> Usually, these are patterns to implement specialized versions of an
> already-supported general instruction (e.g. LDinto_toc or ADD8TLS) or
> patterns where the operands in the assembler line don't match the MI
> operands, e.g. due to dataflow issues (CRSET, V_SET0) or because some
> features like FPSCR are not fully modeled (MTFSF, MFCRpseud).   While all
> this probably ought to fully cleaned up at some point, for now I'm simply
> marking them as isCodeGenOnly so that they will be ignored by the parser.

For most of these, isCodeGenOnly doesn't seem to be limiting.  As you
pointed out, doing this for ADD8TLS means we can't assemble the
"$rB at tls" form of the operand.  Can that be recognized during processing
of the add to form an ADD8TLS, or does the order of processing prevent
that?  This isn't high priority but we probably want some sort of
solution eventually.

> 
> - diff-llvm-tblgen-regclass:  This is another patch to common code,
> addressing a very PowerPC specific problem.  The general "flow" of the
> assembler parser on other platforms is that common code will first attempt
> to classify all instruction operands into categories like "Register of
> class X", "Immediate", or "Memory address", and *then* go into the
> instruction table using the opcode and list of operand types to select the
> matching pattern.  This approach assumes that you *can* classify operands
> by just looking at them in isolation.  Unfortunately this is not true on
> PowerPC, since registers are specified in assembler source simply by their
> numbers, making them indistinguishable from small immediates.  For example,
> the "3" in "add 1, 2, 3" refers to a register while the "3" in "addi 1, 2,
> 3" refers to the immediate value.  This means that basically the whole
> generic mechanism of classifying operands into register classes does not
> work, and we need to instead define back-end matchers for all the register
> classes.  This is possible without common code changes by using
> RegisterOperand classes to specify a ParserMatchClass -- but this means
> that every instance of a register class in the .td files must be replaced
> by the corresponding RegisterOperand class, which seems a bit unnecessary
> to me, and may cause future maintainability hassles.  I'd propose to
> instead allow specifying a ParserMatchClass directly on the RegisterClass
> itself, which can be done by a minor change as implemented by this patch.
> [ But if folks prefer the RegisterOperand approach instead, I can certainly
> go back to that. ]

FWIW, I like your approach here.  It's more elegant, and more clearly
separates the function of the classes.

> - diff-llvm-asm-aliases:  This adds a little bit of support for parsing
> extended mnemonics; currently only for those that are emitted by LLVM
> itself (mr, sldi, srdi).   Needs to be extended to cover the full list of
> defined mnemonics at some point.  (Also, needs test cases.)

This is a nice clean approach, well done.

Thanks again for all the excellent work on this!