[llvm-commits] [RFC] Towards an assembler parser for PowerPC
Hal Finkel
hfinkel at anl.gov
Fri Jan 18 09:52:29 PST 2013
----- Original Message -----
> From: "Ulrich Weigand" <Ulrich.Weigand at de.ibm.com>
> To: llvm-commits at cs.uiuc.edu
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Roman Divacky" <rdivacky at freebsd.org>
> Sent: Friday, January 18, 2013 10:53:08 AM
> Subject: [RFC] Towards an assembler parser for PowerPC
>
>
> Hello,
>
> as there's been some ongoing interest in this topic, the appended
> patch set
> shows the current state of my attempts to enable the assembler parser
> for
> PowerPC.
Nice!
> At this point, I don't consider it ready for inclusion yet,
> but
> I'd like to solicit comments on the overall direction, and a couple
> of
> specific questions.
>
> With the patch set, LLVM builds and the test suite shows no
> regressions.
> We also build a working assembler parser that is able to at least
> correctly
> assemble and execute a "hello world" assembler file. However,
> support is
> far from complete; most notably, conditional branch (and isel)
> instructions
> cannot be parsed at this point.
>
> The first five patches prepare the PowerPC back-end by cleaning up
> instruction patterns to make them more usable for the parser; this
> means in
> particular removing duplicates (multiple patterns resulting in the
> same
> assembler instructions) and cleaning up irregularities (e.g. with
> address
> operands).
>
> The next two patches enhance the TableGen step generating the
> assembler
> parser to support features needed by PowerPC. (It would be feasible
> in
> principle to do without those, but at the cost of tedious work-around
> code
> in the PowerPC assembler parser back-end.)
>
> Then comes the patch that actually adds and enables the bare-bones
> assembler parser, followed by two more patches adding support for
> additional features (instruction aliases, relocation modifiers).
>
> The changes in some more detail:
>
> - diff-llvm-asm-calls: This patch cleans up call-related patterns.
> In
> particular, we currently have a duplicated set of patterns depending
> on the
> ABI to be followed (Darwin vs. Linux). This is a bit odd; while the
> different ABIs will result in different instruction sequences, the
> actual
> instructions themselves ought to be independent of the ABI. And in
> fact it
> turns out that the only nontrivial difference between the two sets of
> patterns is that in the PPC64 Linux ABI, the instruction used for
> indirect
> calls is marked to take X11 as extra input register (which is indeed
> used
> only with that ABI to hold an incoming environment pointer for nested
> functions). However, this does not need to be hard-coded at the .td
> pattern level; instead, the C++ code expanding calls can simply add
> that
> use, just like it adds uses for argument registers anyway.
>
> - diff-llvm-asm-branch: This patch cleans up the other branch
> patterns.
> Note that at this point, support for branch conditions is still
> missing; I
> need to investigate how to best model this. (One way might be for
> the
> compiler to always generate the primary memnoics BC, and support the
> large
> variety of extended mnemonics only via the alias mechanism in the
> parser ... Thoughts?)
Can't you just mark the other definitions as CodeGenOnly? It is nice to have the aliased memnoics in the assembly output -- it makes things more readable (IMHO).
> For now, I've removed support for
> conditions from
> the "blr" pattern, where they would have never been used by current
> code
> anyway. This allows the parser to at least understand a plain
> function
> return.
>
> - diff-llvm-asm-address: This patch cleans up a number of issues
> related
> to handling addresses. In particular, there are various instructions
> that
> are repeated in two patterns, one with an immediate operand and the
> other
> with a symbolic operand (ADDI / ADDIL; LD / LDrs; ...). This
> confuses the
> parser, and isn't really necessary. For ADDI/ADDIL, the "symbol"
> versions
> actually already also understand immediate operands just fine; and
> for
> LD/LDrs we can remove the need for a "memrs" operand type by making
> "memrix" accept iPTR instead of i32 displacements. The patch also
> fixes
> the LHAU8 pattern which for some reason, unlike all the other
> load-with-update patterns, uses a broken-out operand form
> "$disp($rA)"
> instead of a memri operand type.
>
> - diff-llvm-asm-preinc: This is really a continuation of the
> previous
> patch, but broken out due to its size. All the "store-with-update"
> patterns are currently encoded using two separate operands for
> address base
> and offset, instead of a single memory (memri/memrr) operand. This
> confuses the assembler parser, since it tries to parse memory
> operands in
> the same way for all instructions.
It sounds like this is what needs to be changed. Are there any major impediments to making the parser handle these in a different way?
> Unfortunately, due to the way
> instruction selection works for store-with-update, we must present
> selection-DAG nodes with two operands for those patterns, and there
> doesn't
> appear to be a nice way to then combine them back into a single
> complex
> operand at the instruction level. The ARM back-end works around this
> problem by first selecting pseudo instructions with two operands, and
> then
> using custom inserters to transform them into the single-operand
> versions.
Erk ;)
> This patch introduces the same work-around for PowerPC. (Any
> suggestions
> to implement this in a nicer way are welcome!)
>
> - diff-llvm-asm-hacks: After the above clean-ups, there still remain
> a
> small number of patterns in the back-end that throw off the parser.
> Usually, these are patterns to implement specialized versions of an
> already-supported general instruction (e.g. LDinto_toc or ADD8TLS) or
> patterns where the operands in the assembler line don't match the MI
> operands, e.g. due to dataflow issues (CRSET, V_SET0) or because some
> features like FPSCR are not fully modeled (MTFSF, MFCRpseud). While
> all
> this probably ought to fully cleaned up at some point, for now I'm
> simply
> marking them as isCodeGenOnly so that they will be ignored by the
> parser.
Sounds reasonable.
>
> - diff-llvm-tblgen-tie: This is the first patch to common code. It
> tries
> to address the problem that the parser generator doen't correctly
> handle
> register constraints that tie a single operand to one component of a
> complex operand (as happens on PowerPC in the pre-inc patterns:
> RegConstraint<"$addr.reg = $ea_result">). While this appears
> difficult to
> fix in the general case, this patch at least adds support in the case
> where
> the complex operand can be handled piece-wise by the assembler parser
> (i.e.
> there is no ParserMatchClass defined on the complex operand itself);
> this
> suffices to let me handle the PowerPC case without ugly work-around
> in the
> back-end. Unfortunately, the ARM back-end also uses register
> constraints
> of that type, even though they weren't correctly supported. They
> work
> around this problem by using back-end defined AsmMatchConverter
> routines to
> fix up the operands for the affected instructions. However, those
> converters seem to be *missing* for some patterns that look like
> they'd
> really need one ... With current code, this leads to (what looks to
> me
> like) incorrect encodings for those instructions with the assembler
> parser.
> But with my patch applied, we get instead build errors pointing out
> that
> something's wrong. To get LLVM to at least still build, the patch
> adds
> converters to those instructions (but I haven't really verified that
> the
> particular converters are really correct, this probably needs looking
> into
> by ARM back-end folks). [ Note that using my patch, it might be
> possible
> to avoid all those converters in the ARM back-end too, by allowing
> the
> parser to handle complex operands piece-wise. This might involve
> some
> refactoring, though. ]
>
> - diff-llvm-tblgen-regclass: This is another patch to common code,
> addressing a very PowerPC specific problem. The general "flow" of
> the
> assembler parser on other platforms is that common code will first
> attempt
> to classify all instruction operands into categories like "Register
> of
> class X", "Immediate", or "Memory address", and *then* go into the
> instruction table using the opcode and list of operand types to
> select the
> matching pattern. This approach assumes that you *can* classify
> operands
> by just looking at them in isolation. Unfortunately this is not true
> on
> PowerPC, since registers are specified in assembler source simply by
> their
> numbers, making them indistinguishable from small immediates. For
> example,
> the "3" in "add 1, 2, 3" refers to a register while the "3" in "addi
> 1, 2,
> 3" refers to the immediate value. This means that basically the
> whole
> generic mechanism of classifying operands into register classes does
> not
> work, and we need to instead define back-end matchers for all the
> register
> classes. This is possible without common code changes by using
> RegisterOperand classes to specify a ParserMatchClass -- but this
> means
> that every instance of a register class in the .td files must be
> replaced
> by the corresponding RegisterOperand class, which seems a bit
> unnecessary
> to me, and may cause future maintainability hassles. I'd propose to
> instead allow specifying a ParserMatchClass directly on the
> RegisterClass
> itself, which can be done by a minor change as implemented by this
> patch.
> [ But if folks prefer the RegisterOperand approach instead, I can
> certainly
> go back to that. ]
>
> - diff-llvm-asm-parser: This is that actual core of the assembler
> parser.
> It's a bit bare-bones at this stage, and doesn't do a lot in the area
> of
> error handling or special features. Also, there's no test cases yet.
>
> - diff-llvm-asm-aliases: This adds a little bit of support for
> parsing
> extended mnemonics; currently only for those that are emitted by LLVM
> itself (mr, sldi, srdi). Needs to be extended to cover the full
> list of
> defined mnemonics at some point. (Also, needs test cases.)
>
> - diff-llvm-asm-variants: This adds some of the PowerPC-specific
> symbol
> modifiers (like "@toc at ha" etc) to
> MCSymbolRefExpr::getVariantKindForName.
> Needs to be completed (and test cases). B.t.w. why is this handled
> in
> common code? Shouldn't there be some call-out to the back-end for
> this?
>
>
> Apart from the issues mentioned above, there is one other problem I'm
> not
> sure how to handle. The back-end currently has a large number of
> duplicated patterns allowing use of the sam instruction with either
> 32-bit
> or 64-bit operands. For example, we have:
>
> def OR : XForm_6<31, 444, (outs GPRC:$rA), (ins GPRC:$rS,
> GPRC:$rB),
> "or $rA, $rS, $rB", IntSimple,
> [(set GPRC:$rA, (or GPRC:$rS, GPRC:$rB))]>;
> def OR8 : XForm_6<31, 444, (outs G8RC:$rA), (ins G8RC:$rS,
> G8RC:$rB),
> "or $rA, $rS, $rB", IntSimple,
> [(set G8RC:$rA, (or G8RC:$rS, G8RC:$rB))]>;
>
> Now, if the assembler parser sees input like "or 1, 2, 3", it cannot
> decide
> whether to create an OR or OR8 MI pattern, so it will just use
> whatever
> comes first in the list. When running in assembler mode, this
> doesn't
> seem to matter much since they will both encode to the same binary
> code as
> well, but I guess there could be other users of the MI generated by
> the
> parser where this might indeed make a difference ... [ As an aside,
> is
> this large amount of duplicated info really necessary in the first
> place?
> There ought to be a better way ... ] Any thoughts on how to handle
> this
> welcome!
As I mentioned on IRC (and I understand that this would be a pain to try), my suggestion is to eliminate the 32-bit patterns (and then rename the 64-bit patterns). So long as we don't add i64 to any register class in PPCISelLowering in 32-bit mode, then 64-bit ops should still be split during legalization (I think?). We might need to add a bunch of Pat<> definitions to match the 32-bit operand cases. Does anyone know of a reason why this would not work?
Thanks again,
Hal
>
>
> For now, my next steps towards getting the asm parser completed will
> be:
>
> - Complete the back-end .td files cleanups (e.g. support for
> conditional
> branches) and get them upstream.
>
> - Further discuss the TableGen common code changes, and in particular
> the
> ARM issuses mentioned above, and either get the changes upstream or
> agree
> on using back-end work-arounds.
>
> - Complete missing features in the parser itself, add test cases, and
> run
> large-scale tests (e.g. building all of test-suite using the
> asm-parser as
> assembler).
>
> Anything else I'm missing?
>
>
> Bye,
> Ulrich
>
> (See attached file: diff-llvm-asm-address)(See attached file:
> diff-llvm-asm-aliases)(See attached file: diff-llvm-asm-branches)(See
> attached file: diff-llvm-asm-calls)(See attached file:
> diff-llvm-asm-hacks)
> (See attached file: diff-llvm-asm-parser)(See attached file:
> diff-llvm-asm-preinc)(See attached file: diff-llvm-asm-variants)(See
> attached file: diff-llvm-tblgen-regclass)(See attached file:
> diff-llvm-tblgen-tie)(See attached file: series)
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
More information about the llvm-commits
mailing list