[llvm-commits] [RFC] Towards an assembler parser for PowerPC

Fri Jan 18 08:53:08 PST 2013

Hello,

as there's been some ongoing interest in this topic, the appended patch set
shows the current state of my attempts to enable the assembler parser for
PowerPC.  At this point, I don't consider it ready for inclusion yet, but
I'd like to solicit comments on the overall direction, and a couple of
specific questions.

With the patch set, LLVM builds and the test suite shows no regressions.
We also build a working assembler parser that is able to at least correctly
assemble and execute a "hello world" assembler file.  However, support is
far from complete; most notably, conditional branch (and isel) instructions
cannot be parsed at this point.

The first five patches prepare the PowerPC back-end by cleaning up
instruction patterns to make them more usable for the parser; this means in
particular removing duplicates (multiple patterns resulting in the same
assembler instructions) and cleaning up irregularities (e.g. with address
operands).

The next two patches enhance the TableGen step generating the assembler
parser to support features needed by PowerPC.  (It would be feasible in
principle to do without those, but at the cost of tedious work-around code
in the PowerPC assembler parser back-end.)

Then comes the patch that actually adds and enables the bare-bones
assembler parser, followed by two more patches adding support for
additional features (instruction aliases, relocation modifiers).

The changes in some more detail:

- diff-llvm-asm-calls:  This patch cleans up call-related patterns.  In
particular, we currently have a duplicated set of patterns depending on the
ABI to be followed (Darwin vs. Linux).  This is a bit odd; while the
different ABIs will result in different instruction sequences, the actual
instructions themselves ought to be independent of the ABI.  And in fact it
turns out that the only nontrivial difference between the two sets of
patterns is that in the PPC64 Linux ABI, the instruction used for indirect
calls is marked to take X11 as extra input register (which is indeed used
only with that ABI to hold an incoming environment pointer for nested
functions).  However, this does not need to be hard-coded at the .td
pattern level; instead, the C++ code expanding calls can simply add that
use, just like it adds uses for argument registers anyway.

- diff-llvm-asm-branch:  This patch cleans up the other branch patterns.
Note that at this point, support for branch conditions is still missing; I
need to investigate how to best model this.  (One way might be for the
compiler to always generate the primary memnoics BC, and support the large
variety of extended mnemonics only via the alias mechanism in the
parser ...   Thoughts?)   For now, I've removed support for conditions from
the "blr" pattern, where they would have never been used by current code
anyway.  This allows the parser to at least understand a plain function
return.

- diff-llvm-asm-address:  This patch cleans up a number of issues related
to handling addresses.  In particular, there are various instructions that
are repeated in two patterns, one with an immediate operand and the other
with a symbolic operand  (ADDI / ADDIL; LD / LDrs; ...).  This confuses the
parser, and isn't really necessary.  For ADDI/ADDIL, the "symbol" versions
actually already also understand immediate operands just fine; and for
LD/LDrs we can remove the need for a "memrs" operand type by making
"memrix" accept iPTR instead of i32 displacements.  The patch also fixes
the LHAU8 pattern which for some reason, unlike all the other
load-with-update patterns, uses a broken-out operand form "$disp($rA)"
instead of a memri operand type.

- diff-llvm-asm-preinc:  This is really a continuation of the previous
patch, but broken out due to its size.  All the "store-with-update"
patterns are currently encoded using two separate operands for address base
and offset, instead of a single memory (memri/memrr) operand.  This
confuses the assembler parser, since it tries to parse memory operands in
the same way for all instructions.  Unfortunately, due to the way
instruction selection works for store-with-update, we must present
selection-DAG nodes with two operands for those patterns, and there doesn't
appear to be a nice way to then combine them back into a single complex
operand at the instruction level.  The ARM back-end works around this
problem by first selecting pseudo instructions with two operands, and then
using custom inserters to transform them into the single-operand versions.
This patch introduces the same work-around for PowerPC.  (Any suggestions
to implement this in a nicer way are welcome!)

- diff-llvm-asm-hacks:  After the above clean-ups, there still remain a
small number of patterns in the back-end that throw off the parser.
Usually, these are patterns to implement specialized versions of an
already-supported general instruction (e.g. LDinto_toc or ADD8TLS) or
patterns where the operands in the assembler line don't match the MI
operands, e.g. due to dataflow issues (CRSET, V_SET0) or because some
features like FPSCR are not fully modeled (MTFSF, MFCRpseud).   While all
this probably ought to fully cleaned up at some point, for now I'm simply
marking them as isCodeGenOnly so that they will be ignored by the parser.

- diff-llvm-tblgen-tie: This is the first patch to common code.  It tries
to address the problem that the parser generator doen't correctly handle
register constraints that tie a single operand to one component of a
complex operand (as happens on PowerPC in the pre-inc patterns:
RegConstraint<"$addr.reg = $ea_result">).  While this appears difficult to
fix in the general case, this patch at least adds support in the case where
the complex operand can be handled piece-wise by the assembler parser (i.e.
there is no ParserMatchClass defined on the complex operand itself); this
suffices to let me handle the PowerPC case without ugly work-around in the
back-end.  Unfortunately, the ARM back-end also uses register constraints
of that type, even though they weren't correctly supported.  They work
around this problem by using back-end defined AsmMatchConverter routines to
fix up the operands for the affected instructions.  However, those
converters seem to be *missing* for some patterns that look like they'd
really need one ...   With current code, this leads to (what looks to me
like) incorrect encodings for those instructions with the assembler parser.
But with my patch applied, we get instead build errors pointing out that
something's wrong.  To get LLVM to at least still build, the patch adds
converters to those instructions (but I haven't really verified that the
particular converters are really correct, this probably needs looking into
by ARM back-end folks).   [ Note that using my patch, it might be possible
to avoid all those converters in the ARM back-end too, by allowing the
parser to handle complex operands piece-wise.  This might involve some
refactoring, though. ]

- diff-llvm-tblgen-regclass:  This is another patch to common code,
addressing a very PowerPC specific problem.  The general "flow" of the
assembler parser on other platforms is that common code will first attempt
to classify all instruction operands into categories like "Register of
class X", "Immediate", or "Memory address", and *then* go into the
instruction table using the opcode and list of operand types to select the
matching pattern.  This approach assumes that you *can* classify operands
by just looking at them in isolation.  Unfortunately this is not true on
PowerPC, since registers are specified in assembler source simply by their
numbers, making them indistinguishable from small immediates.  For example,
the "3" in "add 1, 2, 3" refers to a register while the "3" in "addi 1, 2,
3" refers to the immediate value.  This means that basically the whole
generic mechanism of classifying operands into register classes does not
work, and we need to instead define back-end matchers for all the register
classes.  This is possible without common code changes by using
RegisterOperand classes to specify a ParserMatchClass -- but this means
that every instance of a register class in the .td files must be replaced
by the corresponding RegisterOperand class, which seems a bit unnecessary
to me, and may cause future maintainability hassles.  I'd propose to
instead allow specifying a ParserMatchClass directly on the RegisterClass
itself, which can be done by a minor change as implemented by this patch.
[ But if folks prefer the RegisterOperand approach instead, I can certainly
go back to that. ]

- diff-llvm-asm-parser:  This is that actual core of the assembler parser.
It's a bit bare-bones at this stage, and doesn't do a lot in the area of
error handling or special features.  Also, there's no test cases yet.

- diff-llvm-asm-aliases:  This adds a little bit of support for parsing
extended mnemonics; currently only for those that are emitted by LLVM
itself (mr, sldi, srdi).   Needs to be extended to cover the full list of
defined mnemonics at some point.  (Also, needs test cases.)

- diff-llvm-asm-variants:  This adds some of the PowerPC-specific symbol
modifiers (like "@toc at ha" etc) to MCSymbolRefExpr::getVariantKindForName.
Needs to be completed (and test cases).  B.t.w. why is this handled in
common code?  Shouldn't there be some call-out to the back-end for this?

Apart from the issues mentioned above, there is one other problem I'm not
sure how to handle.  The back-end currently has a large number of
duplicated patterns allowing use of the sam instruction with either 32-bit
or 64-bit operands.  For example, we have:

def OR   : XForm_6<31, 444, (outs GPRC:$rA), (ins GPRC:$rS, GPRC:$rB),
                   "or $rA, $rS, $rB", IntSimple,
                   [(set GPRC:$rA, (or GPRC:$rS, GPRC:$rB))]>;
def OR8  : XForm_6<31, 444, (outs G8RC:$rA), (ins G8RC:$rS, G8RC:$rB),
                   "or $rA, $rS, $rB", IntSimple,
                   [(set G8RC:$rA, (or G8RC:$rS, G8RC:$rB))]>;

Now, if the assembler parser sees input like "or 1, 2, 3", it cannot decide
whether to create an OR or OR8 MI pattern, so it will just use whatever
comes first in the list.   When running in assembler mode, this doesn't
seem to matter much since they will both encode to the same binary code as
well, but I guess there could be other users of the MI generated by the
parser where this might indeed make a difference ...   [ As an aside, is
this large amount of duplicated info really necessary in the first place?
There ought to be a better way ...  ]   Any thoughts on how to handle this
welcome!

For now, my next steps towards getting the asm parser completed will be:

- Complete the back-end .td files cleanups (e.g. support for conditional
branches) and get them upstream.

- Further discuss the TableGen common code changes, and in particular the
ARM issuses mentioned above, and either get the changes upstream or agree
on using back-end work-arounds.

- Complete missing features in the parser itself, add test cases, and run
large-scale tests (e.g. building all of test-suite using the asm-parser as
assembler).

Anything else I'm missing?

Bye,
Ulrich

(See attached file: diff-llvm-asm-address)(See attached file:
diff-llvm-asm-aliases)(See attached file: diff-llvm-asm-branches)(See
attached file: diff-llvm-asm-calls)(See attached file: diff-llvm-asm-hacks)
(See attached file: diff-llvm-asm-parser)(See attached file:
diff-llvm-asm-preinc)(See attached file: diff-llvm-asm-variants)(See
attached file: diff-llvm-tblgen-regclass)(See attached file:
diff-llvm-tblgen-tie)(See attached file: series)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-asm-address
Type: application/octet-stream
Size: 31477 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130118/61d9a2ea/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-asm-aliases
Type: application/octet-stream
Size: 3389 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130118/61d9a2ea/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-asm-branches
Type: application/octet-stream
Size: 5015 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130118/61d9a2ea/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-asm-calls
Type: application/octet-stream
Size: 20701 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130118/61d9a2ea/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-asm-hacks
Type: application/octet-stream
Size: 4830 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130118/61d9a2ea/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-asm-parser
Type: application/octet-stream
Size: 21932 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130118/61d9a2ea/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-asm-preinc
Type: application/octet-stream
Size: 30360 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130118/61d9a2ea/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-asm-variants
Type: application/octet-stream
Size: 1642 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130118/61d9a2ea/attachment-0007.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-tblgen-regclass
Type: application/octet-stream
Size: 2588 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130118/61d9a2ea/attachment-0008.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff-llvm-tblgen-tie
Type: application/octet-stream
Size: 10667 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130118/61d9a2ea/attachment-0009.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: series
Type: application/octet-stream
Size: 219 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130118/61d9a2ea/attachment-0010.obj>