[RFC, PowerPC] Initial fast-isel support for PPC64

Mon Jun 24 09:19:59 PDT 2013

Here's my proposed initial implementation for fast instruction selection
on PowerPC.  This implementation is strictly for PPC64 ELF (no 32-bit
support, no Darwin support).

I used the ARM fast-isel support as a template for the PPC64 support.  I
found the existing code there very helpful, so kudos to Eric and others
for their work.

Although there's a lot of stuff here, there's a fair amount of work that
could still be done to catch more cases.  There's some TBD commentary at
the top of PPCFastISel.cpp indicating the major things that could be
added.  I'm not sure that I'll have time to work on those issues in the
near future as I am being pulled in several directions at once right
now, but hopefully I or someone else can get to them eventually.

Register classes cause some difficulty with the PowerPC implementation.
There are two main issues:

 - For 64-bit PowerPC, the GPRC register class consists of subregisters
of the G8RC register class.  32-bit operations produce the GPRC
registers, and 64-bit operations produce the G8RC registers.  However,
in reality the registers are 64 bits in length, so that a 32-bit
sign-extending load will propagate the sign into the upper 32 bits, for
example.  The DAG selection code is able to unify all this, but when
generating MIs directly this can create awkwardness.  In order to pass
machine verification, I created "_32_64" versions of several of the
32-bit instructions, which consume GPRC registers and produce G8RC
registers.

 - To be able to assign X0 (64-bit) or R0 (32-bit) where it's legal to
do so, each of G8RC and GPRC has a subclass excluding that register.
These are intersection subclasses named G8RC_and_G8RC_NOX0 and
GPRC_and_GPRC_NOR0, respectively.  These are needed because register 0
means the immediate value 0 in several instructions:  as the base
register of loads and stores; as the first operand in an add-immediate;
and in the integer select instruction (not yet implemented in
fast-isel).  Unfortunately the automatically generated portions of
fast-isel processing use GPRC and G8RC for 32-bit and 64-bit operands
regardless of this (first register class defined for the value type).
This requires some extra overrides of the emit routines.
   Related to this, we cannot in general assign result registers for an
instruction without thinking about the register-zero question.  Where
possible, I've chosen to use the currently assigned register for an
instruction to determine the register class for that instruction's
result.  So if an instruction has anticipated uses in the same block, we
know whether or not it can be assigned to R0; if not, we have to
conservatively assume that it can't be.
   It might be possible to just override everything so we never assign
register zero to anything for fast-isel code.  But I'm not sure this
would work well when we hit something we can't handle and punt to the
DAG selector.  I'm interested in thoughts from the more knowledgable on
this.

There are two things I've done in this patch that I consider to be
bloody hacks, and I am hopeful someone can suggest better alternatives:

 - Calling convention hack:  With this patch, PPCGenCallingConv.inc is
now included in two places:   PPCISelLowering.cpp and PPCFastISel.cpp.
Because we don't use the 32-bit support, some functions defined in the
generated code are unused in PPCFastISel.cpp, causing build failures
with -Werror.  I also found it necessary to add some fast-isel specific
calling convention descriptions, which are not used in
PPCISelLowering.cpp, resulting in the same problem there.  For now, I've
added a dummy method in each file that creates artificial uses of the
unused functions to avoid the warnings.  There must be a better way...

 - In PPCFastISel::SelectFPToI(), I have another machine verification
issue where I need to convert an f32 (F4RC) to an f64 (F8RC).   This is
a no-op on PowerPC since both values occupy a 64-bit register, with f32
guaranteeing fewer bits of accuracy.  I looked at the debug output with
the DAG selector to see what I needed, and it showed a COPY from f32 to
f64.  However, when I tried to use that, something in the downstream
machinery quietly changed my copy into an f32->f32 copy, and I still
failed machine verification.  I circumvented this by using
COPY_TO_REGCLASS, which works; however, I believe that COPY_TO_REGCLASS
isn't expected to still exist once we've lowered to MIs.  At least,
PPCInstPrinter::printInst() was surprised and blew up on that opcode.  I
added a hack to skip a COPY_TO_REGCLASS during asm-printing since it's a
no-op, but there really must be a better way to handle this.

OK, enough prologue.  The patch is attached, and I would really
appreciate some good review before I commit it.  I have a lot of tests
that all pass with -verify-machineinstrs, but I am sure there are things
that I have missed, so the more eyes the better.

When committing, I can split out the changes to PPCInstPrinter.cpp and
PPCInstr{Info,64Bit}.td as separate patches, but everything else is
pretty strongly tied together.  If you have suggestions for committing
this in smaller pieces, I welcome those as well.

For those of you who got this far:  thanks for reading.  Definitely
qualifies as a tl;dr.  But hey, it's nothing compared with the patch. ;)

Thanks,
Bill
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fastisel-2013-06-24.patch
Type: text/x-patch
Size: 126510 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130624/0a85ce1d/attachment.bin>