[RFC, PowerPC] Initial fast-isel support for PPC64

Tue Jul 16 19:19:23 PDT 2013

Eric, all good comments -- thanks for looking over this "tiny" patch!

The fast-isel stuff is being forced a little lower on my priority list
right now, but as I have time I plan to fix the issues you've identified
and then start committing the work in smaller pieces.  I will ask for
re-review on those parts that are controversial in my mind.

I don't think there's much we can do to make the no-r0 stuff a lot
better (item f), at least without disturbing the common machinery for
all targets.  There are a number of places where generic fast-isel
assumes a single register class for a given type, even when the target
has multiple classes that can work.  It seems the DAG-isel code is able
to unify such register classes to make sure we never use R0 for a
register lifetime containing a use that forbids R0; but fast-isel is too
myopic for that, so we have to fix it up somehow.

I'll talk with Uli and Hal about the "8's" in the register classes and
instruction definitions (item g).  It would be a rather pervasive change
at this point, so we'll see how motivated we are...

All the other comments will be addressed one way or another.  I may need
help with item b; as you said, it probably indicates there's something
just wrong.  As I recall, it seemed to be another case of DAG-isel
"magic" that fast-isel doesn't handle so well (more register class
unification), but I hope we can find a better way than that hack.

Thanks again for the review!

Bill

On Fri, 2013-07-12 at 14:36 -0700, Eric Christopher wrote:
> Hi Bill,
> 
> Wow. Very large patch. I've done a glance through it and have a few
> odd cleanup comments, but it's also pretty obvious via the FIXMEs that
> you know what needs to be cleaned up as well. It'd be nice if you
> could split this up a bit more, I've got some questions below.
> 
> Testing question:
> 
> Have you tested with the test-suite that is in projects/test-suite?
> It's really good at finding bugs.
> 
> Few nits:
> 
> a) FORNOW->FIXME
> 
> b) This is worrisome:
> 
> +  // For fast-isel, a COPY_TO_REGCLASS may survive this long (necessary
> +  // to make machine instruction verification happy wrt register classes
> +  // for whatever reason).  FORNOW: Hack to avoid trying to print it
> +  // as it should have no effect.  FIXME: What is the real solution?
> +  if (MI->getOpcode() == TargetOpcode::COPY_TO_REGCLASS)
> +    return;
> 
> and, of course, the odd register class issues with extensions.
> 
> c) Indenting
> 
> +  typedef struct Address {
> +    enum {
> +      RegBase,
> +      FrameIndexBase
> 
> d) Block comments on top of functions.
> 
> e) No need for these {, I realize you copied them from ARM and I'll fix them.
> 
> +    case Instruction::BitCast: {
> 
> f) The noreg0 register class stuff is a bit icky, is there some way we
> can fix this?
> 
> g) Relatedly, the GR8 stuff for 64-bit registers is really confusing,
> meta thing for the port of maybe using 8, 16, 32, 64 for the
> associated things.
> 
> h) Document why you've only added calling conventions for fast isel.
> 
> i) Go ahead and remove this until you implement it.
> 
> +bool PPCFastISel::SelectIntrinsicCall(const IntrinsicInst &I) {
> +  // FIXME: Not yet implemented.
> +  return false;
> +}
> +
> +bool PPCFastISel::SelectSelect(const Instruction *I) {
> +  // FIXME: Not yet implemented.
> +  return false;
> +}
> 
> j) May as well separate out materializing a 64-bit integer as well :)
> 
> Thoughts?
> 
> -eric
> 
> 
> 
> 
> On Mon, Jun 24, 2013 at 9:19 AM, Bill Schmidt
> <wschmidt at linux.vnet.ibm.com> wrote:
> > Here's my proposed initial implementation for fast instruction selection
> > on PowerPC.  This implementation is strictly for PPC64 ELF (no 32-bit
> > support, no Darwin support).
> >
> > I used the ARM fast-isel support as a template for the PPC64 support.  I
> > found the existing code there very helpful, so kudos to Eric and others
> > for their work.
> >
> > Although there's a lot of stuff here, there's a fair amount of work that
> > could still be done to catch more cases.  There's some TBD commentary at
> > the top of PPCFastISel.cpp indicating the major things that could be
> > added.  I'm not sure that I'll have time to work on those issues in the
> > near future as I am being pulled in several directions at once right
> > now, but hopefully I or someone else can get to them eventually.
> >
> > Register classes cause some difficulty with the PowerPC implementation.
> > There are two main issues:
> >
> >  - For 64-bit PowerPC, the GPRC register class consists of subregisters
> > of the G8RC register class.  32-bit operations produce the GPRC
> > registers, and 64-bit operations produce the G8RC registers.  However,
> > in reality the registers are 64 bits in length, so that a 32-bit
> > sign-extending load will propagate the sign into the upper 32 bits, for
> > example.  The DAG selection code is able to unify all this, but when
> > generating MIs directly this can create awkwardness.  In order to pass
> > machine verification, I created "_32_64" versions of several of the
> > 32-bit instructions, which consume GPRC registers and produce G8RC
> > registers.
> >
> >  - To be able to assign X0 (64-bit) or R0 (32-bit) where it's legal to
> > do so, each of G8RC and GPRC has a subclass excluding that register.
> > These are intersection subclasses named G8RC_and_G8RC_NOX0 and
> > GPRC_and_GPRC_NOR0, respectively.  These are needed because register 0
> > means the immediate value 0 in several instructions:  as the base
> > register of loads and stores; as the first operand in an add-immediate;
> > and in the integer select instruction (not yet implemented in
> > fast-isel).  Unfortunately the automatically generated portions of
> > fast-isel processing use GPRC and G8RC for 32-bit and 64-bit operands
> > regardless of this (first register class defined for the value type).
> > This requires some extra overrides of the emit routines.
> >    Related to this, we cannot in general assign result registers for an
> > instruction without thinking about the register-zero question.  Where
> > possible, I've chosen to use the currently assigned register for an
> > instruction to determine the register class for that instruction's
> > result.  So if an instruction has anticipated uses in the same block, we
> > know whether or not it can be assigned to R0; if not, we have to
> > conservatively assume that it can't be.
> >    It might be possible to just override everything so we never assign
> > register zero to anything for fast-isel code.  But I'm not sure this
> > would work well when we hit something we can't handle and punt to the
> > DAG selector.  I'm interested in thoughts from the more knowledgable on
> > this.
> >
> > There are two things I've done in this patch that I consider to be
> > bloody hacks, and I am hopeful someone can suggest better alternatives:
> >
> >  - Calling convention hack:  With this patch, PPCGenCallingConv.inc is
> > now included in two places:   PPCISelLowering.cpp and PPCFastISel.cpp.
> > Because we don't use the 32-bit support, some functions defined in the
> > generated code are unused in PPCFastISel.cpp, causing build failures
> > with -Werror.  I also found it necessary to add some fast-isel specific
> > calling convention descriptions, which are not used in
> > PPCISelLowering.cpp, resulting in the same problem there.  For now, I've
> > added a dummy method in each file that creates artificial uses of the
> > unused functions to avoid the warnings.  There must be a better way...
> >
> >  - In PPCFastISel::SelectFPToI(), I have another machine verification
> > issue where I need to convert an f32 (F4RC) to an f64 (F8RC).   This is
> > a no-op on PowerPC since both values occupy a 64-bit register, with f32
> > guaranteeing fewer bits of accuracy.  I looked at the debug output with
> > the DAG selector to see what I needed, and it showed a COPY from f32 to
> > f64.  However, when I tried to use that, something in the downstream
> > machinery quietly changed my copy into an f32->f32 copy, and I still
> > failed machine verification.  I circumvented this by using
> > COPY_TO_REGCLASS, which works; however, I believe that COPY_TO_REGCLASS
> > isn't expected to still exist once we've lowered to MIs.  At least,
> > PPCInstPrinter::printInst() was surprised and blew up on that opcode.  I
> > added a hack to skip a COPY_TO_REGCLASS during asm-printing since it's a
> > no-op, but there really must be a better way to handle this.
> >
> > OK, enough prologue.  The patch is attached, and I would really
> > appreciate some good review before I commit it.  I have a lot of tests
> > that all pass with -verify-machineinstrs, but I am sure there are things
> > that I have missed, so the more eyes the better.
> >
> > When committing, I can split out the changes to PPCInstPrinter.cpp and
> > PPCInstr{Info,64Bit}.td as separate patches, but everything else is
> > pretty strongly tied together.  If you have suggestions for committing
> > this in smaller pieces, I welcome those as well.
> >
> > For those of you who got this far:  thanks for reading.  Definitely
> > qualifies as a tl;dr.  But hey, it's nothing compared with the patch. ;)
> >
> > Thanks,
> > Bill
>