[PATCH] Improve performance of vector code on A15

Silviu Baranga silbar01 at arm.com
Thu Mar 14 05:24:06 PDT 2013


I'm attaching a new version of the A15 NEON optimization patch
with the following improvements:

- we now always check if a register is virtual before
calling getVRegDef.

- the elideCopiesAndPHIs method could previously recurse infinitely,
since it also looks past PHI nodes. I've fixed that by using a DFS
instead.

- VMOVS widening is now disabled on A15, since it was not working well
with the optimization pass.
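
For anyone following the thread without the diff open, here is a rough,
self-contained sketch of the DFS idea. It is not the code from the patch:
the Instr and Kind types are hypothetical stand-ins for MachineInstr, and
the visited set is my assumption about how a traversal of this kind
avoids looping when PHIs form a cycle. A null operand entry models a
register with no virtual-register definition, which is simply skipped.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-in for a machine instruction (not the LLVM API).
enum class Kind { Copy, Phi, Other };

struct Instr {
  int id;
  Kind kind;
  // Defining instructions of the source operands; nullptr models an
  // operand that has no virtual-register definition.
  std::vector<const Instr *> operandDefs;
};

// Iterative DFS that looks through copies and PHIs and returns the
// "real" defining instructions that are reached. The visited set ensures
// each instruction is expanded once, so a PHI cycle cannot loop forever.
std::vector<const Instr *> elideCopiesAndPhis(const Instr *root) {
  std::vector<const Instr *> result;
  std::set<const Instr *> visited;
  std::vector<const Instr *> stack{root};
  while (!stack.empty()) {
    const Instr *cur = stack.back();
    stack.pop_back();
    if (!cur || !visited.insert(cur).second)
      continue; // no definition to follow, or already seen
    if (cur->kind == Kind::Other) {
      result.push_back(cur); // stop at the first non-copy, non-PHI def
      continue;
    }
    for (const Instr *def : cur->operandDefs)
      stack.push_back(def);
  }
  return result;
}

// Usage: a loop-carried value, where the PHI reads a copy of itself.
bool demoCycleTerminates() {
  Instr load{0, Kind::Other, {}};
  Instr phi{1, Kind::Phi, {}};
  Instr copy{2, Kind::Copy, {&phi}};
  phi.operandDefs = {&load, &copy}; // cycle: phi -> copy -> phi
  std::vector<const Instr *> defs = elideCopiesAndPhis(&copy);
  return defs.size() == 1 && defs[0] == &load;
}
```

A naive recursive version of the same walk would bounce between the PHI
and the copy in the usage example and never return.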

I've also tested the patch using the LNT test suite and didn't get
any new failures (the only ones are those we already know about, which
were already failing without the patch).

Would this be close to a state where it can get committed?

- Silviu



> -----Original Message-----
> From: Silviu Baranga [mailto:silbar01 at arm.com]
> Sent: 12 March 2013 18:00
> To: 'Jakob Stoklund Olesen'
> Cc: Tim Northover; James Molloy; Commit Messages and Patches for LLVM
> Subject: RE: [PATCH] Improve performance of vector code on A15
> 
> Hi Jakob,
> 
> Probably with some careful use of VDUPLN (which would involve
> mostly selecting the best lane to insert the SPR into) the VDUPfdf
> pseudo could be replaced without anyone noticing. It's probably
> worth removing since it's a potential source for some very nasty
> bugs.
> 
> I removed the VDUPfdf pseudo usage from the patch to make it easier
> for anyone that is trying to fully remove the pseudo inst. I did notice
> some extra copies being inserted if the source lane was poorly chosen
> (always inserting the SPR in lane 0 before doing the VDUP). So it looks
> like the register coalescer won't do all the work.
> After doing some minor adjustments these went away, so it's not that
> bad.
> 
> I also sorted out the MachineInstr insertion call. What confused me was
> the fact that the following statement is true:
>   MBB.end()->getParent() != MBB
> which led me to the false conclusion that inserting at the end of the
> basic block was not supported when in fact the arguments were wrong.
> 
> I'm attaching the new patch.
> 
> - Silviu
> 
> > -----Original Message-----
> > From: Jakob Stoklund Olesen [mailto:stoklund at 2pi.dk]
> > Sent: 11 March 2013 18:42
> > To: Silviu Baranga
> > Cc: Tim Northover; James Molloy; Commit Messages and Patches for LLVM
> > Subject: Re: [PATCH] Improve performance of vector code on A15
> >
> >
> > On Mar 11, 2013, at 10:50 AM, Silviu Baranga <Silviu.Baranga at arm.com>
> > wrote:
> >
> > > I've also found a problem with the VDUPfdf/VDUPfqf expansion which
> > > caused the tests to not pass when adding -verify-machineinstrs. I'm
> > > attaching a patch that should fix this. It has no test case since the
> > > test in the other patch already relies on this change.
> >
> > The register coalescer has improved a lot since these two pseudo-
> > instructions were added. I wonder if they could be removed now?
> >
> > IIRC, James already did something along those lines for NEONvduplane
> > by using EXTRACT_SUBREG.
> >
> > /jakob
> >
> >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a15-sd-preregalloc.diff
Type: application/octet-stream
Size: 30369 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130314/7d8a05ad/attachment.obj>

