[PATCH] Improve performance of vector code on A15

Silviu Baranga silbar01 at arm.com
Fri Mar 8 10:32:00 PST 2013


I've applied most of Tim's comments so I'm attaching the new version of
the patch. More tests would be good, but I still have to figure out a
meaningful way of writing them.

The widened load has been changed from a vldr to an adr + vld1 (all lanes)
sequence in order to avoid accessing invalid memory.

Thanks,
Silviu

> -----Original Message-----
> From: Tim Northover [mailto:t.p.northover at gmail.com]
> Sent: 07 March 2013 13:40
> To: Silviu Baranga
> Cc: Jakob Stoklund Olesen; James Molloy; Commit Messages and Patches
> for LLVM
> Subject: Re: [PATCH] Improve performance of vector code on A15
> 
> Hi Silviu,
> 
> I've taken a quick look at the patch. It's a relief not to have to
> think about implicit-defs! I mostly just spotted cosmetic things.
> 
> +    std::vector<unsigned> getReadDPRs(MachineInstr *MI);
> 
> Have you considered a SmallVector here (and elideCopiesAndPHIs)? It
> could well be better here, just want to make sure you've thought about
> it.
> 
> +// Returns true if this is a use of a SPR register.
> 
> Doxygen-style comments on functions would be good.
> 
> +bool A15SDOptimizer::usesSReg(MachineOperand &MO) {
> +bool A15SDOptimizer::usesDReg(MachineOperand &MO) {
> +bool A15SDOptimizer::usesQReg(MachineOperand &MO) {
> 
> These all duplicate each other. It might be worth making them delegate
> to a function taking a TargetRegisterClass
> 
> +unsigned A15SDOptimizer::widenConstantPoolLoad(MachineInstr *MI) {
> 
> It's not always valid to do this with a VLDR.F64. If the constant
> happened to be just before a page-boundary into unallocated memory
> then it could fault. Fortunately, there appears to be a duplicating
> VLD1 instruction that you could use instead.
> 
> +        //regclass as DPRMI? (i.e. a DPR or QPR).
> 
> Space.
> 
> +  //   * INSERT_SUBREG: * If the SPR value was originally in another
> DPR/QPR
> +  //                    lane, and the other lane(s) of the DPR/QPR
> register
> +  //                    that we are inserting in are undefined, use
> the
> +  //                    original DPR/QPR value.
> 
> In monospace, the sublist is a little tricky to read. Might be worth
> putting a couple more spaces on subsequent lines.
> 
> I still think it's worth doing a bit more thorough checking of the
> code sequences used in the tests: whether the registers are marshalled
> in a correct way for example.
> 
> Cheers.
> 
> Tim.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a15-sd-preregalloc.diff
Type: application/octet-stream
Size: 25135 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130308/5e583f75/attachment.obj>


More information about the llvm-commits mailing list