[PATCH] Improve performance of vector code on A15
Silviu Baranga
silbar01 at arm.com
Fri Mar 8 10:32:00 PST 2013
I've applied most of Tim's comments so I'm attaching the new version of
the patch. More tests would be good, but I still have to figure out a
meaningful way of writing them.
The widened load has been changed from a vldr to an adr + vld1 (all lanes)
sequence in order to avoid accessing invalid memory.
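To illustrate the page-boundary concern behind this change, here is a minimal
sketch (not part of the patch). It assumes 4 KiB pages and a 32-bit constant
sitting in the last four bytes of a mapped page; `crossesPage` and the address
used are hypothetical, chosen only for the example. A 64-bit vldr reads 8
bytes from that address, while a vld1 (all lanes) with a 32-bit element reads
only 4 bytes and duplicates them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if the byte range [addr, addr + size) touches
// more than one page. Assumes pageSize is non-zero.
constexpr bool crossesPage(std::uint64_t addr, std::uint64_t size,
                           std::uint64_t pageSize = 4096) {
  return size != 0 && (addr / pageSize) != ((addr + size - 1) / pageSize);
}

// Assumed example address: a 4-byte constant occupying 0x1FFC..0x1FFF,
// i.e. the last word of the page that ends at 0x1FFF.
constexpr std::uint64_t kConstAddr = 0x1FFC;

// Widening to an 8-byte load reads 0x1FFC..0x2003 and spills into the next
// page, which may be unmapped.
static_assert(crossesPage(kConstAddr, 8), "64-bit load crosses the page");

// Loading a single 32-bit element stays within the original page.
static_assert(!crossesPage(kConstAddr, 4), "32-bit load stays in the page");
```

The check only models which bytes are touched; whether the neighbouring page
is actually mapped depends on the process layout.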
Thanks,
Silviu
> -----Original Message-----
> From: Tim Northover [mailto:t.p.northover at gmail.com]
> Sent: 07 March 2013 13:40
> To: Silviu Baranga
> Cc: Jakob Stoklund Olesen; James Molloy; Commit Messages and Patches for LLVM
> Subject: Re: [PATCH] Improve performance of vector code on A15
>
> Hi Silviu,
>
> I've taken a quick look at the patch. It's a relief not to have to
> think about implicit-defs! I mostly just spotted cosmetic things.
>
> + std::vector<unsigned> getReadDPRs(MachineInstr *MI);
>
> Have you considered a SmallVector here (and elideCopiesAndPHIs)? It
> could well be better here, just want to make sure you've thought about
> it.
>
> +// Returns true if this is a use of a SPR register.
>
> Doxygen-style comments on functions would be good.
>
> +bool A15SDOptimizer::usesSReg(MachineOperand &MO) {
> +bool A15SDOptimizer::usesDReg(MachineOperand &MO) {
> +bool A15SDOptimizer::usesQReg(MachineOperand &MO) {
>
> These all duplicate each other. It might be worth making them delegate
> to a function taking a TargetRegisterClass.
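
The delegation suggested above could look roughly like the following toy
model. It is a sketch, not the patch's code: it assumes the three predicates
differ only in the register class they test, and `RegClass`, `Operand`,
`usesRegClass` and the register numbers are hypothetical stand-ins for LLVM's
TargetRegisterClass, MachineOperand and the real ARM register sets.

```cpp
#include <cassert>
#include <set>

// Toy stand-in for a register class (NOT llvm::TargetRegisterClass).
struct RegClass {
  std::set<unsigned> Regs;
  bool contains(unsigned Reg) const { return Regs.count(Reg) != 0; }
};

// Toy stand-in for a machine operand (NOT llvm::MachineOperand).
struct Operand {
  bool IsReg;   // operand is a register
  bool IsUse;   // operand is read rather than defined
  unsigned Reg; // register number, meaningful only if IsReg is true
};

// Made-up register numbering, for illustration only.
const RegClass SPR{{0, 1, 2, 3}};
const RegClass DPR{{100, 101}};
const RegClass QPR{{200}};

// The single shared check that the three predicates delegate to.
bool usesRegClass(const Operand &MO, const RegClass &RC) {
  return MO.IsReg && MO.IsUse && RC.contains(MO.Reg);
}

bool usesSReg(const Operand &MO) { return usesRegClass(MO, SPR); }
bool usesDReg(const Operand &MO) { return usesRegClass(MO, DPR); }
bool usesQReg(const Operand &MO) { return usesRegClass(MO, QPR); }
```

With this shape, each wrapper is a one-liner and any change to the operand
checks happens in one place.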
>
> +unsigned A15SDOptimizer::widenConstantPoolLoad(MachineInstr *MI) {
>
> It's not always valid to do this with a VLDR.F64. If the constant
> happened to be just before a page-boundary into unallocated memory
> then it could fault. Fortunately, there appears to be a duplicating
> VLD1 instruction that you could use instead.
>
> + //regclass as DPRMI? (i.e. a DPR or QPR).
>
> Space.
>
> + // * INSERT_SUBREG: * If the SPR value was originally in another DPR/QPR
> + // lane, and the other lane(s) of the DPR/QPR register
> + // that we are inserting in are undefined, use the
> + // original DPR/QPR value.
>
> In monospace, the sublist is a little tricky to read. Might be worth
> putting a couple more spaces on subsequent lines.
>
> I still think it's worth doing a bit more thorough checking of the
> code sequences used in the tests: whether the registers are marshalled
> in a correct way for example.
>
> Cheers.
>
> Tim.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a15-sd-preregalloc.diff
Type: application/octet-stream
Size: 25135 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130308/5e583f75/attachment.obj>