[PATCH] Improve performance of vector code on A15

Thu Feb 21 17:04:12 PST 2013

On Feb 21, 2013, at 11:36 AM, Tim Northover <t.p.northover at gmail.com> wrote:

> From my understanding, the sequence should be:
> 
> (mark defining instruction as ImpDef<imp-def> if it's not already)
> Scratch<def> = VDUP ImpDef, #1; DefReg<imp-use> (, PairedReg<imp-use>)
> ImpDef<def> = VDUP ImpDef, #1; DefReg<imp-use> (, PairedReg<imp-use>)
> ImpDef<def> = VEXT ImpDef, Scratch; DefReg<imp-def> (, PairedReg<imp-def>)
> 
> The other sequences also seem a little iffy, but we can probably sort
> them out together tomorrow if someone confirms I'm not off my rocker
> in this most complex case.

This looks more plausible. Adding <undef> flags on uses makes the verifier shut up, but only by lying to it.

Passes like this are notoriously hard to get right, and it is likely to be a source of many future bugs related to sub-register liveness. It would be much easier to do this transformation before register allocation while the code is still in SSA form.

I would actually like to get rid of all the implicit operands that are currently added by the register allocator, and instead revert to a more conservative liveness model for post-RA passes. The extra complexity added by the implicit operands doesn't buy us much, and it is just too hard to work with the post-RA code as it looks now.

The conservative liveness model would only require that some part of a used register is live, whereas the current model requires that the full register is live. That stricter requirement is why all the implicit operands are needed.
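To make the difference between the two models concrete, here is a minimal sketch. It tracks liveness as a bitmask of lanes, with one bit per D-sized half of a Q register. Everything in it is hypothetical and is not LLVM's actual API: the `LaneMask` type, the `kDLo`/`kDHi`/`kQ` constants, and the three helper functions. It assumes only the low D half of a Q register has been written and then asks whether a read of the whole Q register is acceptable under each model.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: each bit stands for one D-sized lane of a register.
using LaneMask = std::uint32_t;

constexpr LaneMask kDLo = 0x1;        // low D sub-register of a Q register
constexpr LaneMask kDHi = 0x2;        // high D sub-register of a Q register
constexpr LaneMask kQ = kDLo | kDHi;  // the full Q register

// Current (strict) model: a use is valid only if every lane it reads is live.
bool useIsValidStrict(LaneMask liveLanes, LaneMask readLanes) {
  return (liveLanes & readLanes) == readLanes;
}

// Conservative model: a use is valid if at least one lane it reads is live.
bool useIsValidConservative(LaneMask liveLanes, LaneMask readLanes) {
  return (liveLanes & readLanes) != 0;
}

// Lanes live after an instruction that writes `defLanes`. `impDefLanes`
// stands in for an extra implicit def of the full register, the kind of
// operand the strict model forces us to add.
LaneMask liveAfterDef(LaneMask liveBefore, LaneMask defLanes,
                      LaneMask impDefLanes) {
  return liveBefore | defLanes | impDefLanes;
}
```

With only `kDLo` live, a read of `kQ` fails the strict check and passes the conservative one. Under the strict model it becomes valid only after an implicit def of the whole Q register marks every lane live, for example `liveAfterDef(0, kDLo, kQ)`. That is the bookkeeping the conservative model would let post-RA passes drop.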

James, could this pass work as a preRegAlloc pass instead? Or could it even be part of isel?

X86 has the same problems with its 8-bit sub-registers, and we simply avoid generating code that uses 16-bit registers after writing the 8-bit sub-registers.

/jakob