[PATCH] Improve performance of vector code on A15
Jakob Stoklund Olesen
stoklund at 2pi.dk
Thu Feb 21 17:04:12 PST 2013
On Feb 21, 2013, at 11:36 AM, Tim Northover <t.p.northover at gmail.com> wrote:
> From my understanding, the sequence should be:
>
> (mark defining instruction as ImpDef<imp-def> if it's not already)
> Scratch<def> = VDUP ImpDef, #1; DefReg<imp-use> (, PairedReg<imp-use>)
> ImpDef<def> = VDUP ImpDef, #1; DefReg<imp-use> (, PairedReg<imp-use>)
> ImpDef<def> = VEXT ImpDef, Scratch; DefReg<imp-def> (, PairedReg<imp-def>)
>
> The other sequences also seem a little iffy, but we can probably sort
> them out together tomorrow if someone confirms I'm not off my rocker
> in this most complex case.
This looks more plausible. Adding <undef> flags on uses makes the verifier shut up, but only by lying to it.
Passes like this are notoriously hard to get right, and this one is likely to be a source of many future bugs related to sub-register liveness. It would be much easier to do this transformation before register allocation while the code is still in SSA form.
I would actually like to get rid of all the implicit operands that are currently added by the register allocator, and instead revert to a more conservative liveness model for post-RA passes. The extra complexity added by the implicit operands doesn't buy us a lot, and it is just too hard to work with the post-RA code as it looks now.
The conservative liveness model would only require that some part of a used register is live. The current model requires that the full register is live, which is why all the implicit operands are needed.
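
To make the difference between the two models concrete, here is a rough sketch. It is not LLVM code: LaneMask, useIsLegalStrict, useIsLegalConservative and the DLow/DHigh/QFull constants are all made up for illustration, and it assumes a register can be modeled as a bitmask with one bit per sub-register.

```cpp
#include <cstdint>

// Hypothetical model (not an LLVM API): one bit per sub-register lane.
// For a Q register built from two D halves: bit 0 = low D, bit 1 = high D.
using LaneMask = std::uint8_t;

constexpr LaneMask DLow = 0x1;
constexpr LaneMask DHigh = 0x2;
constexpr LaneMask QFull = DLow | DHigh;

// Current model: a use is legal only if every lane of the used
// register is live.
constexpr bool useIsLegalStrict(LaneMask Used, LaneMask Live) {
  return (Used & Live) == Used;
}

// Conservative model: a use is legal as long as some lane of the used
// register is live.
constexpr bool useIsLegalConservative(LaneMask Used, LaneMask Live) {
  return (Used & Live) != 0;
}

// Only the low D half has been written, then the full Q register is read.
static_assert(!useIsLegalStrict(QFull, DLow),
              "strict model rejects the use unless extra operands are added");
static_assert(useIsLegalConservative(QFull, DLow),
              "conservative model accepts the use as-is");
```

In this toy model, the strict check only passes once something marks the high half as live too, which is the job the implicit operands are doing today. The conservative check still rejects a use of a register with no live part at all.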
James, could this pass work as a preRegAlloc pass instead? Or could it even be part of isel?
X86 has the same problems with its 8-bit sub-registers, and we simply avoid generating code that uses 16-bit registers after writing the 8-bit sub-registers.
/jakob