[PATCH] Improve performance of vector code on A15

Tim Northover t.p.northover at gmail.com
Thu Feb 21 11:36:05 PST 2013


Hi James,

Nice work! I'm happy to see this pass reach mainline.

+  if (!definesSPR(MI, DefReg) || !definesDPROrQPRImplicitly(MI, ImpDefReg))
+    return false;

I think I'd be happier if there was an assert after this that the same
register is involved in both cases (MCRegisterInfo::isSubRegister, for
example). I can't think of any instructions that violate this at the
moment, but the constraint is a little implicit right now.

+  unsigned DefRegNum = DefReg - ARM::S0;

You should probably be using MCRegisterInfo::getEncodingValue here.
Subtracting an enum value is a little fragile. This pattern happens
elsewhere too.

+    ImpDefReg = TRI->getSubReg(ImpDefReg, ARM::dsub_0 + ((DefRegNum / 2) % 2));

Similarly, I'm not that keen on "ARM::dsub_0 + X". In this case there
are just two alternatives, aren't there?

+  MachineInstr *Dup = AddDefaultPred(BuildMI(*MI.getParent()->getParent(),
+                                             MI.getDebugLoc(),
+
TII->get(ARM::VDUPLN32d), ScratchReg)
+                                     .addReg(ImpDefReg,
RegState::Undef).addImm(DefRegLane))
+                                     .addReg(DefReg, RegState::Implicit);
+  MI.getParent()->insertAfter(MI, Dup);
+
+  // Make another vdup, this time breaking the dependency on the source lane.
+  // The vdup also re-defs the original original lane.
+  MachineInstrBuilder Dup2B =
AddDefaultPred(BuildMI(*MI.getParent()->getParent(),
+                                              MI.getDebugLoc(),
+
TII->get(ARM::VDUPLN32d), ImpDefReg)
+                                      .addReg(ImpDefReg).addImm(
DefRegLane ^ 1 ))
+                                      .addReg(DefReg,
RegState::Implicit | RegState::Define);

I'm a little worried about the different uses of ImpDefReg here.
Nothing in the first instruction would seem to define it, so shouldn't
the second use also be Undef?

In fact, the liveness of this sequence looks suspect in other ways:

Scratch<def> = VDUP ImpDef<undef>, #1; DefReg<imp-use>
ImpDef<def> = VDUP ImpDef, #1; DefReg<imp-def> (, PairedReg<imp-def>)
ImpDef<def> = VEXT ImpDef, Scratch

We really want someone like Jakob to correct us here (CCed), but:
+ Should ImpDef really always be "undef" on the first line? The
S-defining instruction could legitimately be marked ImpDef<imp-def> in
which case the live range has been broken. Perhaps later instructions
prevent harmful reordering, but perhaps they don't.
+ The second instruction doesn't seem to recognise that DefReg gets
read. Couldn't a scheduler move some other definition of DefReg
between the first two lines?
+ Is PairedReg really defined (and where it's expected) at the second
instruction? If not, the scheduler could put a use between 2 and 3.

>From my understanding, the sequence should be:

(mark defining instruction as ImpDef<imp-def> if it's not already)
Scratch<def> = VDUP ImpDef, #1; DefReg<imp-use> (, PairedReg<imp-use>)
ImpDef<def> = VDUP ImpDef, #1; DefReg<imp-use> (, PairedReg<imp-use>)
ImpDef<def> = VEXT ImpDef, Scratch; DefReg<imp-def> (, PairedReg<imp-def>)

The other sequences also seem a little iffy, but we can probably sort
them out together tomorrow if someone confirms I'm not off my rocker
in this most complex case.

One final point; I think the tests should be a little more thorough.
Certainly as far as tracking register dependencies, even if they can't
track the flags we set here properly.

Tim.



More information about the llvm-commits mailing list