[PATCH] D72509: [ARM][LowOverheadLoops] Allow all MVE instrs.

Mon Jan 13 07:25:47 PST 2020

dmgreen added inline comments.

================
Comment at: llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp:525
+  if ((Flags & ARMII::DomainMask) != ARMII::DomainMVE)
+    return true;
+
----------------
samparker wrote:
> dmgreen wrote:
> > What if we had something like a `VMOV s0 s1`. I know that's not anything to do with this patch, but is that outlawed anywhere at the moment?
> It's not outlawed., why do we need to be concerned about it?
If we did `VMOV s7, s0; VMOV s6, s1; VMOV s5, s2; VMOV s4, s3`, that would be a reverse shuffle of a i32 vector. I would presume that would be trouble for tail prediction as it can no longer really be sure about what lanes are and are not predicated.

================
Comment at: llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp:530
+  // VPT block.
+  if ((Flags & ARMII::ValidForTailPredication) == 0 && !IsUse) {
+    LLVM_DEBUG(dbgs() << "ARM Loops: Can't tail predicate: " << *MI);
----------------
samparker wrote:
> dmgreen wrote:
> > What about VPNOT or VPSEL? They would use both the vpr and the tail predicate to different extents.
> > 
> > VPSEL we seem to mark as IsValidForTailPredication, which I'm not sure about. We need to make sure it's using the "same" predicate as before (and making sure it doesn't just use whatever we had last put there, if we delete the vcpt!)
> > 
> > 
> Good point. Yes, it looks like the problem is that we're only recording VPT blocks whereas, as you say, there are other instructions that also need handling. It looks like VPSEL/VPNOT would currently stuffed into the previous VPT block, even if they're not in one. This will mean that the predicates are checked for correctness, but it also could cause an assert too! I'll need to put up a different patch for that change.
Sounds good. I think they are both currently outside of IT blocks. We should be folding VPNOT into blocks where we can (creating else's), but are not yet. The same thing could be done for VPSEL in the future, turning it into a VMOVt.

================
Comment at: llvm/test/CodeGen/Thumb2/LowOverheadLoops/vmldava_in_vpt.mir:174
+  ; CHECK:   renamable $q3 = MVE_VMINu32 renamable $q2, renamable $q0, 0, $noreg, undef renamable $q3
+  ; CHECK:   renamable $r12 = MVE_VMLADAVas32 killed renamable $r12, killed renamable $q3, killed renamable $q2, 0, killed $noreg
+  ; CHECK:   $lr = MVE_LETP killed renamable $lr, %bb.2
----------------
samparker wrote:
> dmgreen wrote:
> > Another unrelated one, but there is nothing in the instructions of this loop that says that this is predicated on a hardware register, right? Or that is has unmodelled side effects? The refs to vpr and the vctp are removed but no other ref is added.
> Yes, at this point it's completely implicit, which I don't think is a problem currently. Maybe we could add a new register and a predicate def for LSTP and LETP, but I have no idea how invasive that would be for the predicate users.
I may be possible to add something similar to VPR, to the MVE_P base of all MVE instructions. Might be some differences in register orders though. Like you said, this is very late so might not be a problem currently. It's best not to be implicit if we can help it though.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72509/new/

https://reviews.llvm.org/D72509