[PATCH] D90935: [ARM][LowOverheadLoops] Merge VCMP and VPST across VPT blocks

Sam Tebbs via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 12 06:00:15 PST 2020


samtebbs added inline comments.


================
Comment at: llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp:1513
 
-    if (VPTState::isEntryPredicatedOnVCTP(Block, /*exclusive*/true)) {
+    auto ReplaceVCMPWithVPT = [&](MachineInstr *At) {
+      assert(VCMP && "Replacing a removed or non-existent VCMP");
----------------
dmgreen wrote:
> It may be better if this takes VCMP as an argument - that would help keep the logic around here clearer where things are being used.
Sure!


================
Comment at: llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp:1563
         // with the VPST This should be the divergent instruction
-        MachineInstr *VCMP =
-            VCMPOpcodeToVPT(Divergent->getOpcode()) != 0 ? Divergent : nullptr;
-
-        auto ReplaceVCMPWithVPT = [&]() {
-          // Replace the VCMP with a VPT
-          MachineInstrBuilder MIB = BuildMI(
-              *Divergent->getParent(), Divergent, Divergent->getDebugLoc(),
-              TII->get(VCMPOpcodeToVPT(VCMP->getOpcode())));
-          MIB.addImm(ARMVCC::Then);
-          // Register one
-          MIB.add(VCMP->getOperand(1));
-          // Register two
-          MIB.add(VCMP->getOperand(2));
-          // The comparison code, e.g. ge, eq, lt
-          MIB.add(VCMP->getOperand(3));
-          LLVM_DEBUG(dbgs()
-                     << "ARM Loops: Combining with VCMP to VPT: " << *MIB);
-          LoLoop.BlockMasksToRecompute.insert(MIB.getInstr());
-          LoLoop.ToRemove.insert(VCMP);
-        };
+        VCMP = VCMPOpcodeToVPT(Divergent->getOpcode()) != 0 ? Divergent : VCMP;
 
----------------
dmgreen wrote:
> I feel like this should be using ` : nullptr`, even if just to keep it simpler. From what I can tell the older VCMP wouldn't ever be useful after the point that we have seen something that has altered the vpr predicate.
That's a good point!


================
Comment at: llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp:1600
+             I != E; ++I) {
+          if (getVPTInstrPredicate(*I) != ARMVCC::None) {
+            IntermediateInstrsUseVPR = true;
----------------
dmgreen wrote:
> Checking that an operand is vpr may be better.
👍 


================
Comment at: llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp:1618-1619
+      } else if (!NextIsPredicated) {
+        LLVM_DEBUG(dbgs() << "ARM Loops: Removing VPST: " << *VPST);
+        LoLoop.ToRemove.insert(VPST);
+      }
----------------
dmgreen wrote:
> I _think_ this case would be invalid input, and not something we would have to worry about (it could be an assert - ideally there would be code in the verifier that checked that VPT blocks were always valid, but no one has written that yet).
I added this when I was creating test cases but it makes sense that this wouldn't happen naturally.


================
Comment at: llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp:1564
         // with the VPST This should be the divergent instruction
-        MachineInstr *VCMP =
-            VCMPOpcodeToVPT(Divergent->getOpcode()) != 0 ? Divergent : nullptr;
-
-        auto ReplaceVCMPWithVPT = [&]() {
-          // Replace the VCMP with a VPT
-          MachineInstrBuilder MIB = BuildMI(
-              *Divergent->getParent(), Divergent, Divergent->getDebugLoc(),
-              TII->get(VCMPOpcodeToVPT(VCMP->getOpcode())));
-          MIB.addImm(ARMVCC::Then);
-          // Register one
-          MIB.add(VCMP->getOperand(1));
-          // Register two
-          MIB.add(VCMP->getOperand(2));
-          // The comparison code, e.g. ge, eq, lt
-          MIB.add(VCMP->getOperand(3));
-          LLVM_DEBUG(dbgs()
-                     << "ARM Loops: Combining with VCMP to VPT: " << *MIB);
-          LoLoop.BlockMasksToRecompute.insert(MIB.getInstr());
-          LoLoop.ToRemove.insert(VCMP);
-        };
+        VCMP = VCMPOpcodeToVPT(Divergent->getOpcode()) != 0 ? Divergent : VCMP;
 
----------------
dmgreen wrote:
> samtebbs wrote:
> > dmgreen wrote:
> > > samtebbs wrote:
> > > > dmgreen wrote:
> > > > > I think we need to make sure this _is_ the last instruction in the block. Otherwise we can't remove it.
> > > > Shouldn't it be removable as long as instructions after it in the block don't use the VCMP's output?
> > > The instructions after the VCMP in the vpt block will use the VCMP's predicate, as opposed to the one at the start of the block.
> > > So something like:
> > >   VPST
> > >   VCMP
> > >   VORR
> > >   ..
> > >   VPST
> > > 
> > > The VORR should be using the vpr from the VCMP (and so it can't be removed), and there's probably not a lot of point combining it into the VPST, as the predicate is already set.
> > > 
> > > I couldn't spot anything that check for that already.
> > Sure for that example the VCMP shouldn't be changed. But if the instructions after the VCMP don't use the vpr, then it can be removed right? That, rather than it being the last instruction, is the fundamental requirement I think you're getting at. This would be fine, for example:
> > 
> > 
> > ```
> > VPST
> > VCMP
> > ADD (scalar)
> > SUB (scalar)
> > VPST
> > ```
> Yep, that sounds fine. I meant last instruction in the VPT block, not last instruction in the basic block.
> 
> I guess put better - there cannot be any uses of vpr between the VCMP that i being removed and the VPST that is being converted into the VPT. 
Ah yes, I missed you specifying it being the last in the *vpt* block.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90935/new/

https://reviews.llvm.org/D90935



More information about the llvm-commits mailing list