[llvm] [PPC]Optimize zeroing accumulator and spilling instructions into simple instructions (PR #96094)

Mon Jun 24 20:26:52 PDT 2024

================
@@ -109,6 +109,93 @@ static bool hasPCRelativeForm(MachineInstr &Use) {
           MachineFunctionProperties::Property::NoVRegs);
     }
 
+    // The function simplifies the zeroing-accumulator and spilling
+    // instructions into simple xxlxor and spilling instructions.
+    // From:
+    // setaccz acci
+    // xxmfacc acci
+    // stxv vsr(i*4+0), D(1)
+    // stxv vsr(i*4+1), D-16(1)
+    // stxv vsr(i*4+2), D-32(1)
+    // stxv vsr(i*4+3), D-48(1)
+
+    // To:
+    // xxlxor vsr(i*4), 0, 0
+    // stxv vsr(i*4), D(1)
+    // stxv vsr(i*4), D-16(1)
+    // stxv vsr(i*4), D-32(1)
+    // stxv vsr(i*4), D-48(1)
+    bool
+    OptimizeZeroingAccumulatorSpilling(MachineBasicBlock &MBB,
+                                       const TargetRegisterInfo *TRI) const {
+      bool changed = false;
+      for (auto BBI = MBB.instr_begin(); BBI != MBB.instr_end(); ++BBI) {
+        if (BBI->getOpcode() != PPC::XXSETACCZ)
+          continue;
+
+        Register ACCZReg = BBI->getOperand(0).getReg();
+
+        DenseSet<MachineInstr *> InstrsToErase;
+        InstrsToErase.insert(&*BBI++);
+
----------------
chenzheng1030 wrote:

This does not look good. We need more general handling that covers all cases. For now, this targets only the specific case shown in the comment, and compiler scheduling may change the order of these instructions.
- There need not be exactly four stores...
- Requiring `XXSETACCZ` and `XXMFACC` to be adjacent is not a good idea...
- Requiring the four stores to immediately follow `XXMFACC` is also not good.
- The loop structure also seems unable to handle a case like:
```
    // setaccz acci
    // xxmfacc acci

    // setaccz acci2
    // xxmfacc acci2

    // stxv vsr(i*4+0), D(1)
    // stxv vsr(i*4+1), D-16(1)
    // stxv vsr(i*4+2), D-32(1)
    // stxv vsr(i*4+3), D-48(1)

    // stxv vsr(i2*4+0), D(2)
    // stxv vsr(i2*4+1), D-16(2)
    // stxv vsr(i2*4+2), D-32(2)
    // stxv vsr(i2*4+3), D-48(2)
```
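
To make the suggestion concrete, here is a rough, standalone sketch of what order-independent handling could look like. It is a toy model only: `Op`, `Inst`, `Candidate`, and `simplifyZeroSpills` are hypothetical names invented for illustration, not LLVM APIs and not the code in this PR. It assumes accumulator `i` maps to VSRs `4*i .. 4*i+3` (as in the comment in the patch), that `reg == -1` on an `Other` instruction means "touches no VSR", and that the zeroed VSRs are not live out of the block. A real implementation would need proper use/def and liveness checks on `MachineInstr` operands.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy opcodes for illustration; these are NOT the PPC:: enumerators.
enum class Op { SetAccZ, XxMfAcc, Stxv, Xxlxor, Other };

struct Inst {
  Op op;
  int reg; // accumulator index for SetAccZ/XxMfAcc, VSR index otherwise
};

// One open "setaccz ... xxmfacc ... stxv*" pattern for a single accumulator.
struct Candidate {
  std::size_t setIdx = 0;          // position of the setaccz
  bool unprimed = false;           // has the matching xxmfacc been seen?
  std::size_t mfIdx = 0;           // position of the xxmfacc
  std::vector<std::size_t> stores; // positions of stxv of the zeroed VSRs
};

// Tracks candidates per accumulator, so instructions for different
// accumulators may interleave and any number of stores is accepted.
std::vector<Inst> simplifyZeroSpills(std::vector<Inst> block) {
  std::map<int, Candidate> open;
  std::vector<bool> dead(block.size(), false);

  // Rewrite one completed candidate: drop setaccz, turn xxmfacc into a
  // single xxlxor, and make every recorded store use that one register.
  auto commit = [&](int acc, const Candidate &c) {
    if (!c.unprimed)
      return; // setaccz without xxmfacc: leave the code untouched
    dead[c.setIdx] = true;
    block[c.mfIdx] = {Op::Xxlxor, acc * 4};
    for (std::size_t s : c.stores)
      block[s].reg = acc * 4;
  };

  for (std::size_t i = 0; i < block.size(); ++i) {
    const Inst I = block[i];
    switch (I.op) {
    case Op::SetAccZ: {
      // A redefinition closes any earlier candidate for this accumulator.
      auto it = open.find(I.reg);
      if (it != open.end())
        commit(I.reg, it->second);
      Candidate c;
      c.setIdx = i;
      open[I.reg] = c;
      break;
    }
    case Op::XxMfAcc: {
      auto it = open.find(I.reg);
      if (it != open.end() && !it->second.unprimed) {
        it->second.unprimed = true;
        it->second.mfIdx = i;
      } else {
        open.erase(I.reg); // unexpected shape: give up conservatively
      }
      break;
    }
    case Op::Stxv: {
      auto it = open.find(I.reg / 4);
      if (it == open.end())
        break;
      if (it->second.unprimed)
        it->second.stores.push_back(i);
      else
        open.erase(it); // store before xxmfacc: give up conservatively
      break;
    }
    default:
      // Any other instruction touching the VSR group abandons the
      // candidate without rewriting anything.
      if (I.reg >= 0)
        open.erase(I.reg / 4);
      break;
    }
  }

  // Assumption: the zeroed VSRs are dead at the end of the block.
  for (const auto &kv : open)
    commit(kv.first, kv.second);

  std::vector<Inst> out;
  for (std::size_t i = 0; i < block.size(); ++i)
    if (!dead[i])
      out.push_back(block[i]);
  return out;
}
```

Fed the interleaved sequence above (two `setaccz`/`xxmfacc` pairs followed by eight stores), this model yields two `xxlxor` instructions followed by eight stores that each use the first VSR of their group. If some other instruction touches one of the VSRs in between, the candidate is abandoned and the original instructions are left as they were.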



https://github.com/llvm/llvm-project/pull/96094