[clang] [llvm] [BPF] Do atomic_fetch_*() pattern matching with memory ordering (PR #107343)
via cfe-commits
cfe-commits at lists.llvm.org
Mon Sep 16 16:19:04 PDT 2024
================
@@ -152,22 +152,93 @@ static bool hasLiveDefs(const MachineInstr &MI, const TargetRegisterInfo *TRI) {
return false;
}
-void BPFMIPreEmitChecking::processAtomicInsts() {
+bool BPFMIPreEmitChecking::processAtomicInsts() {
+ if (!MF->getSubtarget<BPFSubtarget>().getHasJmp32()) {
+ // Only check for cpu version 1 and 2.
+ for (MachineBasicBlock &MBB : *MF) {
+ for (MachineInstr &MI : MBB) {
+ if (MI.getOpcode() != BPF::XADDW && MI.getOpcode() != BPF::XADDD)
+ continue;
+
+ LLVM_DEBUG(MI.dump());
+ if (hasLiveDefs(MI, TRI)) {
+ DebugLoc Empty;
+ const DebugLoc &DL = MI.getDebugLoc();
+ const Function &F = MF->getFunction();
+ F.getContext().diagnose(DiagnosticInfoUnsupported{
+ F, "Invalid usage of the XADD return value", DL});
+ }
+ }
+ }
+ }
+
+ // Check return values of atomic_fetch_and_{add,and,or,xor}.
----------------
eddyz87 wrote:
Note: the comment is misleading, since in fact the opposite action is taken: `XADDD` is replaced with `XFADDD`.
As far as I understand, this pass is needed because of the following pattern:
```tablegen
foreach P = [// and
             [atomic_load_and_i64_monotonic, XANDD],
             // ...
            ] in {
  def : Pat<(P[0] ADDRri:$addr, GPR:$val), (P[1] ADDRri:$addr, GPR:$val)>;
}
```
Here we do not distinguish between `atomic_load_and_i64_monotonic` nodes whose result is used and those whose result is not: we always generate `XANDD` and fix it up later.
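For illustration, a minimal C sketch of the two cases the selector cannot tell apart (the function and variable names here are mine, not from the patch); both calls currently select `XANDD`, and the pass then has to rewrite the second one to `XFANDD`:
```c
#include <stdint.h>

uint64_t flags;

void clear_no_use(uint64_t mask) {
  /* Result unused: plain XANDD (lock *(u64 *)&flags &= src) is enough. */
  (void)__sync_fetch_and_and(&flags, ~mask);
}

uint64_t clear_has_use(uint64_t mask) {
  /* Result used: needs XFANDD (atomic_fetch_and) so that the old
     value actually lands in a register. */
  return __sync_fetch_and_and(&flags, ~mask);
}
```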
This fix-up seems ugly, and I think I found a tablegen pattern that removes the need for it, e.g. as in
[has-use-predicate.patch.txt](https://github.com/user-attachments/files/17020859/has-use-predicate.patch.txt). The main idea looks as follows:
```tablegen
class binop_no_use<SDPatternOperator operator>
  : PatFrag<(ops node:$A, node:$B),
            (operator node:$A, node:$B),
            [{ return SDValue(N, 0).use_empty(); }]>;

class binop_has_use<SDPatternOperator operator>
  : PatFrag<(ops node:$A, node:$B),
            (operator node:$A, node:$B),
            [{ return !SDValue(N, 0).use_empty(); }]>;

foreach op = [add, and, or, xor] in {
  def atomic_load_ # op # _i64_monotonic_nu
    : binop_no_use <!cast<SDPatternOperator>("atomic_load_" # op # "_i64_monotonic")>;
  def atomic_load_ # op # _i64_monotonic_hu
    : binop_has_use<!cast<SDPatternOperator>("atomic_load_" # op # "_i64_monotonic")>;
}

foreach P = [// and
             [atomic_load_and_i64_monotonic_nu, XANDD],
             [atomic_load_and_i64_monotonic_hu, XFANDD],
             [atomic_load_and_i64_acquire, XFANDD],
             ...
}
```
(It should be possible to extend the attached patch to 32-bit ops and to handle `add`.)
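For the 32-bit and `add` cases the same used/unused split applies; a quick C sketch of what the extended patterns would need to cover (again, the names are made up):
```c
#include <stdint.h>

uint32_t counter;

void inc_no_use(void) {
  /* Result unused: the plain xadd form is fine even on cpu v1/v2. */
  (void)__sync_fetch_and_add(&counter, 1);
}

uint32_t inc_has_use(void) {
  /* Result used: must select the fetching form (atomic_fetch_add),
     which only newer cpu versions support. */
  return __sync_fetch_and_add(&counter, 1);
}
```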
Wdyt?
https://github.com/llvm/llvm-project/pull/107343