[PATCH] D35319: LSE Atomics reorg - Part I

Wed Jul 26 11:11:21 PDT 2017

t.p.northover added a comment.

Thanks very much for updating the patch. It's starting to look a lot more like how I'd expect these to be implemented.

I think you've slightly missed the benefit I was suggesting multiclasses would give, so there's still more duplication than needed. Hopefully my explanations below make sense, but I can try to clarify if not.

================
Comment at: include/llvm/Target/TargetSelectionDAG.td:1118
+
+defm atomic_load_add_8  : binary_atomic_op_ord<atomic_load_add>;
+defm atomic_load_add_16 : binary_atomic_op_ord<atomic_load_add>;
----------------
You can put these into the "multiclass binary_atomic_op":

    multiclass binary_atomic_op<...> {
      defm NAME#_8 : binary_atomic_op_ord;
      [...]
    }

That'll automatically instantiate the _8 ordered variants when the plain _8 is created. Obviously you need similar _16, _32, ...
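
As a rough illustration of that nesting, here is a self-contained toy that `llvm-tblgen` should be able to parse on its own. `Frag` is a hypothetical stand-in for the real `PatFrag`, and `binary_atomic_op_ord` is written without parameters for brevity (the patch under review passes the node in), so only the naming behaviour carries over.

```tablegen
// Hypothetical stand-in for PatFrag; it only records what it was built from.
class Frag<string base, string ordering = "any"> {
  string Base = base;
  string Ordering = ordering;
}

// One record per memory ordering, named <defm name>_<ordering>.
multiclass binary_atomic_op_ord {
  def NAME#_monotonic : Frag<NAME, "monotonic">;
  def NAME#_acquire   : Frag<NAME, "acquire">;
  def NAME#_release   : Frag<NAME, "release">;
  def NAME#_acq_rel   : Frag<NAME, "acq_rel">;
  def NAME#_seq_cst   : Frag<NAME, "seq_cst">;
}

// Each size variant instantiates its own ordered variants.
multiclass binary_atomic_op<string op> {
  def _8  : Frag<op>;
  def _16 : Frag<op>;
  def _32 : Frag<op>;
  def _64 : Frag<op>;

  defm NAME#_8  : binary_atomic_op_ord;
  defm NAME#_16 : binary_atomic_op_ord;
  defm NAME#_32 : binary_atomic_op_ord;
  defm NAME#_64 : binary_atomic_op_ord;
}

// A single top-level defm per operation is then enough.
defm atomic_load_add : binary_atomic_op<"atomic_load_add">;
```

Dumping the records for this file should list `atomic_load_add_8` through `atomic_load_add_64` together with `atomic_load_add_8_monotonic`, `atomic_load_add_8_acquire` and so on, without any per-size `defm` at the top level.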

================
Comment at: include/llvm/Target/TargetSelectionDAG.td:1168-1171
+defm atomic_load_add_8_monotonic  : binary_atomic_op_ord<atomic_load_add>;
+defm atomic_load_add_16_monotonic : binary_atomic_op_ord<atomic_load_add>;
+defm atomic_load_add_32_monotonic : binary_atomic_op_ord<atomic_load_add>;
+defm atomic_load_add_64_monotonic : binary_atomic_op_ord<atomic_load_add>;
----------------
Everything from here onwards is unnecessary, I believe. You're actually creating duplicate (and unused) nodes like `atomic_load_add_8_monotonic_seq_cst`, ...

The whole point of the multiclass is that you instantiate multiple variants at the same time. The multiclass takes the base name (e.g. `atomic_load_add_8`) and tacks on its named suffix (e.g. `_monotonic`) and hopefully implements the correct checks to make sure it matches the right node.
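
To make the suffixing concrete, here is a toy sketch of what happens when the ordered multiclass is instantiated under a name that already carries an ordering. It assumes a hypothetical `Frag` class in place of `PatFrag`, a parameterless multiclass, and only two orderings:

```tablegen
// Hypothetical stand-in for PatFrag.
class Frag<string base> { string Base = base; }

multiclass binary_atomic_op_ord {
  def NAME#_monotonic : Frag<NAME>;
  def NAME#_seq_cst   : Frag<NAME>;
}

// Intended: atomic_load_add_8_monotonic and atomic_load_add_8_seq_cst.
defm atomic_load_add_8 : binary_atomic_op_ord;

// Unintended: atomic_load_add_8_monotonic_monotonic and
// atomic_load_add_8_monotonic_seq_cst, which no pattern refers to.
defm atomic_load_add_8_monotonic : binary_atomic_op_ord;
```

The second `defm` is accepted without complaint because the doubly-suffixed names do not clash with anything, which is why the extra nodes are easy to miss.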

================
Comment at: lib/Target/AArch64/AArch64DeadRegisterDefinitionsPass.cpp:59
+
+  bool ShouldSkip(const MachineInstr &MI, const MachineFunction &MF) const;
 };
----------------
Tiny nit: function names should start with a lower-case letter (so `shouldSkip`).

================
Comment at: lib/Target/AArch64/AArch64InstrAtomics.td:410
 // v8.1 Atomic instructions:
-def : Pat<(atomic_load_add_8 GPR64:$Rn, GPR32:$Rs), (LDADDALb GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_add_16 GPR64:$Rn, GPR32:$Rs), (LDADDALh GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_add_32 GPR64:$Rn, GPR32:$Rs), (LDADDALs GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_add_64 GPR64:$Rn, GPR64:$Rs), (LDADDALd GPR64:$Rs, GPR64sp:$Rn)>;
-
-def : Pat<(atomic_load_or_8 GPR64:$Rn, GPR32:$Rs), (LDSETALb GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_or_16 GPR64:$Rn, GPR32:$Rs), (LDSETALh GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_or_32 GPR64:$Rn, GPR32:$Rs), (LDSETALs GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_or_64 GPR64:$Rn, GPR64:$Rs), (LDSETALd GPR64:$Rs, GPR64sp:$Rn)>;
-
-def : Pat<(atomic_load_xor_8 GPR64:$Rn, GPR32:$Rs), (LDEORALb GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_xor_16 GPR64:$Rn, GPR32:$Rs), (LDEORALh GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_xor_32 GPR64:$Rn, GPR32:$Rs), (LDEORALs GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_xor_64 GPR64:$Rn, GPR64:$Rs), (LDEORALd GPR64:$Rs, GPR64sp:$Rn)>;
-
-def : Pat<(atomic_load_max_8 GPR64:$Rn, GPR32:$Rs), (LDSMAXALb GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_max_16 GPR64:$Rn, GPR32:$Rs), (LDSMAXALh GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_max_32 GPR64:$Rn, GPR32:$Rs), (LDSMAXALs GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_max_64 GPR64:$Rn, GPR64:$Rs), (LDSMAXALd GPR64:$Rs, GPR64sp:$Rn)>;
-
-def : Pat<(atomic_load_umax_8 GPR64:$Rn, GPR32:$Rs), (LDUMAXALb GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_umax_16 GPR64:$Rn, GPR32:$Rs), (LDUMAXALh GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_umax_32 GPR64:$Rn, GPR32:$Rs), (LDUMAXALs GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_umax_64 GPR64:$Rn, GPR64:$Rs), (LDUMAXALd GPR64:$Rs, GPR64sp:$Rn)>;
-
-def : Pat<(atomic_load_min_8 GPR64:$Rn, GPR32:$Rs), (LDSMINALb GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_min_16 GPR64:$Rn, GPR32:$Rs), (LDSMINALh GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_min_32 GPR64:$Rn, GPR32:$Rs), (LDSMINALs GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_min_64 GPR64:$Rn, GPR64:$Rs), (LDSMINALd GPR64:$Rs, GPR64sp:$Rn)>;
-
-def : Pat<(atomic_load_umin_8 GPR64:$Rn, GPR32:$Rs), (LDUMINALb GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_umin_16 GPR64:$Rn, GPR32:$Rs), (LDUMINALh GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_umin_32 GPR64:$Rn, GPR32:$Rs), (LDUMINALs GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_load_umin_64 GPR64:$Rn, GPR64:$Rs), (LDUMINALd GPR64:$Rs, GPR64sp:$Rn)>;
-
-def : Pat<(atomic_cmp_swap_8 GPR64:$Rn, GPR32:$Rold, GPR32:$Rnew), (CASALb GPR32:$Rold, GPR32:$Rnew, GPR64sp:$Rn)>;
-def : Pat<(atomic_cmp_swap_16 GPR64:$Rn, GPR32:$Rold, GPR32:$Rnew), (CASALh GPR32:$Rold, GPR32:$Rnew, GPR64sp:$Rn)>;
-def : Pat<(atomic_cmp_swap_32 GPR64:$Rn, GPR32:$Rold, GPR32:$Rnew), (CASALs GPR32:$Rold, GPR32:$Rnew, GPR64sp:$Rn)>;
-def : Pat<(atomic_cmp_swap_64 GPR64:$Rn, GPR64:$Rold, GPR64:$Rnew), (CASALd GPR64:$Rold, GPR64:$Rnew, GPR64sp:$Rn)>;
-
-def : Pat<(atomic_swap_8 GPR64:$Rn, GPR32:$Rs), (SWPALb GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_swap_16 GPR64:$Rn, GPR32:$Rs), (SWPALh GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_swap_32 GPR64:$Rn, GPR32:$Rs), (SWPALs GPR32:$Rs, GPR64sp:$Rn)>;
-def : Pat<(atomic_swap_64 GPR64:$Rn, GPR64:$Rs), (SWPALd GPR64:$Rs, GPR64sp:$Rn)>;
-
-def : Pat<(atomic_load_sub_8 GPR64:$Rn, GPR32:$Rs), (LDADDALb (SUBWrr WZR, GPR32:$Rs), GPR64sp:$Rn)>;
-def : Pat<(atomic_load_sub_16 GPR64:$Rn, GPR32:$Rs), (LDADDALh (SUBWrr WZR, GPR32:$Rs), GPR64sp:$Rn)>;
-def : Pat<(atomic_load_sub_32 GPR64:$Rn, GPR32:$Rs), (LDADDALs (SUBWrr WZR, GPR32:$Rs), GPR64sp:$Rn)>;
-def : Pat<(atomic_load_sub_64 GPR64:$Rn, GPR64:$Rs), (LDADDALd (SUBXrr XZR, GPR64:$Rs), GPR64sp:$Rn)>;
-
-def : Pat<(atomic_load_and_8 GPR64:$Rn, GPR32:$Rs), (LDCLRALb (ORNWrr WZR, GPR32:$Rs), GPR64sp:$Rn)>;
-def : Pat<(atomic_load_and_16 GPR64:$Rn, GPR32:$Rs), (LDCLRALh (ORNWrr WZR, GPR32:$Rs), GPR64sp:$Rn)>;
-def : Pat<(atomic_load_and_32 GPR64:$Rn, GPR32:$Rs), (LDCLRALs (ORNWrr WZR, GPR32:$Rs), GPR64sp:$Rn)>;
-def : Pat<(atomic_load_and_64 GPR64:$Rn, GPR64:$Rs), (LDCLRALd (ORNXrr XZR, GPR64:$Rs), GPR64sp:$Rn)>;
+let Predicates = [HasLSE] in {
+  def : Pat<(atomic_load_add_8_monotonic GPR64:$Rn, GPR32:$Rs), (LDADDB GPR32:$Rs, GPR64sp:$Rn)>;
----------------
You can eliminate lots of copy/paste with multiclasses here too.

    // Differing SrcRHS and DstRHS allow you to cover CLR & SUB by giving a more
    // complex DAG for DstRHS.
    let Predicates = [HasLSE] in
    multiclass LDOPregister_patterns_ord_dag<string inst, string suffix, string op,
                                             string size, dag SrcRHS, dag DstRHS> {
      def : Pat<(!cast<SDNode>(op#"_"#size#"_monotonic") GPR64sp:$Rn, SrcRHS),
                (!cast<Instruction>(inst#suffix) DstRHS, GPR64sp:$Rn)>;
      [... variants for acquire, release, acq_rel and seq_cst ...]
    }

    // Simple case for non-CLR, non-SUB instructions where it's just one result inst.
    multiclass LDOPregister_patterns_ord<string inst, string suffix, string op,
                                         string size, dag RHS> {
      defm : LDOPregister_patterns_ord_dag<inst, suffix, op, size, RHS, RHS>;
    }

    multiclass LDOPregister_patterns<string inst, string op> {
      defm : LDOPregister_patterns_ord<inst, "d", op, "64", (i64 GPR64:$Rm)>;
      defm : LDOPregister_patterns_ord<inst, "s", op, "32", (i32 GPR32:$Rm)>;
      defm : LDOPregister_patterns_ord<inst, "h", op, "16", (i32 GPR32:$Rm)>;
      defm : LDOPregister_patterns_ord<inst, "b", op, "8",  (i32 GPR32:$Rm)>;
    }

    defm : LDOPregister_patterns<"LDADD", "atomic_load_add">;
    [... all other ops except SUB/CLR ...]

    // Then a slightly more complex version of LDOPregister_patterns to handle CLR/SUB,
    // plus a pair of defms for them. About 8 lines of code.
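
One possible shape for that last piece, as a sketch only. This is a fragment, not a standalone file: it relies on `LDOPregister_patterns_ord_dag` from the outline above and on the in-tree AArch64 definitions (`SUBWrr`, `SUBXrr`, `ORNWrr`, `ORNXrr`, `WZR`, `XZR`). The multiclass name `LDOPregister_patterns_mod` and its `mod` parameter are hypothetical; the negate/invert DAGs mirror the patterns being removed in this hunk.

```tablegen
// Hypothetical variant: wraps the source operand in <mod>Wrr/<mod>Xrr with the
// zero register before handing it to the LSE instruction.
multiclass LDOPregister_patterns_mod<string inst, string op, string mod> {
  defm : LDOPregister_patterns_ord_dag<inst, "d", op, "64", (i64 GPR64:$Rm),
             (!cast<Instruction>(mod#"Xrr") XZR, GPR64:$Rm)>;
  defm : LDOPregister_patterns_ord_dag<inst, "s", op, "32", (i32 GPR32:$Rm),
             (!cast<Instruction>(mod#"Wrr") WZR, GPR32:$Rm)>;
  defm : LDOPregister_patterns_ord_dag<inst, "h", op, "16", (i32 GPR32:$Rm),
             (!cast<Instruction>(mod#"Wrr") WZR, GPR32:$Rm)>;
  defm : LDOPregister_patterns_ord_dag<inst, "b", op, "8",  (i32 GPR32:$Rm),
             (!cast<Instruction>(mod#"Wrr") WZR, GPR32:$Rm)>;
}

// atomic_load_sub becomes LDADD of the negated value;
// atomic_load_and becomes LDCLR of the inverted value.
defm : LDOPregister_patterns_mod<"LDADD", "atomic_load_sub", "SUB">;
defm : LDOPregister_patterns_mod<"LDCLR", "atomic_load_and", "ORN">;
```

Passing the modifier as a string keeps the W/X register split in one place, at the cost of a `!cast` per pattern.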

================
Comment at: lib/Target/AArch64/AArch64InstrFormats.td:9401
 multiclass CompareAndSwap<bits<1> Acq, bits<1> Rel, string order> {
-  let Sz = 0b00, Acq = Acq, Rel = Rel in def b : BaseCAS<order, "b", GPR32>;
-  let Sz = 0b01, Acq = Acq, Rel = Rel in def h : BaseCAS<order, "h", GPR32>;
-  let Sz = 0b10, Acq = Acq, Rel = Rel in def s : BaseCAS<order, "", GPR32>;
-  let Sz = 0b11, Acq = Acq, Rel = Rel in def d : BaseCAS<order, "", GPR64>;
+  let Sz = 0b00, Acq = Acq, Rel = Rel in def B : BaseCAS<order, "b", GPR32>;
+  let Sz = 0b01, Acq = Acq, Rel = Rel in def H : BaseCAS<order, "h", GPR32>;
----------------
I'm fine with this change, but it should probably be committed beforehand as a separate "NFC" (no functional change) rename. In fact, feel free to do that any time you like.

================
Comment at: lib/Target/AArch64/AArch64InstrFormats.td:9517
+
+multiclass STOPregister_patterns_ord<string inst, string suffix,
+                                     string op, string size,
----------------
These `STOP` patterns appear to be unused.

================
Comment at: lib/Target/AArch64/AArch64SchedThunderX2T99.td:318-319

+// 8 cycles on LS0 or LS1 and I0, I1, or I2.
+def THX2T99Write_8Cyc_I012 : SchedWriteRes<[THX2T99LS01, THX2T99I012]> {
+  let Latency = 8;
----------------
Separate commit for the scheduling changes, please. But as far as I'm concerned you can go ahead whenever you want, as with the renaming.

Repository:
  rL LLVM

https://reviews.llvm.org/D35319