[PATCH] D150969: [AArch64] Try to convert two XTN and two SMLSL to UZP1, SMLSL and SMLSL2

JinGu Kang via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 5 02:06:05 PDT 2023


jaykang10 added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp:727
+  MachineInstr *XTNUseMI = nullptr;
+  for (MachineInstr &MI : MBB) {
+    if (MI.getOpcode() == AArch64::XTNv4i16) {
----------------
dmgreen wrote:
> This (and maybe the distance checks below) would make the algorithm O(N^2) in the number of instructions in the block.
> 
> It does allow the algorithm to be quite general - it can match any truncate with the UZP getting a free truncate for what may be an unrelated instruction. It may not always be valid though - Could the truncate depend on result of the UZP or vice-versa? It does have the advantage that it works with either SDAG or GlobalISel though.
> 
> From what I have seen mull's often come in pairs. For example the code in smlsl_smlsl2_v4i32_uzp1 has:
> ```
> ; CHECK-NEXT:    uzp1 v2.8h, v2.8h, v3.8h
> ; CHECK-NEXT:    smlsl v1.4s, v0.4h, v2.4h
> ; CHECK-NEXT:    smlsl2 v1.4s, v0.8h, v2.8h
> ```
> If it was processing the smlsl2, it might be able to look at the extract high of the first operand, see that it has 2 uses with the other being an smull(extractlow(.)), and use the other operand of the smull in the UZP instead of the undef when creating it in DAG? It has to check a number of things (and doesn't help with globalisel), but hopefully fits in as an extension to the existing code in SDAG.
um... as you can see on my first patch in this review, I checked the `smlsl2`'s first operand's is `smlsl` in MIPeephole opt.
@efriedma pointed out the patch fixes the specific case and suggested to generalize the case. In order to generalize case, he suggested to split the issue into two cases so I tried to fix them in dagcombine and MIPeephole opt.
At this moment, AArch64 target does not have `smlsl` `smlsl2` custom node in DAG so we need to detect the node patterns such as `sub`, `add`, `SMULL` and `extract_subvector` from below dag.
```
Legalized selection DAG: %bb.0 'foo:entry'
SelectionDAG has 27 nodes:
  t0: ch,glue = EntryToken
  t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
  t8: i64,ch = CopyFromReg t0, Register:i64 %3
  t11: v4i32,ch = load<(load (s128) from %ir.p, align 4)> t0, t8, undef:i64
    t14: i64 = add nuw t8, Constant:i64<16>
  t15: v4i32,ch = load<(load (s128) from %ir.arrayidx2, align 4)> t0, t14, undef:i64
      t27: ch = TokenFactor t11:1, t15:1
        t4: v4i32,ch = CopyFromReg t0, Register:v4i32 %1
            t18: v4i16 = extract_subvector t2, Constant:i64<0>
            t12: v4i16 = truncate t11
          t31: v4i32 = AArch64ISD::SMULL t18, t12
            t23: v4i16 = extract_subvector t2, Constant:i64<4>
            t16: v4i16 = truncate t15
          t30: v4i32 = AArch64ISD::SMULL t23, t16
        t25: v4i32 = add t31, t30
      t26: v4i32 = sub t4, t25
      t6: i64,ch = CopyFromReg t0, Register:i64 %2
    t28: ch = store<(store (s128) into %ir.out, !tbaa !6)> t27, t26, t6, undef:i64
  t29: ch = AArch64ISD::RET_GLUE t28
```
I am not sure which approach is better. @efriedma How do you think about it?
Anyway, let me try to implement it in dagcombine.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150969/new/

https://reviews.llvm.org/D150969



More information about the llvm-commits mailing list