[PATCH] D150969: [AArch64] Try to convert two XTN and two SMLSL to UZP1, SMLSL and SMLSL2

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 5 01:04:43 PDT 2023


dmgreen added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp:727
+  MachineInstr *XTNUseMI = nullptr;
+  for (MachineInstr &MI : MBB) {
+    if (MI.getOpcode() == AArch64::XTNv4i16) {
----------------
This (and maybe the distance checks below) would make the algorithm O(N^2) in the number of instructions in the block.
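
As a rough illustration of the concern (a toy model, not the patch's code: `Opcode`, `ScanStats`, `rescanPerCandidate` and `scanOnce` are hypothetical names, and a plain vector stands in for the basic block), walking the whole block again for every candidate grows quadratically, whereas locating the XTN once and reusing the answer stays linear. The second form assumes the block is not modified between the two loops, which a real pass would have to guarantee or re-check.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-ins for instruction opcodes; hypothetical, not LLVM types.
enum class Opcode { XTN, SMLSL, Other };

struct ScanStats {
  std::size_t Matches = 0; // SMLSLs for which some XTN was found
  std::size_t Visits = 0;  // instructions examined in total
};

// Shape of the concern: every SMLSL triggers a fresh walk over the block.
ScanStats rescanPerCandidate(const std::vector<Opcode> &Block) {
  ScanStats S;
  for (Opcode Op : Block) {
    ++S.Visits;
    if (Op != Opcode::SMLSL)
      continue;
    for (Opcode Inner : Block) {
      ++S.Visits;
      if (Inner == Opcode::XTN) {
        ++S.Matches;
        break;
      }
    }
  }
  return S;
}

// One possible alternative: look for the XTN once, then reuse the result.
// Assumes nothing in the block changes between the two loops.
ScanStats scanOnce(const std::vector<Opcode> &Block) {
  ScanStats S;
  bool HasXTN = false;
  for (Opcode Op : Block) {
    ++S.Visits;
    if (Op == Opcode::XTN) {
      HasXTN = true;
      break;
    }
  }
  for (Opcode Op : Block) {
    ++S.Visits;
    if (Op == Opcode::SMLSL && HasXTN)
      ++S.Matches;
  }
  return S;
}
```

Both functions report the same matches; only the number of instructions visited differs, and the gap widens as the block grows.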

It does allow the algorithm to be quite general - it can match any truncate, with the UZP providing a free truncate for what may be an unrelated instruction. It may not always be valid though - could the truncate depend on the result of the UZP, or vice versa? It does have the advantage that it works with either SDAG or GlobalISel.

From what I have seen, mulls often come in pairs. For example, the code in smlsl_smlsl2_v4i32_uzp1 has:
```
; CHECK-NEXT:    uzp1 v2.8h, v2.8h, v3.8h
; CHECK-NEXT:    smlsl v1.4s, v0.4h, v2.4h
; CHECK-NEXT:    smlsl2 v1.4s, v0.8h, v2.8h
```
If it were processing the smlsl2, could it look at the extract high of the first operand, see that it has 2 uses with the other being an smull(extractlow(.)), and use the other operand of the smull in the UZP instead of the undef when creating it in the DAG? It would have to check a number of things (and doesn't help with GlobalISel), but it hopefully fits in as an extension to the existing code in SDAG.
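
To make the shape of that check concrete, here is a minimal sketch on a toy graph. It is not SelectionDAG code: `Kind`, `Node`, `Graph`, `usersOf` and `findLowHalfOperand` are all hypothetical names, node ids are plain indices, and it reads "has 2 uses" as referring to the vector feeding the extract high. Starting from the high multiply, it finds that vector, requires exactly two users, and returns the other operand of the multiply fed by the extract low (or -1 if the pattern is absent). It ignores operand commutativity and the other legality checks a real combine would need.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy node kinds; these are not SelectionDAG opcodes.
enum class Kind { Leaf, ExtractLow, ExtractHigh, SMull };

struct Node {
  Kind K;
  std::vector<int> Ops; // operand node ids
};

using Graph = std::vector<Node>;

// Collect the ids of all nodes that use node Id as an operand.
std::vector<int> usersOf(const Graph &G, int Id) {
  std::vector<int> Users;
  for (std::size_t N = 0; N < G.size(); ++N) {
    for (int Op : G[N].Ops) {
      if (Op == Id) {
        Users.push_back(static_cast<int>(N));
        break; // count each user once
      }
    }
  }
  return Users;
}

// Given SMull(ExtractHigh(X), Y), look for a sibling SMull(ExtractLow(X), Z)
// and return the id of Z, or -1 if no such sibling exists.
int findLowHalfOperand(const Graph &G, int HighMul) {
  const Node &Mul = G[HighMul];
  if (Mul.K != Kind::SMull || Mul.Ops.size() != 2)
    return -1;
  int HiId = Mul.Ops[0];
  const Node &Hi = G[HiId];
  if (Hi.K != Kind::ExtractHigh || Hi.Ops.size() != 1)
    return -1;
  int X = Hi.Ops[0];
  std::vector<int> XUsers = usersOf(G, X);
  if (XUsers.size() != 2) // the "2 uses" condition from the comment
    return -1;
  for (int U : XUsers) {
    if (U == HiId || G[U].K != Kind::ExtractLow)
      continue;
    for (int LowUser : usersOf(G, U)) {
      const Node &LowMul = G[LowUser];
      if (LowMul.K == Kind::SMull && LowMul.Ops.size() == 2 &&
          LowMul.Ops[0] == U)
        return LowMul.Ops[1];
    }
  }
  return -1;
}
```

In this model the returned operand is the one that would take the place of the undef when the UZP is created.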


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150969/new/

https://reviews.llvm.org/D150969


