[llvm] [AArch64][Codegen] Improve small shufflevector/concat lowering for SME (PR #116662)

Wed Nov 20 00:53:35 PST 2024

================
@@ -26161,6 +26238,8 @@ SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
 
     break;
   }
+  case AArch64ISD::ZIP1:
+    return performZIP1Combine(N, DAG);
----------------
sdesmalen-arm wrote:

For a test like `@concat_v4f16` in sve-streaming-mode-fixed-length-concat.ll, I think this should be a pattern that we match explicitly in `LowerFixedLengthVECTOR_SHUFFLEToSVE`, rather than relying on a combine of ZIP1. Just before it falls back on `GenerateFixedLengthSVETBL`, that function tries a number of patterns such as `isZIP_v_undef_Mask`. It seems some patterns are missing there to handle a case like:

```
  t2: v4f16,ch = CopyFromReg t0, Register:v4f16 %0
  t4: v4f16,ch = CopyFromReg t0, Register:v4f16 %1
t19: v4f16 = vector_shuffle<0,1,4,5> t2, t4
```
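To make the missing pattern concrete, here is a rough standalone sketch of a mask check that would recognise `<0,1,4,5>`, i.e. the low half of the first operand followed by the low half of the second. `isConcatLowHalvesMask` is a hypothetical name, not an existing LLVM helper, and the sketch uses `std::vector<int>` rather than LLVM's own types so that it compiles on its own. It assumes the usual shuffle-mask convention where the second operand's elements are numbered starting at the element count and negative entries mean undef.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of LLVM: returns true if Mask selects the low
// half of the first operand followed by the low half of the second operand,
// e.g. <0,1,4,5> for 4-element vectors. Negative entries are treated as undef
// and are allowed to match any lane.
bool isConcatLowHalvesMask(const std::vector<int> &Mask) {
  const std::size_t NumElts = Mask.size();
  if (NumElts == 0 || NumElts % 2 != 0)
    return false;

  const std::size_t Half = NumElts / 2;
  for (std::size_t I = 0; I < NumElts; ++I) {
    if (Mask[I] < 0)
      continue; // undef lane, accept anything

    // Lanes [0, Half) must read operand 0 at the same index. Lanes
    // [Half, NumElts) must read operand 1, whose elements are numbered
    // starting at NumElts in the combined index space.
    const std::size_t Expected = I < Half ? I : NumElts + (I - Half);
    if (static_cast<std::size_t>(Mask[I]) != Expected)
      return false;
  }
  return true;
}
```

With a check along these lines placed next to the existing mask tests, the shuffle above could be lowered directly instead of being recovered later by a ZIP1 combine. How the matched shuffle is then lowered is left open here.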

https://github.com/llvm/llvm-project/pull/116662