[llvm] [AArch64] Extend custom lowering for SVE types in `@llvm.experimental.vector.compress` (PR #105515)

Tue Aug 27 01:01:13 PDT 2024

================
@@ -6690,16 +6688,67 @@ SDValue AArch64TargetLowering::LowerVECTOR_COMPRESS(SDValue Op,
   EVT ContainerVT = getSVEContainerType(VecVT);
   EVT CastVT = VecVT.changeVectorElementTypeToInteger();
 
-  // Convert to i32 or i64 for smaller types, as these are the only supported
-  // sizes for compact.
-  if (ContainerVT != VecVT) {
-    Vec = DAG.getBitcast(CastVT, Vec);
-    Vec = DAG.getNode(ISD::ANY_EXTEND, DL, ContainerVT, Vec);
-  }
+  // These vector types aren't supported by the `compact` instruction, so
+  // we split and compact them as <vscale x 4 x i32>, store them on the stack,
+  // and then merge them again. In the other cases, emit compact directly.
+  SDValue Compressed;
+  if (VecVT == MVT::nxv8i16 || VecVT == MVT::nxv8i8 || VecVT == MVT::nxv16i8) {
+    SDValue Chain = DAG.getEntryNode();
+    SDValue StackPtr = DAG.CreateStackTemporary(
+        VecVT.getStoreSize(), DAG.getReducedAlign(VecVT, /*UseABI=*/false));
----------------
lawben wrote:

I believe we need the stack, as we concatenate two/four partial vectors with a runtime offset. I could not find any instruction that does this, as they usually require immediate offsets. Is there a specific ARM instruction that can do this that you are thinking of?  

https://github.com/llvm/llvm-project/pull/105515