[llvm] [AArch64] Add lowering for `@llvm.experimental.vector.compress` (PR #101015)

Mon Aug 12 06:53:23 PDT 2024

================
@@ -2412,11 +2412,61 @@ void DAGTypeLegalizer::SplitVecRes_VECTOR_COMPRESS(SDNode *N, SDValue &Lo,
                                                    SDValue &Hi) {
   // This is not "trivial", as there is a dependency between the two subvectors.
   // Depending on the number of 1s in the mask, the elements from the Hi vector
-  // need to be moved to the Lo vector. So we just perform this as one "big"
-  // operation and then extract the Lo and Hi vectors from that. This gets rid
-  // of VECTOR_COMPRESS and all other operands can be legalized later.
-  SDValue Compressed = TLI.expandVECTOR_COMPRESS(N, DAG);
-  std::tie(Lo, Hi) = DAG.SplitVector(Compressed, SDLoc(N));
+  // need to be moved to the Lo vector. Passthru values make this even harder.
+  // We try to use VECTOR_COMPRESS if the target has custom lowering with
+  // smaller types and passthru is undef, as it is most likely faster than the
+  // fully expand path. Otherwise, just do the full expansion as one "big"
+  // operation and then extract the Lo and Hi vectors from that. This gets
+  // rid of VECTOR_COMPRESS and all other operands can be legalized later.
+  SDLoc DL(N);
+  EVT VecVT = N->getValueType(0);
+
+  auto [LoVT, HiVT] = DAG.GetSplitDestVTs(VecVT);
+  bool HasCustomLowering = false;
+  EVT CheckVT = LoVT;
+  while (CheckVT.getVectorMinNumElements() > 1) {
+    if (TLI.isOperationCustom(ISD::VECTOR_COMPRESS, CheckVT)) {
----------------
lawben wrote:

Immediately after sending the reply, I realized that this doesn't matter that much. The result type is only split until it reaches a legal type, so we never actually get into the case where we split this from `<8 x i8>` to `<4 x i8>`. In any case, we would need to duplicate the split + compress + merge logic in AArch64ISelLowering in a follow-up PR.

I'll still use the `isOperationLegal() || isOperationCustom()` approach though, because it is more general.
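
To illustrate why the combined check is more general, here is a minimal standalone sketch. It does not use LLVM: `ToyTLI`, `Action`, and both helper functions are hypothetical stand-ins, and vector types are reduced to a bare element count. It only models the halving loop from the hunk above and is not the code in this PR.

```cpp
#include <cassert>
#include <map>

// Toy stand-ins for illustration only; these are NOT the real LLVM types.
enum class Action { Legal, Custom, Expand };

struct ToyTLI {
  // Hypothetical table: element count of an <N x i8> vector -> how the
  // target handles a compress of that type. Missing entries mean Expand.
  std::map<unsigned, Action> CompressAction;

  Action get(unsigned NumElts) const {
    auto It = CompressAction.find(NumElts);
    return It == CompressAction.end() ? Action::Expand : It->second;
  }
  bool isOperationLegal(unsigned NumElts) const {
    return get(NumElts) == Action::Legal;
  }
  bool isOperationCustom(unsigned NumElts) const {
    return get(NumElts) == Action::Custom;
  }
};

// Custom-only variant: halve the element count, starting at the Lo half,
// and stop as soon as a smaller type has custom lowering.
bool hasCustomCompress(const ToyTLI &TLI, unsigned LoNumElts) {
  for (unsigned N = LoNumElts; N > 1; N /= 2)
    if (TLI.isOperationCustom(N))
      return true;
  return false;
}

// More general variant: also accept types where the operation is legal.
bool hasLegalOrCustomCompress(const ToyTLI &TLI, unsigned LoNumElts) {
  for (unsigned N = LoNumElts; N > 1; N /= 2)
    if (TLI.isOperationLegal(N) || TLI.isOperationCustom(N))
      return true;
  return false;
}
```

In this toy model, a target that marks the operation as legal (rather than custom) for some smaller type is missed by the custom-only loop but found by the combined one.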

https://github.com/llvm/llvm-project/pull/101015