[libc] [openmp] [llvm] [clang] [flang] [clang-tools-extra] [compiler-rt] [libcxx] [libcxxabi] [mlir] [AArch64] Add custom lowering for load <3 x i8>. (PR #78632)

Mon Jan 29 07:09:09 PST 2024

================
@@ -11012,6 +11012,50 @@ SDValue ReconstructShuffleWithRuntimeMask(SDValue Op, SelectionDAG &DAG) {
       MaskSourceVec);
 }
 
+// Check if Op is a BUILD_VECTOR with 2 extracts and a load that is cheaper to
+// insert into a vector and use a shuffle. This improves lowering for loads of
+// <3 x i8>.
+static SDValue shuffleWithSingleLoad(SDValue Op, SelectionDAG &DAG) {
+  if (Op.getNumOperands() != 4 || Op.getValueType() != MVT::v4i16)
+    return SDValue();
+
+  SDValue V0 = Op.getOperand(0);
+  SDValue V1 = Op.getOperand(1);
+  SDValue V2 = Op.getOperand(2);
+  SDValue V3 = Op.getOperand(3);
+  if (V0.getOpcode() != ISD::EXTRACT_VECTOR_ELT ||
----------------
TNorthover wrote:

This is a hyper-specific pattern. I assume that's because we only care about a single `<3 x i8>` instruction (a load?), and this is what it has been mangled into by the time we get to see it. If so, we might have to tolerate the horror, but we should at least call it out in the comments.
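
To illustrate the kind of call-out meant here, below is a rough, self-contained sketch of a narrow matcher whose comment states up front that it targets one shape only. Everything in it is hypothetical: `Opcode`, `Node`, and `matchesSingleLoadPattern` are toy stand-ins rather than the SelectionDAG API, and the exact operand layout (two extracts, one load, one leftover lane) is assumed from the "2 extracts and a load" wording in the patch comment, since the hunk above is cut off before the full check.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for DAG opcodes. These are NOT the LLVM ISD enumerators.
enum class Opcode { BuildVector, ExtractVectorElt, Load, Other };

// Minimal hypothetical node: an opcode plus pointers to its operands.
struct Node {
  Opcode Op;
  std::vector<const Node *> Operands;
};

// NOTE: deliberately hyper-specific. This only matches the shape that a
// `load <3 x i8>` is assumed to have been turned into by the time lowering
// sees it: a 4-operand BUILD_VECTOR with exactly two EXTRACT_VECTOR_ELT
// operands and exactly one LOAD operand. Nothing else is expected to hit it.
bool matchesSingleLoadPattern(const Node &N) {
  if (N.Op != Opcode::BuildVector || N.Operands.size() != 4)
    return false;

  int Extracts = 0;
  int Loads = 0;
  for (const Node *Operand : N.Operands) {
    if (Operand->Op == Opcode::ExtractVectorElt)
      ++Extracts;
    else if (Operand->Op == Opcode::Load)
      ++Loads;
  }
  return Extracts == 2 && Loads == 1;
}
```

The point is the comment, not the matching logic: it names the one source construct the pattern exists for, so a later reader does not mistake it for a general combine.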

https://github.com/llvm/llvm-project/pull/78632