[llvm] [AArch64][NEON][SVE] Lower mixed sign/zero extended partial reductions to usdot (PR #107566)

Wed Sep 11 09:12:25 PDT 2024

================
@@ -1420,6 +1421,9 @@ def USMMLA : SIMDThreeSameVectorMatMul<1, 0, "usmmla", int_aarch64_neon_usmmla>;
 defm USDOT : SIMDThreeSameVectorDot<0, 1, "usdot", int_aarch64_neon_usdot>;
 defm USDOTlane : SIMDThreeSameVectorDotIndex<0, 1, 0b10, "usdot", int_aarch64_neon_usdot>;
 
+def : Pat<(v4i32 (AArch64usdot (v4i32 V128:$Rd), (v16i8 V128:$Rm), (v16i8 V128:$Rn))), (USDOTv16i8 $Rd, $Rm, $Rn)>;
+def : Pat<(v2i32 (AArch64usdot (v2i32 V64:$Rd), (v8i8 V64:$Rm), (v8i8 V64:$Rn))), (USDOTv8i8 $Rd, $Rm, $Rn)>;
----------------
paulwalker-arm wrote:

You don't need extra patterns here; instead, you can pass `AArch64usdot` directly to `defm USDOT...` in place of the existing `int_aarch64_neon_usdot` parameter.

If you look at how UDOT is handled, you'll see that the missing piece is a small update to `AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN` to lower the intrinsics to `AArch64ISD::USDOT`. Doing this means all future optimisations will apply equally to every place where the operation exists.
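For reference, the operation that both the `AArch64usdot` node and the `int_aarch64_neon_usdot` intrinsic stand for can be sketched as a scalar C++ model of the 128-bit form (`v4i32` accumulator, two `v16i8` sources). It is a partial reduction: sixteen products of a zero-extended byte and a sign-extended byte are folded into four 32-bit lanes, with lane `i` taking bytes `4*i` to `4*i+3`. This is an illustrative sketch, not LLVM code. The names `usdot_v4i32` and `dot_full` are hypothetical, and the parameters are named by signedness (`u` unsigned, `s` signed) rather than mapped to the `$Rm`/`$Rn` operands in the patterns above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of 128-bit USDOT: acc.4s += u.16b (unsigned) * s.16b (signed).
// Each 32-bit lane adds four byte products taken from byte positions 4*lane .. 4*lane+3.
std::array<int32_t, 4> usdot_v4i32(std::array<int32_t, 4> acc,
                                   const std::array<uint8_t, 16>& u,
                                   const std::array<int8_t, 16>& s) {
  for (int lane = 0; lane < 4; ++lane) {
    int32_t sum = 0;
    for (int j = 0; j < 4; ++j) {
      int idx = 4 * lane + j;
      // Zero-extend the unsigned byte, sign-extend the signed byte, multiply in 32 bits.
      // |product| <= 255 * 128, so four of them always fit in int32_t.
      sum += static_cast<int32_t>(u[idx]) * static_cast<int32_t>(s[idx]);
    }
    // The accumulate is a plain (non-saturating) 32-bit add; do it in uint32_t so
    // overflow wraps instead of being undefined. The cast back is modular in C++20.
    acc[lane] = static_cast<int32_t>(static_cast<uint32_t>(acc[lane]) +
                                     static_cast<uint32_t>(sum));
  }
  return acc;
}

// Hypothetical helper: the full (unpartitioned) mixed-sign dot product of all 16 bytes.
int64_t dot_full(const std::array<uint8_t, 16>& u, const std::array<int8_t, 16>& s) {
  int64_t total = 0;
  for (int i = 0; i < 16; ++i)
    total += static_cast<int64_t>(u[i]) * static_cast<int64_t>(s[i]);
  return total;
}
```

With `u` bytes `255` and `s` bytes `-1` in one lane, that lane gains `-1020`. Treating both sources as unsigned (UDOT) or both as signed (SDOT) would give a different result, which is why mixed zero/sign-extended reductions need USDOT specifically. Summing the four result lanes and subtracting the original accumulator recovers `dot_full(u, s)`.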

https://github.com/llvm/llvm-project/pull/107566