[llvm] [AArch64] Improve codegen for some fixed-width partial reductions (PR #126529)

Sander de Smalen via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 13 03:46:23 PST 2025


================
@@ -26,6 +26,66 @@ define <4 x i32> @udot(<4 x i32> %acc, <16 x i8> %u, <16 x i8> %s) {
   ret <4 x i32> %partial.reduce
 }
 
+define <4 x i32> @udot_in_loop(ptr %p1, ptr %p2){
----------------
sdesmalen-arm wrote:

Just a question, and not something I expect you to change in this patch: should the transformation you're changing apply regardless of whether the code is in a loop? The `smull(zext, sext)` case seems like it would always be a win. The same may be true for the partial.reduce case: although using an sdot/udot may require materialising a `splat(1)` vector, that constant should be cheap to materialise (and easy to CSE if there are multiple uses).
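
To illustrate the partial.reduce point, here is a minimal scalar sketch in Python (a model, not LLVM code; the helper name `udot` and the sample values are made up for illustration). It assumes the partial reduction leaves the grouping of input elements unspecified, so that summing each group of four bytes into the matching accumulator lane is one valid lowering. Under that assumption, a UDOT against a `splat(1)` vector computes exactly that:

```python
MASK32 = 0xFFFFFFFF  # i32 lanes wrap modulo 2**32


def udot(acc, a, b):
    """Scalar model of UDOT on a 128-bit vector.

    acc: 4 unsigned 32-bit lanes; a, b: 16 unsigned 8-bit lanes.
    Lane i of the result is acc[i] plus the dot product of bytes 4*i..4*i+3.
    """
    assert len(acc) == 4 and len(a) == 16 and len(b) == 16
    return [
        (acc[i] + sum(a[4 * i + j] * b[4 * i + j] for j in range(4))) & MASK32
        for i in range(4)
    ]


# Hypothetical inputs, chosen only for illustration.
acc = [1, 2, 3, 4]
u = list(range(16))
splat_one = [1] * 16  # the constant the comment calls splat(1)

# Multiplying by 1 leaves each byte unchanged, so this is a plain
# zero-extend-and-add of each group of four bytes into one i32 lane.
result = udot(acc, u, splat_one)

# The total is preserved: sum of result lanes == sum(acc) + sum(u).
total_preserved = sum(result) == sum(acc) + sum(u)
```

The only extra cost over a plain dot product is the `splat(1)` operand, which is the constant the comment expects to be cheap to materialise.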

https://github.com/llvm/llvm-project/pull/126529

