[llvm] [AArch64] Improve codegen for some fixed-width partial reductions (PR #126529)
Sander de Smalen via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 13 02:05:22 PST 2025
================
@@ -26,6 +26,66 @@ define <4 x i32> @udot(<4 x i32> %acc, <16 x i8> %u, <16 x i8> %s) {
  ret <4 x i32> %partial.reduce
}
+define <4 x i32> @udot_in_loop(ptr %p1, ptr %p2){
----------------
sdesmalen-arm wrote:
Why is the loop required here? From what I can see, `optimizeExtendOrTruncateConversion` is also called for blocks that are not in loops?
https://github.com/llvm/llvm-project/pull/126529