[llvm] [NVPTX] Optimize v16i8 reductions (PR #67322)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 25 13:30:04 PDT 2023
================
@@ -52,3 +52,129 @@ define float @ff(ptr %p) {
%sum = fadd float %sum3, %v4
ret float %sum
}
+
+define void @combine_v16i8(ptr noundef align 16 %ptr1, ptr noundef align 16 %ptr2) {
+ ; ENABLED-LABEL: combine_v16i8
+ ; ENABLED: ld.v4.u32
+ ; ENABLED: st.u32
----------------
Artem-B wrote:
These checks do not tell us that we've lowered things correctly. For all we know, the `st.u32` may be storing something completely different from the reduction result. Ideally we want the checks to track how the elements get extracted, i.e. the correct BFI or shift/mask sequence.
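To illustrate the concern, here is a minimal sketch in Python (a model, not LLVM or PTX code) of pulling byte lanes out of packed 32-bit words. The helper names `bfe_u32`, `shift_mask`, `pack_v4i8`, and `reduce_sum_v16i8` are hypothetical, and the sum reduction and little-endian lane order are assumptions made for illustration. It shows that a lowering which extracts the wrong lanes still ends in a single 32-bit store, so matching only `ld.v4.u32` and `st.u32` cannot tell the two apart.

```python
# Hypothetical model: four i8 lanes packed into one u32, lane 0 in the low byte.

def bfe_u32(word, pos, length):
    """Unsigned bit-field extract: `length` bits of `word` starting at bit `pos`."""
    return (word >> pos) & ((1 << length) - 1)

def shift_mask(word, lane):
    """Extract byte lane `lane` (0 = least significant) with a shift and a mask."""
    return (word >> (8 * lane)) & 0xFF

def pack_v4i8(lanes):
    """Pack four byte values into one 32-bit word."""
    word = 0
    for i, b in enumerate(lanes):
        word |= (b & 0xFF) << (8 * i)
    return word

def reduce_sum_v16i8(words, extract=shift_mask):
    """Sum the 16 bytes held in four u32 words, truncated to 32 bits."""
    total = sum(extract(w, lane) for w in words for lane in range(4))
    return total & 0xFFFFFFFF

# Sixteen distinct bytes, as a v4 load of u32 would deliver them.
data = list(range(1, 17))
words = [pack_v4i8(data[i:i + 4]) for i in range(0, 16, 4)]

correct = reduce_sum_v16i8(words)
# A broken lowering that always reads lane 0 still produces one 32-bit result.
broken = reduce_sum_v16i8(words, extract=lambda w, lane: shift_mask(w, 0))
```

Both `correct` and `broken` would be written with a single 32-bit store, which is why the checks also need to pin down the per-element extraction.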
https://github.com/llvm/llvm-project/pull/67322