[PATCH] D112725: [LoopVectorize] Extract the last lane from a uniform store
David Sherwood via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 4 04:40:42 PDT 2021
david-arm added inline comments.
================
Comment at: llvm/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll:85
+; CHECK-NEXT: [[TMP22:%.*]] = add nsw i32 [[TMP21]], 1
+; CHECK-NEXT: store i32 [[TMP22]], i32* [[ARRAYIDX7_US]], align 4, !llvm.mem.parallel_loop_access !0
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
----------------
Hi @kmclaughlin, sadly I don't think this looks right. I wonder whether, now that we're marking the store as uniform, you've exposed an existing bug in `collectLoopUniforms` or somewhere else such as `handleReplication`? It looks like we're now also treating the load as uniform, which is wrong in this case because we still want to do the vector load and then store out the last lane. I'd have expected something like:
[[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP22]], align 4
[[TMP23:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], <i32 1, i32 1, i32 1, i32 1>
[[TMP27:%.*]] = extractelement <4 x i32> [[TMP23]], i32 3
store i32 [[TMP27]], i32* [[ARRAYIDX7_US]], align 4
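
To make the lane argument concrete, here is a minimal C++ sketch that models the loop outside LLVM. It is not taken from the patch or the test file: the function names, the input array `b`, and the single destination `dst` are all hypothetical, and the trip count is assumed to be a multiple of VF so that no scalar remainder loop is needed. The third function models what I assume happens when the load is also treated as uniform, i.e. only the first lane of each vector iteration is computed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Vectorization factor, matching the <4 x i32> types above.
constexpr std::size_t VF = 4;

// Original scalar loop (hypothetical shape): each iteration stores
// b[i] + 1 to the same loop-invariant address, so the last write wins.
int scalarUniformStore(const std::vector<int>& b, int initial) {
    int dst = initial;
    for (std::size_t i = 0; i < b.size(); ++i)
        dst = b[i] + 1;
    return dst;
}

// Expected vectorized form: wide load, wide add, extract lane VF - 1,
// then one scalar store per vector iteration.
int lastLaneUniformStore(const std::vector<int>& b, int initial) {
    assert(b.size() % VF == 0 && "sketch assumes no scalar remainder loop");
    int dst = initial;
    for (std::size_t i = 0; i < b.size(); i += VF) {
        std::array<int, VF> wide{};
        for (std::size_t lane = 0; lane < VF; ++lane)
            wide[lane] = b[i + lane];   // models the <4 x i32> load
        for (std::size_t lane = 0; lane < VF; ++lane)
            wide[lane] += 1;            // models the vector add
        dst = wide[VF - 1];             // models extractelement ..., i32 3
    }
    return dst;
}

// Assumed failure mode: the load is treated as uniform too, so only
// the first lane of each vector iteration is loaded and stored.
int firstLaneUniformStore(const std::vector<int>& b, int initial) {
    assert(b.size() % VF == 0 && "sketch assumes no scalar remainder loop");
    int dst = initial;
    for (std::size_t i = 0; i < b.size(); i += VF)
        dst = b[i] + 1;
    return dst;
}
```

With `b = {10, 20, 30, 40, 50, 60, 70, 80}`, the scalar loop and the last-lane version both leave 81 in `dst`, whereas the first-lane version leaves 51, which is why the load has to stay a vector load here.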
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D112725/new/
https://reviews.llvm.org/D112725