[PATCH] D112725: [LoopVectorize] Extract the last lane from a uniform store

David Sherwood via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 4 04:40:42 PDT 2021


david-arm added inline comments.


================
Comment at: llvm/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll:85
+; CHECK-NEXT:    [[TMP22:%.*]] = add nsw i32 [[TMP21]], 1
+; CHECK-NEXT:    store i32 [[TMP22]], i32* [[ARRAYIDX7_US]], align 4, !llvm.mem.parallel_loop_access !0
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
----------------
Hi @kmclaughlin, I don't think this looks right sadly. I wonder if, now that we're marking the store as uniform, you've exposed an existing bug in `collectLoopUniforms` or somewhere else like `handleReplication`? It looks like we're now also treating the load as uniform, which is wrong in this case because we still want to do the vector load and store out the last lane. I'd expected something like:

  [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP22]], align 4
  [[TMP23:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], <i32 1, i32 1, i32 1, i32 1>
  [[TMP27:%.*]] = extractelement <4 x i32> [[TMP23]], i32 3
  store i32 [[TMP27]], i32* [[ARRAYIDX7_US]], align 4
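
To make the intended semantics concrete, here is a minimal C++ sketch rather than LLVM code. The loop shape, the function names and the fixed VF of 4 are assumptions for illustration only and are not taken from the test file. It models a loop that stores a varying value to an invariant address, the "wide load, then extract the last lane" form above, and a hypothetical variant that also treats the load as uniform by using only the first lane:

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4; // assumed vectorization factor

// Scalar reference: every iteration stores to the same address, so the
// value left behind is the one computed in the final iteration.
void scalar_loop(const std::vector<int>& src, int* dst) {
    for (std::size_t i = 0; i < src.size(); ++i)
        *dst = src[i] + 1;
}

// Models the expected vector body: wide load, wide add, then store only
// the last lane, which corresponds to the last scalar iteration covered.
void last_lane_loop(const std::vector<int>& src, int* dst) {
    std::size_t i = 0;
    for (; i + VF <= src.size(); i += VF) {
        std::array<int, VF> wide{};
        for (std::size_t lane = 0; lane < VF; ++lane)
            wide[lane] = src[i + lane] + 1; // stands in for load + add
        *dst = wide[VF - 1];                // stands in for extractelement + store
    }
    for (; i < src.size(); ++i)             // scalar remainder
        *dst = src[i] + 1;
}

// Hypothetical miscompile: the load is treated as uniform too, so only
// the first lane's element is read in each vector iteration.
void first_lane_loop(const std::vector<int>& src, int* dst) {
    std::size_t i = 0;
    for (; i + VF <= src.size(); i += VF)
        *dst = src[i] + 1;
    for (; i < src.size(); ++i)
        *dst = src[i] + 1;
}
```

With eight input elements 1..8, the scalar and last-lane versions both leave 9 at the destination, while the first-lane version leaves 6. The store address is invariant but the stored value is not, so the load has to stay wide.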



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112725/new/

https://reviews.llvm.org/D112725


