[PATCH] D112725: [LoopVectorize] Extract the last lane from a uniform store

Thu Nov 4 04:40:42 PDT 2021

david-arm added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll:85
+; CHECK-NEXT:    [[TMP22:%.*]] = add nsw i32 [[TMP21]], 1
+; CHECK-NEXT:    store i32 [[TMP22]], i32* [[ARRAYIDX7_US]], align 4, !llvm.mem.parallel_loop_access !0
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
----------------
Hi @kmclaughlin, I don't think this looks right, sadly. I wonder if, now that we're marking the store as uniform, you've exposed an existing bug in `collectLoopUniforms` or somewhere else, such as `handleReplication`? It looks like we're now also treating the load as uniform. That is wrong in this case, because we still want to do the vector load and then store out the last lane. I'd expected something like:

  [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP22]], align 4
  [[TMP23:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], <i32 1, i32 1, i32 1, i32 1>
  [[TMP27:%.*]] = extractelement <4 x i32> [[TMP23]], i32 3
  store i32 [[TMP27]], i32* [[ARRAYIDX7_US]], align 4

https://reviews.llvm.org/D112725