[PATCH] D152706: [AMDGPU] Use SSAUpdater in PromoteAlloca

Mon Jun 19 05:46:51 PDT 2023

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp:341-358
+  bool HasSeenBB = false;
+  BasicBlock *BB = I->getParent();
+  while (InsertIt != WorkList.end()) {
+    // -> Instruction is in the same basic block. If the new instruction comes
+    // before it, insert here.
+    //
+    // -> Instruction is in a different basic block: If we've already seen the
----------------
Pierre-vh wrote:
> arsenm wrote:
> > This whole thing feels wrong to me. You shouldn't need to figure out any ordering, much less by using comesBefore repeatedly
> I think that the issue was introduced because I'm looking through things such as bitcasts, which means that we can have IR like this:
> ```
> bb1:
>   %x = alloca ...
>   %cast.x = cast %x to ...
> 
> bb2: 
>   store ... %x
>   store ... %cast.x
> ```
> 
> Then, when I iterate the users of the alloca and build the worklist, the worklist may have the store to %cast.x before the store to %x, so I need to sort the worklist to use SSAAUpdater.
> 
> I'm not sure how `LoadStorePromoter` would help, I've looked at it but it seems to be for a different purpose
I applied your patch and not that many tests fail. The one I'm looking at is promote_memcpy_inline_aggr. This has the multiple loads and stores in the same block case the LoadAndStorePromoter comment says the base SSAUpdater does not handle. I think this is probably the correct tool, you just have the added complexity that we're coercing the type of the accesses to vector instead of reusing the original 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152706/new/

https://reviews.llvm.org/D152706