[PATCH] D157495: [Atomic-Expand] Run SimplifyCFG from Atomic-Expand on CAS loop blocks.

Wed Aug 23 17:05:43 PDT 2023

arsenm added inline comments.

================
Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:1524
       BB->splitBasicBlock(Builder.GetInsertPoint(), "atomicrmw.end");
+  CmpXchgLoopBlocks.push_back(ExitBB);
   BasicBlock *LoopBB = BasicBlock::Create(Ctx, "atomicrmw.start", F, ExitBB);
----------------
rovka wrote:
> pravinjagtap wrote:
> > rovka wrote:
> > > Why are we only keeping track of these blocks? There seem to be lots of other places in this file that split blocks and create new ones. Shouldn't we call simplifyCFG for all of them?
> > > Why are we only keeping track of these blocks? There seem to be lots of other places in this file that split blocks and create new ones. Shouldn't we call simplifyCFG for all of them?
> > 
> > Targets can configure this simplification using separate pass run e.g. Aarch64 is running simplifyCFG after atomic expand pass https://github.com/llvm/llvm-project/blob/57c090b2ea03937e7c6a08a594532788d01bb813/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp#L557
> > 
> > We think, having separate pass run will be expensive, therefore, for AMDGPU usecase, we are interested in running it on local changes done by atomic-expand. Do you think, calling `simplifyCFG` on entire functions makes much more sense ?
> > 
> > 
> > We think, having separate pass run will be expensive, therefore, for AMDGPU usecase, we are interested in running it on local changes done by atomic-expand. Do you think, calling simplifyCFG on entire functions makes much more sense ?
> 
> Ok, I don't know how much compile time this will save for AMDGPU, so I'll let the other reviewers comment on whether or not we want to teach this pass to clean up after itself. But if we decide that we do want it to clean up (i.e. run simplifyCFG only on the blocks that it has added), I think it should:
> 
> 1. be consistent about it. Right now it creates basic blocks in [[ https://github.com/llvm/llvm-project/blob/841c4dc7e51e0699a7054c89bd240ba33f7bf15e/llvm/lib/CodeGen/AtomicExpandPass.cpp#L973 | several ]] [[ https://github.com/llvm/llvm-project/blob/841c4dc7e51e0699a7054c89bd240ba33f7bf15e/llvm/lib/CodeGen/AtomicExpandPass.cpp#L1141 | different ]] [[ https://github.com/llvm/llvm-project/blob/841c4dc7e51e0699a7054c89bd240ba33f7bf15e/llvm/lib/CodeGen/AtomicExpandPass.cpp#L1287 | places ]], but with your patch it only cleans up some of them. If there's a good reason for this, it should be documented (at least in the commit message if not in the code). If there isn't, then at least leave some FIXMEs for the other cases, so people don't have to scratch their heads while looking through this code.
> 
> 2. be an opt-in behaviour, kind of like how the SimplifyCFG pass has all those settings you can fiddle with when adding it. AtomicExpand is used by several different backends, not just AArch64, and several of them add a full SimplifyCFG run after it (Arm, Hexagon). That SimplifyCFG run may serve to clean up both after AtomicExpand, but potentially also other passes that run before or in between, so it might not make sense to remove the SimplifyCFG run for them. In those cases, it will be useless for AtomicExpand to invoke its own piecemeal SimplifyCFG, so they should be able to run the "fast and messy" AtomicExpand if they want to.
> 
> That's just my 2 cents, maybe @arsenm or @foad have different opinions.
I don't think the compile time is uniquely expensive for AMDGPU, but I would assume just calling simplifycfg on modified blocks would be simpler (as atomics are rare) than running the full CFG pass after the fact

I think it's odd for simplifycfg to be in the codegen pipeline, so a more targeted application seems better

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157495/new/

https://reviews.llvm.org/D157495