[PATCH] D119013: [ArgPromotion][AMDGPU] New MSSA-based function argument promotion pass with input/output argument support

Sun Feb 20 22:31:26 PST 2022

vpykhtin added a comment.

> From a cursory look at the implementation, does this handle unwinding properly?
>
> store i32 0, ptr %arg
> call void @may_unwind() readnone
> store i32 1, ptr %arg
>
> I believe promotion is not possible in this case, because there's no good way to provide the new value on the unwind path to the caller.

Thanks, I completely missed this part and its not working correctly now, I will fix that.

================
Comment at: llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp:711
+void ArgumentPromoter::promoteInOutArg(ArgPromotionInfo &ArgInfo,
+                                       RetValuesMap &RetValues) {
+  SmallDenseMap<BasicBlock *, SmallVector<Instruction *, 4>, 16> MemInsts;
----------------
nikic wrote:
> vpykhtin wrote:
> > nikic wrote:
> > > For the actual promotion, have you considered making use of PromoteMemToReg? Basically, replace the old argument with an alloca that is stored on entry and read on exit, and then run mem2reg on that alloca?
> > > 
> > > ```
> > > define i32 @test(i32 %arg) {
> > >   %old.arg = alloca i32
> > >   store i32, i32* %old.arg
> > >   // Code uses old.arg
> > >   %ret = load i32, i32* %old.arg
> > >   ret i32 %ret
> > > }
> > > ```
> > I think this is a good idea to reuse the working code, however I would rather make some sort of refactoring on it than trying to please it :) There is similar code in LICM promoteLoopAccessesToScalars, so it seems we would benefit of such mem2reg framework.
> LICM uses the LoadAndStorePromoter utility instead -- I'm not really familiar with the differences between these approaches. From what I can tell, LoadAndStorePromoter is SSAUpdater-based and can work without DomTree, while PromoteMemToReg is more similar to the code you use here, in that it uses IDF.
Yes. you’re right PromoteMemToReg is good here and it does much more – handles intrinsics, debug info etc. It’s alloca centered but I think it’s relatively easy to make it more generic: separate the part which decides if the transformation is safe and the part that actually performs it. It uses LargeBlockInfo to keep track of instruction ordering which can be substituted with similar MSSA functionality if it’s present.

================
Comment at: llvm/test/Transforms/ArgumentPromotion/inoutargs2.ll:5
+; REF-LABEL: define internal void @no_ret_blocks() #0 {
+; REF-NEXT: unreachable
+define internal void @no_ret_blocks() #0 {
----------------
nikic wrote:
> What's up with these REF labels?
We have AMDGPURewriteOutArguments class which has similar functionality but works differently. @arsenm has tests for it and I put it here (modified in a way it suits my pass) so we could compare covered testcases. REFs are original checks from rewrite-out-arguments.ll so we could see the difference, maybe they should be removed later.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119013/new/

https://reviews.llvm.org/D119013