[PATCH] D117364: AMDGPU: Use module level register maximums for unknown callees

Mon Jan 17 10:50:58 PST 2022

sebastian-ne accepted this revision as: sebastian-ne.
sebastian-ne added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/amdpal-callable.ll:184
 ; GCN-NEXT:        .stack_frame_size_in_bytes: 0x10{{$}}
-; GCN-NEXT:        .vgpr_count:     0x29{{$}}
+; GCN-NEXT:        .vgpr_count:     0x40{{$}}
 ; GCN-NEXT:      no_stack_extern_call_many_args:
----------------
arsenm wrote:
> sebastian-ne wrote:
> > arsenm wrote:
> > > sebastian-ne wrote:
> > > > arsenm wrote:
> > > > > sebastian-ne wrote:
> > > > > > arsenm wrote:
> > > > > > > sebastian-ne wrote:
> > > > > > > > This is over-approximating the vgpr_count whenever an indirect call is involved, which is quite a performance hit.
> > > > > > > > 
> > > > > > > > Can we switch AMDGPUResourceUsageAnalysis to a ModulePass and run `propagateIndirectCallRegisterUsage` at the end, so that all functions with indirect calls will get the maximum VGPR count of all functions in the module?
> > > > > > > > (As opposed to max VGPR count of the SCC that is used currently, which I did not intend.)
> > > > > > > I'd like to move switching to a module pass into a follow up patch. I'm a bit afraid of unintended side effects by switching to a module pass. We're already paying a compile time cost by using SCC codegen, and module passes will be worse
> > > > > > If the over-approximation goes in, we’ll have to revert it in our graphics branch, otherwise any benchmarks or optimization work that is going on would be meaningless (we had to locally revert D103636 for that reason).
> > > > > > So, I’d prefer it if we could fix this in one push, if you think that’s possible.
> > > > > This is just completely broken as is, any benchmarks are working by accident
> > > > The current graphics use-case is not affected by this bug. We’re only compiling single functions per LLVM module and finding the maximum register usage and linking is done by the loader (PAL in this case).
> > > So then it also wouldn't be impacted by this change?
> > The compiled modules contain only a single function, but the function contains indirect functions calls. So it hits the code path here that uses `getMaxNumVGPRs(F)` to approximate the register usage.
> > The VGPR usage for the compiled function should just stay as it is, even if it contains an indirect call, because there is no other function (in the module) that could be called.
> > 
> > I tested this patch on a pipeline and the reported VGPR usage goes from 140 VGPRs to 256 VGPRs, noticeably reducing the occupancy.
> > 
> > As said, I’d //prefer// if this does not go in without an accompanying patch that makes the VGPR usage accurate again.
> > If you think that’s unreasonable, I’ll unblock this patch and revert it in our graphics branch until the follow up patch.
> I will push them both at the same time, but I'd like to defensively keep this and D117504 as separate commits in case something goes wrong
Thank you, I’ll have a look tomorrow.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117364/new/

https://reviews.llvm.org/D117364