[PATCH] D156040: [AMDGPU] Add dynamic stack bit info to kernel-resource-usage Rpass output for CoV5
Corbin Robeck via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jul 23 10:37:01 PDT 2023
crobeck added a comment.
In D156040#4526036 <https://reviews.llvm.org/D156040#4526036>, @JonChesterfield wrote:
> In D156040#4525771 <https://reviews.llvm.org/D156040#4525771>, @crobeck wrote:
>
>> It is part of the kernel meta data that is passed to the runtime to indicate the compiler can't determine the required stack amount use of the kernel and to tell the runtime it needs to check the value set from hipDeviceSetLimit.
>
> I don't see how this conveys any information. The compiler writes the stack size to be allocated. If it doesn't know what is sufficient, it's going to request some maximum and hope for the best.
>
> The runtime allocates the requested size. If it has a bit saying "but use less if you know that's safe", then it can do nothing with that bit unless it has extra information. If it has that extra information, it doesn't need this bit.
>
> Therefore instead of adding printing stuff related to this Boolean flag, we should delete the Boolean flag.
>
> What's the use case I'm missing which makes this flag necessary/beneficial?
If the compiler knows the required stack size of a kernel, it sets it and reports that to the runtime. The runtime then uses that value.
If the compiler cannot determine the required amount, as happens with indirect function calls or recursion, it sets the dynamic stack use bit and reports only the minimum required by the kernel to the runtime (the actual amount the kernel needs could be significantly more than that minimum).
If the dynamic stack use bit is set and the kernel ends up needing more than the minimum calculated by the compiler or the runtime default value, the code will crash. hipDeviceSetLimit must then be used to raise the stack size allocated by the compiler/runtime.
Identifying the cases where kernels use dynamic stack, and where developers therefore need to consider the value set through the hipDeviceSetLimit API, can be difficult when the kernel is buried under many layers of templates. We are already adding remarks about scratch use, and we have been asked whether this flag can be reported as part of those remarks.
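The stack-size rules described above can be sketched as a small model. This is only an illustration under stated assumptions: `KernelStackInfo`, `allocatedStackBytes`, and `stackIsSufficient` are hypothetical names, not LLVM or HIP runtime code, and the policy of taking the larger of the compiler minimum and the device limit is my simplified reading of the behavior, not a confirmed implementation.

```cpp
#include <cstddef>

// Hypothetical model of the per-kernel stack metadata discussed in this thread.
struct KernelStackInfo {
  bool UsesDynamicStack;     // set when the compiler cannot bound stack use
  std::size_t CompilerBytes; // exact size, or only a minimum if the bit is set
};

// Assumed runtime policy (a simplification, not the actual runtime code):
// without the bit, trust the compiler's size; with the bit, also consult the
// device stack limit (the value an application would set via
// hipDeviceSetLimit) and use whichever is larger.
constexpr std::size_t allocatedStackBytes(KernelStackInfo Info,
                                          std::size_t DeviceLimitBytes) {
  return (Info.UsesDynamicStack && DeviceLimitBytes > Info.CompilerBytes)
             ? DeviceLimitBytes
             : Info.CompilerBytes;
}

// True when the allocation covers what the kernel actually needs at run time.
// A false result corresponds to the crash case described above.
constexpr bool stackIsSufficient(KernelStackInfo Info,
                                 std::size_t DeviceLimitBytes,
                                 std::size_t ActualBytesNeeded) {
  return allocatedStackBytes(Info, DeviceLimitBytes) >= ActualBytesNeeded;
}
```

For example, under this model a kernel with the bit set, a compiler minimum of 256 bytes, and a device limit of 1024 bytes is not covered if recursion actually needs 2048 bytes; raising the limit to 2048 (in HIP, through hipDeviceSetLimit with hipLimitStackSize) makes the allocation sufficient. A kernel without the bit ignores the limit entirely, which is why knowing whether the bit is set matters to the developer.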
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D156040/new/
https://reviews.llvm.org/D156040