[llvm] [ThinLTO] Properly support targets that require importing all external functions (PR #133588)
Shilei Tian via llvm-commits
llvm-commits at lists.llvm.org
Sat Mar 29 11:04:11 PDT 2025
shiltian wrote:
> Why is there such a limitation?
From what I understand, seeing all device code is necessary for resource usage analysis, which is required to compute occupancy. Occupancy determines the maximum number of threads a thread block can have. At runtime, when a kernel is launched, the thread block size must not exceed the limit based on that occupancy.
> Would it be better to lift the limitation?
Ideally, yes, but in practice, probably not. Technically, we could support external function calls by conservatively assuming the callee uses all possible resources. But that would severely degrade performance. Since GPUs are primarily used for performance workloads, this kind of support may not be practically useful.
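To make the trade-off concrete, here is a rough sketch of why the conservative assumption hurts. The limits and the names (`kRegistersPerUnit`, `wavesPerUnit`, `kernelRegs`) are made up for illustration and are not taken from any real target or from the backend; the point is only that an unknown callee forces the worst-case register count, which in turn lowers occupancy.

```cpp
#include <algorithm>

// Hypothetical hardware limits, chosen only for illustration.
constexpr unsigned kRegistersPerUnit = 512; // register budget shared by resident waves
constexpr unsigned kMaxWavesPerUnit = 8;    // hardware cap on resident waves
constexpr unsigned kMaxRegsPerThread = 256; // most registers any function may use

// Number of waves that fit at once, given the per-thread register use.
constexpr unsigned wavesPerUnit(unsigned regsPerThread) {
  if (regsPerThread == 0)
    return kMaxWavesPerUnit;
  return std::min(kMaxWavesPerUnit, kRegistersPerUnit / regsPerThread);
}

// Register use of a kernel that calls one function: the larger of its own
// use and the callee's. If the callee body is not visible (an external
// call), conservatively assume it uses the maximum.
constexpr unsigned kernelRegs(unsigned ownRegs, bool calleeVisible,
                              unsigned calleeRegs) {
  const unsigned callee = calleeVisible ? calleeRegs : kMaxRegsPerThread;
  return std::max(ownRegs, callee);
}

// Callee visible: 64 registers per thread, so 512 / 64 = 8 waves.
static_assert(wavesPerUnit(kernelRegs(48, true, 64)) == 8,
              "visible callee keeps full occupancy");
// Callee external: assume 256 registers per thread, so 512 / 256 = 2 waves.
static_assert(wavesPerUnit(kernelRegs(48, false, 64)) == 2,
              "external callee drops occupancy");
```

With these assumed numbers, the same kernel goes from 8 resident waves to 2 just because the callee is not visible, even though the callee actually needs only 64 registers.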
https://github.com/llvm/llvm-project/pull/133588