[Openmp-commits] [PATCH] D83271: [OpenMP] Replace function pointer uses in GPU state machine
Johannes Doerfert via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Tue Jul 7 00:12:56 PDT 2020
jdoerfert added a comment.
In D83271#2134746 <https://reviews.llvm.org/D83271#2134746>, @JonChesterfield wrote:
> That's interesting. AMDGPU does not handle function pointers well, and I suspect NVPTX has considerable performance overhead for them too. If a parallel region is only called from a single target region, it is always passed the same function pointer, so the state machine can be specialised. I think this machinery is equivalent to specialising the parallel region call.
The problem here was the spurious call edge from an unrelated kernel to the outlined parallel function: ptxas then needed more registers for a trivial kernel because it was thought to (potentially) call the outlined function.
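Roughly, and only as a sketch with made-up names rather than the real deviceRTL code, the difference looks like this in C++: before, the worker state machine reaches the outlined region only through an opaque function pointer; after, a comparison against the known callee gives the backend a direct call edge it can reason about.

  #include <cstdio>

  using WorkFnTy = void (*)();

  // Hypothetical outlined parallel region produced by the frontend.
  static void outlined_parallel_fn() { std::puts("parallel region"); }

  // Before: only an indirect call is visible, so every kernel linking the
  // runtime appears to (potentially) call every outlined function.
  static void worker_loop_indirect(WorkFnTy work_fn) {
    if (work_fn)
      work_fn();  // opaque call edge; the backend must assume the worst
  }

  // After: guarding with a pointer comparison exposes a direct call on the
  // expected path, while keeping the indirect call as a fallback.
  static void worker_loop_guarded(WorkFnTy work_fn) {
    if (work_fn == &outlined_parallel_fn)
      outlined_parallel_fn();  // direct call edge, visible to the optimizer
    else if (work_fn)
      work_fn();               // unknown parallel regions still work
  }

  int main() {
    worker_loop_indirect(&outlined_parallel_fn);
    worker_loop_guarded(&outlined_parallel_fn);
  }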
> The general case involves calling one parallel region runtime function with various different function pointers. Devirtualising that is fairly difficult. For another time.
>
> For this simpler case, I think this transform is equivalent to specialising the various kmpc*parallel calls on a given function pointer. The callees are available when using a bitcode deviceRTL.
>
> IIRC function specialisation / partial evaluation is one of the classic compiler optimisations that LLVM doesn't really do. It's difficult to define a good cost model, and C exposes function pointer comparison. What we could implement for this is an attribute-driven variant, where we mark the function pointer arguments in the deviceRTL with such an attribute and use LTO, and avoid specialising any function whose address escapes.
>
> I like this patch. It's a clear example of an effective OpenMP-specific optimisation. It just happens to run very close to specialisation, which may not be that much harder to implement if we cheat on the cost model.
Specialization is (soonish) coming to the Attributor ;)
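For illustration only, here is a rough C++ sketch of what specialising such an entry point on a known function pointer could look like; `parallel_entry` and `region_body` are made-up names, not the actual kmpc interface, and a real implementation would clone at the IR level:

  #include <cstdio>

  using BodyTy = void (*)(int);

  // Generic entry point: takes any parallel-region body by pointer.
  static void parallel_entry(BodyTy body, int num_threads) {
    for (int tid = 0; tid < num_threads; ++tid)
      body(tid);  // indirect call in the generic version
  }

  // One known parallel-region body.
  static void region_body(int tid) { std::printf("thread %d\n", tid); }

  // Specialised clone for that body: the indirect call becomes a direct call
  // that can be inlined, at the cost of code duplication (hence the cost
  // model concern above).
  static void parallel_entry_region_body(int num_threads) {
    for (int tid = 0; tid < num_threads; ++tid)
      region_body(tid);  // direct call after specialisation
  }

  int main() {
    parallel_entry(region_body, 2);   // generic path
    parallel_entry_region_body(2);    // specialised path
  }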
----
================
Comment at: llvm/test/Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll:278-280
+!llvm.module.flags = !{!0, !1, !2, !3}
+!omp_offload.info = !{!4}
+!nvvm.annotations = !{!5, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8, !6, !7, !6, !8, !8, !8, !8, !9, !9, !8}
----------------
arsenm wrote:
> Mostly unneeded metadata?
Interestingly, that is what our device runtime looks like. For reasons I haven't understood yet, it has all these "null is aligned" annotations. CUDA is weird.
Anyway, I can strip this down too.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D83271/new/
https://reviews.llvm.org/D83271