[Openmp-commits] [PATCH] D83271: [OpenMP] Replace function pointer uses in GPU state machine

Jon Chesterfield via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Jul 6 18:35:36 PDT 2020


JonChesterfield added a comment.

That's interesting. AMDGPU does not handle function pointers well, and I suspect NVPTX has considerable performance overhead for them too. If a parallel region is only called from a single target region, the same function pointer is always passed, so the state machine can be specialised on it. I think this machinery is equivalent to specialising the parallel region call.
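As a rough illustration of the idea (a sketch only, not the code in this patch), the difference between a generic and a specialised state machine might look like the following. All names here (`WorkFn`, `parallel_region_a`, `generic_state_machine`, and so on) are hypothetical and the "queue" is a plain host-side vector standing in for the worker loop's work items.

```cpp
#include <vector>

// Hypothetical stand-in for the type of an outlined parallel region.
using WorkFn = void (*)(int *);

// Two hypothetical parallel regions known to the target region.
void parallel_region_a(int *counter) { *counter += 1; }
void parallel_region_b(int *counter) { *counter += 10; }

// Generic state machine: every work item is executed via an indirect call.
void generic_state_machine(const std::vector<WorkFn> &queue, int *counter) {
  for (WorkFn fn : queue) {
    if (!fn)
      break; // a null entry signals termination in this sketch
    fn(counter);
  }
}

// Specialised state machine: the known regions are compared against and
// called directly, with the indirect call kept only as a fallback.
void specialised_state_machine(const std::vector<WorkFn> &queue, int *counter) {
  for (WorkFn fn : queue) {
    if (!fn)
      break;
    if (fn == parallel_region_a)
      parallel_region_a(counter); // direct call, can be inlined
    else if (fn == parallel_region_b)
      parallel_region_b(counter); // direct call, can be inlined
    else
      fn(counter); // unknown region: fall back to the indirect call
  }
}
```

Both loops do the same work for the same queue; the specialised one just gives the compiler direct call sites for the regions it can see.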

The general case involves calling one parallel region runtime function with various different function pointers. Devirtualising that is fairly difficult. For another time.

For this simpler case, I think this transform is equivalent to specialising the various kmpc*parallel calls on a given function pointer. The callees are available when using a bitcode deviceRTL.
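To make "specialising a call on a given function pointer" concrete, here is a minimal sketch. The function `hypothetical_parallel` is an invented stand-in, not a real deviceRTL entry point, and the template is only one way to model the clone that a specialisation pass would produce.

```cpp
// Hypothetical signature for an outlined parallel region.
using RegionFn = void (*)(int thread_id, int *out);

// Hypothetical runtime entry point taking the region as a function pointer.
// The call through fn is indirect.
void hypothetical_parallel(RegionFn fn, int num_threads, int *out) {
  for (int t = 0; t < num_threads; ++t)
    fn(t, out);
}

// Specialised clone: the function pointer is a compile-time constant, so the
// call to Fn is a direct call that the optimiser can inline.
template <RegionFn Fn>
void hypothetical_parallel_specialised(int num_threads, int *out) {
  for (int t = 0; t < num_threads; ++t)
    Fn(t, out);
}

// Hypothetical parallel region body: accumulate the thread ids.
void sum_ids(int thread_id, int *out) { *out += thread_id; }
```

A call such as `hypothetical_parallel(sum_ids, 4, &x)` could then be rewritten to `hypothetical_parallel_specialised<sum_ids>(4, &x)` whenever the pointer argument is a known constant at the call site.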

IIRC, function specialisation / partial evaluation is one of the classic compiler optimisations that LLVM doesn't really do. It's difficult to define a good cost model, and C exposes function pointer comparison. What we could implement for this is an attribute-driven one, where we mark the function pointer arguments in the deviceRTL with such an attribute and use LTO, while avoiding specialising a function whose address escapes.

I like this patch. It's a clear example of an effective OpenMP-specific optimisation. It just happens to run very close to specialisation, which may not be that much harder to implement if we cheat on the cost model.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83271/new/

https://reviews.llvm.org/D83271




