[Openmp-commits] [PATCH] D64217: [OpenMP][NFCI] Cleanup the target state queue implementation

Tue Dec 15 11:14:03 PST 2020

JonChesterfield added subscribers: pdhaliwal, ronl.
JonChesterfield added a comment.

target_impl.h is fairly extensive now. There's some debt remaining in how the atomics are handled but it's not causing much harm.

I suspect, but have not proven, that removing the volatile qualifiers causes problems for nvptx. Nvidia's atomic model is volatile + fences, which isn't brilliantly compatible with llvm's atomic model. I don't have complete faith that the ptx backend translates atomic semantics into code that ptxas handles correctly. I'm therefore nervous about moving away from volatile-qualifying everything.
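To make the two styles concrete, here is a rough host-side C++ sketch, not code from target_impl.h. All function names are made up for illustration. The first pair mimics the "volatile + fences" style, the second pair is what the llvm/C++ atomic model would prefer. Note that in standard C++ a volatile access is not atomic, so the first pair only illustrates the shape of the code and is not a portable synchronisation primitive.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Style 1 (hypothetical helpers): plain volatile accesses ordered by
// explicit fences. This mirrors the "volatile + fences" shape; standard
// C++ does not treat the volatile access itself as atomic.
inline std::uint32_t load_volatile_fenced(const volatile std::uint32_t *p) {
  std::uint32_t v = *p;                                 // volatile read
  std::atomic_thread_fence(std::memory_order_acquire);  // order later accesses
  return v;
}

inline void store_volatile_fenced(volatile std::uint32_t *p, std::uint32_t v) {
  std::atomic_thread_fence(std::memory_order_release);  // order earlier accesses
  *p = v;                                               // volatile write
}

// Style 2 (hypothetical helpers): the ordering is attached to the access
// itself, which is what the llvm atomic model expresses directly.
inline std::uint32_t load_atomic(const std::atomic<std::uint32_t> &a) {
  return a.load(std::memory_order_acquire);
}

inline void store_atomic(std::atomic<std::uint32_t> &a, std::uint32_t v) {
  a.store(v, std::memory_order_release);
}

// Single-threaded round trip through both styles.
inline void demo_roundtrip() {
  std::uint32_t raw = 0;
  store_volatile_fenced(&raw, 7);
  assert(load_volatile_fenced(&raw) == 7);

  std::atomic<std::uint32_t> atom{0};
  store_atomic(atom, 9);
  assert(load_atomic(atom) == 9);
}
```

The concern above is about whether moving device code from the first shape to the second survives the trip through the ptx backend and ptxas. The sketch says nothing about that, it only shows what the source-level difference looks like.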

The state queue has some limitations. @ronl and @pdhaliwal have spent more time looking at it than I have - iirc it reads out of bounds for stack frames above a certain size, without any diagnostic. The array indexed by smid() doesn't load balance as well for amdgcn as it does for nvptx.
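As a rough illustration of the two limitations, here is a simplified host-side model in C++. It is not the real state queue: StateQueue, kNumSlots, kSlotBytes and the sm_id parameter are all invented for the sketch, with sm_id standing in for whatever smid() returns on the device. It shows a slot array indexed by an SM-style id, and an allocation that refuses an oversized frame instead of silently running past the end of the slot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumSlots = 4;     // hypothetical slot count
constexpr std::size_t kSlotBytes = 256;  // hypothetical bytes per slot

// Hypothetical model of a per-SM frame pool.
struct StateQueue {
  std::array<std::array<unsigned char, kSlotBytes>, kNumSlots> storage{};
  std::array<std::size_t, kNumSlots> used{};

  // Bump-allocates `bytes` from the slot selected by `sm_id`.
  // Returns nullptr when the frame does not fit, rather than handing
  // out memory beyond the end of the slot.
  void *allocate(std::uint32_t sm_id, std::size_t bytes) {
    const std::size_t slot = sm_id % kNumSlots;
    if (bytes > kSlotBytes - used[slot])  // used[slot] <= kSlotBytes always
      return nullptr;
    void *p = storage[slot].data() + used[slot];
    used[slot] += bytes;
    return p;
  }
};

// Small usage example.
inline void demo_state_queue() {
  StateQueue q;
  assert(q.allocate(0, 200) != nullptr);  // fits in slot 0
  assert(q.allocate(0, 100) == nullptr);  // would overrun slot 0: rejected
  assert(q.allocate(1, 100) != nullptr);  // slot 1 is still empty
  assert(q.allocate(4, 56) != nullptr);   // id 4 maps onto slot 0 as well
}
```

In this model, ids that collide modulo kNumSlots share a slot, so how evenly the id source spreads across slots decides how well the pool balances. That is one way to picture the load balancing point, though the actual amdgcn behaviour may differ for other reasons.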

My preference is to delete the state queue entirely. I think it is only used for nested parallelism, which is very slow on gpus whatever we do with it, but there's some semantic problem with just ignoring the nested pragmas. That probably means we can replace the linked stack frame allocated from this state_queue with a compiler transform.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64217/new/

https://reviews.llvm.org/D64217