[PATCH] D101976: [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL

Wed May 12 12:16:35 PDT 2021

JonChesterfield accepted this revision.
JonChesterfield added a comment.
This revision is now accepted and ready to land.

I'm not certain what the 'aligned' limitation on nvptx syncthreads is, but I can't think of a corresponding one for amdgcn. So we may not need the LDS barrier construction, and it will be much faster if we don't.

This was reported working on amdgpu by a third party against an earlier trunk build, but sadly the current trunk seems to have regressed (I'm debugging that offline). So I have no reason to believe this doesn't work, and some reason to believe it does. Objection withdrawn.

The code itself always looked fine; I was only nervous about the changes to the concurrency primitives in nvptx.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D101976