[Openmp-commits] [PATCH] D138389: [OpenMP][libomptarget] Add AMDGPU NextGen plugin with asynchronous behavior

Tue Nov 29 15:37:20 PST 2022

JonChesterfield added a subscriber: pdhaliwal.
JonChesterfield added a comment.

In D138389#3954160 <https://reviews.llvm.org/D138389#3954160>, @jdoerfert wrote:

> 1. Intel and AMD not upstreaming their tests is not our problem per se. I'd love for them to do it but they would also need to upstreaming/using the functionality in their plugins. They had 2+ years to do that, "waiting" doesn't help. Maybe if we diverge more they finally have some incentive to work closer to/with upstream.

Yeah, likewise. There was a period where trunk, rocm and aomp all had exactly the same implementation for this plugin which was hard won. After the initial drop I mean, I spent a few weeks to pull them back into exact alignment and encouraged the internal devs to patch trunk instead. Didn't take. Today's status is the trunk plugin looks pretty much like the rocm one used to, but the rocm one has a bunch of stuff changed in it for a new code object ABI and some variant of asynchronous offloading. Further divergence seems unlikely to persuade the people who prefer working internally to stop doing that.

> 2. We should not apply two different sets of review rules only to make it easier for AMD. Your initial drop of the "old" plugin (https://reviews.llvm.org/D85742) has all the problems you listed above and more (including and lots of more lines with dead and untested code). Most of the things you argue are bad about this patch are literally states as the state of the "old plugin" in the commit message of D85742 <https://reviews.llvm.org/D85742>. Still, then the sentiment was to merge it "as-is and then improve it in-tree", with an explanation of why that is better.

Pretty much. The version today is smaller and saner than the initial drop but not refactored to a fixpoint by any stretch. This suffered from me handing it over to @pdhaliwal who did good work before moving on to greener pastures, and I haven't really picked it back up. I'm not clear why various AMD devs decided to patch the rocm version instead of upstream - gentle encouragement and declining to review things on the internal board hasn't really changed anything there.

One thing the initial drop of non-review-compliant and suspected buggy code did have going for it was that it passed a bunch of application testing when run alongside a broadly similar clang toolchain. So it didn't come with in tree tests to keep it known working but at least it started out OK. I would guess we can do similar here - land the new plugin without tests, and manually feed it through various levels of internal testing and report back what works. That'll probably suffice, painfully, for getting it to functional equivalence with the current plugin.

If there's a load of tests for the asynchronous behaviour that landed for cuda (which I didn't notice at the time) then we can xfail the ones that don't work on amdgpu and (ideally) fix those over time. I'd quite like the plugins to be unit tested, as opposed to running whole openmp programs through them, but am not sure what can be done in terms of infra for that. Feels like we might need a mock/stub GPU to tie it to, which perhaps works better with more of the plugin code shared than is currently the case.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138389/new/

https://reviews.llvm.org/D138389