<div dir="ltr"><div dir="ltr">Hey,<div><br></div><div>The amdgcn deviceRTL needs a shim around atomics and a copy of libcall.cu to be broadly functional. That seems minor.</div><div><br></div><div>There's some refactoring work going on in the aomp branch to reduce the libraries it depends on. Building the nvptx/cuda openmp runtime needs an entire second toolchain installed. I don't want that to be true for amdgcn as well.</div><div><br></div><div>The hsa plugin is about 1200 lines total and already working, with a few outstanding todos and stylistic improvements available. Ron is looking at the todos at present. I'd be equally happy to iterate on that in tree; it's not really code that can be used for other architectures, so making it beautiful isn't strictly necessary. It may also get reimplemented in terms of a different underlying API at some point next year.</div><div><br></div><div>Aside from that, it's down to the clang/llvm support, and how much customisation it takes to target nvptx and amdgcn from the same code path. Hopefully the differences largely lie in the runtime. I need to pull down a copy of your patches and see what needs to be tweaked to get a second gpu target working.</div><div><br></div><div>Getting support in prior to the clang fork would make me happy. I'm up for working long days to hit that. After the Christmas party tomorrow at least :)</div><div><br></div><div>One hazard: the runtime makes use of function pointers, which the llvm amdgcn back end (i.e. llc) doesn't support. We inline very aggressively so that mostly works out anyway, but there are a couple of places that route the function pointer through memory (reduction iirc), and the aomp workaround for that is not pretty. 
I'm looking for better options.</div><div><br></div><div>Thanks!</div><div><br></div><div>Jon</div><div><br></div></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Dec 12, 2019 at 12:17 AM Doerfert, Johannes &lt;<a href="mailto:jdoerfert@anl.gov">jdoerfert@anl.gov</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I said it before, but I'll say it again: thanks for your work on this!<br>

<br>

Without this rewrite we could not (reasonably) develop, maintain, and<br>

test our runtime library for more than 1 target. With it, I'm hopeful ;)<br>

<br>

The situation looks good already, but I was hoping we'd get AMD support up<br>

and running before we fork off clang 10. So this part, and the TRegion<br>

part, need to get done this year (almost). Do you think that is feasible<br>

(for the AMD runtime and plugin)? (I'll resurrect TRegions, rebase them<br>

and move them into the OMPIRBuilder, so it will be mostly a question of<br>

fast reviews on that part.)<br>

<br>

<br>

<br>

On 12/06, Jon Chesterfield via Openmp-dev wrote:<br>

> Hello OpenMP dev and AOMP team,<br>

> <br>

> It's been a little while since I sent an update on deviceRTL changes, and<br>

> we've just passed having 50% of the code available under common. Therefore,<br>

> here is the state of play as I see it.<br>

> <br>

> Design premise:<br>

> <br>

> - Provide a single interface.h file declaring everything in the deviceRTL<br>

> library<br>

> <br>

> - Write a thin abstraction layer over synchronisation, atomics and<br>

> architecture-specific functions. This exists as target_impl.h, implemented<br>

> for nvptx and amdgcn<br>

> <br>

> - Provide source under common/, written in terms of said abstraction, which<br>

> can optionally be used by targets<br>

> <br>

> - Functions that are not drawn from common/ are implemented under target/<br>

> <br>

> - Provide a test suite written against interface.h<br>

> <br>

> Status:<br>

> <br>

> - Interface implemented, looks OK. May want to reconsider how constants are<br>

> shared with the compiler<br>

> <br>

> - Abstraction layer sufficient for most of the existing code. Needs an<br>

> atomic wrapper and some refactoring<br>

> <br>

> - About half the sloc are under common. All used by nvptx, all will be used<br>

> by amdgcn once atomics are wrapped<br>

> <br>

> - Some functions still missing from amdgcn in tree, all available in AOMP.<br>

> WIP<br>

> <br>

> - Test suite is vapourware. I have undocumented plans<br>

> <br>

> Next steps:<br>

> <br>

> - Rename files that don't contain any cuda to use the .cpp suffix<br>

> <br>

> - Fill in the last gaps in target_impl<br>

> <br>

> - Build the testing infra<br>

> <br>

> End goal:<br>

> <br>

> - Demonstrably correct (unit tested!) openmp device runtime library<br>

> <br>

> - Running on various nvptx and amdgcn gpus with minimal compiler<br>

> complexity. Now is a great time to join in as a third accelerator vendor<br>

> <br>

> - Support for combining generic implementations under common with<br>

> target-specialised versions<br>

> <br>

> Thanks all,<br>

> <br>

> Jon<br>

<br>

> _______________________________________________<br>

> Openmp-dev mailing list<br>

> <a href="mailto:Openmp-dev@lists.llvm.org" target="_blank">Openmp-dev@lists.llvm.org</a><br>

> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a><br>

<br>

<br>

-- <br>

<br>

Johannes Doerfert<br>

Researcher<br>

<br>

Argonne National Laboratory<br>

Lemont, IL 60439, USA<br>

<br>

<a href="mailto:jdoerfert@anl.gov" target="_blank">jdoerfert@anl.gov</a><br>

</blockquote></div></div>
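<div>For illustration, here is a minimal host-side sketch of the kind of atomics shim discussed in this thread. Code under common/ calls only a small wrapper, and each target would supply its own definition of that wrapper. All names here (shim_atomic_add, shim_atomic_cas, common_next_chunk, common_claim_once) are hypothetical and are not the actual target_impl.h interface. The GCC/Clang __atomic builtins stand in for whatever per-target primitives a real nvptx or amdgcn build would use, so that the sketch compiles and runs on a CPU.</div>
<pre>
```c
/* Hypothetical atomics shim in the style of target_impl.h (host sketch).
   Only this section would differ between targets; the __atomic builtins
   used here are GCC/Clang builtins standing in for device primitives. */

/* Atomically add val to *addr; returns the value held before the add. */
static inline unsigned shim_atomic_add(unsigned *addr, unsigned val) {
  return __atomic_fetch_add(addr, val, __ATOMIC_SEQ_CST);
}

/* Atomically store desired into *addr if *addr equals expected.
   Returns the value observed in *addr (equal to expected on success). */
static inline unsigned shim_atomic_cas(unsigned *addr, unsigned expected,
                                       unsigned desired) {
  /* On failure the builtin writes the observed value into expected. */
  __atomic_compare_exchange_n(addr, &expected, desired, 0,
                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  return expected;
}

/* common/ side: written only against the shim, so it is target agnostic. */

/* Hypothetical dynamic-schedule helper: reserves the next chunk of
   iterations from a shared counter and returns its first iteration. */
static unsigned common_next_chunk(unsigned *next_iter, unsigned chunk) {
  return shim_atomic_add(next_iter, chunk);
}

/* Hypothetical "run once" helper: exactly one caller flips *flag from
   0 to 1 and gets 1 back; every other caller gets 0. */
static int common_claim_once(unsigned *flag) {
  return shim_atomic_cas(flag, 0u, 1u) == 0u;
}
```
</pre>
<div>The point of the split is that only the two shim functions would need a per-target definition. Under this assumption, common_next_chunk and common_claim_once would compile unchanged for both nvptx and amdgcn.</div>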