[Openmp-dev] Multi arch deviceRTL status

Wed Dec 11 17:09:13 PST 2019

Hey,

The amdgcn deviceRTL needs a shim around atomics and a copy of libcall.cu
to be broadly functional. That seems minor.
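
For illustration, a shim of that kind might look roughly like the sketch below. All names here (the shim namespace, atomic_add, atomic_max, atomic_cas, next_ticket) are hypothetical and not taken from the deviceRTL sources. The bodies use the GCC/Clang __atomic builtins as a stand-in so the sketch compiles on a host; a real target would supply its own definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical atomics shim. Common code calls these wrappers and each
// target provides the bodies. The bodies below are a host stand-in built
// on the GCC/Clang __atomic builtins, not any target's real implementation.
namespace shim {

// Adds val to *addr and returns the value held before the add.
inline uint32_t atomic_add(uint32_t *addr, uint32_t val) {
  return __atomic_fetch_add(addr, val, __ATOMIC_SEQ_CST);
}

// Stores val if *addr equals compare. Returns the value observed in *addr
// before the operation, whether or not the store happened.
inline uint32_t atomic_cas(uint32_t *addr, uint32_t compare, uint32_t val) {
  __atomic_compare_exchange_n(addr, &compare, val, /*weak=*/false,
                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  return compare; // updated to the observed value when the exchange fails
}

// Raises *addr to at least val using a compare-and-swap loop. Returns the
// last value observed before our store (or the value that was already >= val).
inline uint32_t atomic_max(uint32_t *addr, uint32_t val) {
  uint32_t old = __atomic_load_n(addr, __ATOMIC_SEQ_CST);
  while (old < val &&
         !__atomic_compare_exchange_n(addr, &old, val, /*weak=*/false,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
    // old now holds the current value; retry while it is still below val
  }
  return old;
}

} // namespace shim

// Example of target-independent code written only against the shim.
inline uint32_t next_ticket(uint32_t *counter) {
  assert(counter != nullptr);
  return shim::atomic_add(counter, 1u);
}
```

The point of the sketch is only that code such as next_ticket never names a target-specific atomic, so it could sit in a shared directory once every target provides the wrappers.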

There's some refactoring work going on in the aomp branch to reduce the
libraries it depends on. The nvptx/cuda openmp needs an entire second
toolchain installed. I don't want that to be true for amdgcn as well.

The hsa plugin is about 1200 lines total, already working, with a few
outstanding todos and stylistic improvements available. Ron is looking at
the todos at present. I'd be equally happy to iterate on that in tree -
it's not really code that can be used for other architectures so making it
beautiful isn't strictly necessary. It may also get reimplemented in terms
of a different underlying API at some point next year.

Aside from that... it's down to the clang/llvm support, and how much
customisation it takes to target nvptx & amdgcn from the same code path.
Hopefully the differences largely lie in the runtime. I need to pull down a
copy of your patches and see what needs to be tweaked to get a second gpu
target working.

Getting support in prior to the clang fork would make me happy. Up for
working pretty long days to hit that. After the Christmas party tomorrow at
least :)

One hazard - the runtime makes use of function pointers, which the llvm
amdgcn back end (i.e. llc) doesn't support. We inline very aggressively so
that mostly works out anyway, but there are a couple of places that route
the function pointer through memory (reduction iirc), and the aomp
workaround for that is not pretty. I'm looking for better options.
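
One generic way to avoid an indirect call in a situation like this is to store an operation id in memory and dispatch on it with a switch, so every call is direct and can be inlined. The sketch below shows only that general technique. It is not the aomp workaround, and every name in it (ReduceOp, ReduceDescriptor, apply, reduce) is hypothetical. It also assumes the set of operations is closed and known when the runtime is built, which would not hold for arbitrary compiler-generated reduction functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical closed set of reduction operators.
enum class ReduceOp : uint32_t { Sum, Max };

inline int32_t reduce_sum(int32_t a, int32_t b) { return a + b; }
inline int32_t reduce_max(int32_t a, int32_t b) { return a > b ? a : b; }

// Dispatch by id. Each case is a direct call, so an aggressive inliner can
// flatten it and no function pointer is needed.
inline int32_t apply(ReduceOp op, int32_t a, int32_t b) {
  switch (op) {
  case ReduceOp::Sum:
    return reduce_sum(a, b);
  case ReduceOp::Max:
    return reduce_max(a, b);
  }
  assert(false && "unknown ReduceOp");
  return a;
}

// The descriptor that travels through memory holds the id, not a pointer.
struct ReduceDescriptor {
  ReduceOp op;
};

// Folds n > 0 elements of data with the operator named in the descriptor.
inline int32_t reduce(const ReduceDescriptor &d, const int32_t *data,
                      uint32_t n) {
  assert(n > 0);
  int32_t acc = data[0];
  for (uint32_t i = 1; i < n; ++i)
    acc = apply(d.op, acc, data[i]);
  return acc;
}
```

The trade-off is that adding an operator means editing the enum and the switch, so this pattern only fits callees the runtime itself owns.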

Thanks!

Jon

On Thu, Dec 12, 2019 at 12:17 AM Doerfert, Johannes <jdoerfert at anl.gov>
wrote:

> I said it before but I say it again, thanks for your work on this!
>
> Without this rewrite we could not (reasonably) develop, maintain, and
> test our runtime library for more than 1 target. With it, I'm hopeful ;)
>
> The situation looks good already but I was hoping we get AMD support up
> and running before we fork off clang 10. So this part, and the TRegion
> part, need to get done this year (almost). Do you think that is feasible
> (for the AMD runtime and plugin)? (I'll resurrect TRegions, rebase them
> and move them into the OMPIRBuilder, so it will be mostly a question of
> fast reviews on that part.)
>
>
>
> On 12/06, Jon Chesterfield via Openmp-dev wrote:
> > Hello OpenMP dev and AOMP team,
> >
> > It's been a little while since sending an update on deviceRTL changes and
> > we've just passed having 50% of the code available under common. Therefore,
> > here is the state of play as I see it.
> >
> > Design premise:
> >
> > - Provide a single interface.h file declaring everything in the deviceRTL
> > library
> >
> > - Write a thin abstraction layer over synchronisation, atomics,
> > architecture specific functions. This exists as target_impl.h, implemented
> > for nvptx and amdgcn
> >
> > - Provide source under common/, written in terms of said abstraction, which
> > can optionally be used by targets
> >
> > - Functions that are not drawn from common/ are implemented under target/
> >
> > - Provide a test suite written against interface.h
> >
> > Status:
> >
> > - Interface implemented, looks OK. May want to reconsider how constants are
> > shared with the compiler
> >
> > - Abstraction layer sufficient for most of the existing code. Needs an
> > atomic wrapper, some refactoring
> >
> > - About half the sloc are under common. All used by nvptx, all will be used
> > by amdgcn once atomics are wrapped
> >
> > - Some functions still missing from amdgcn in tree, all available in AOMP.
> > WIP
> >
> > - Test suite is vapourware. I have undocumented plans
> >
> > Next steps:
> >
> > - Rename files that don't contain any cuda to use .cpp suffix
> >
> > - Fill in last gaps in target_impl
> >
> > - Build the testing infra
> >
> > End goal:
> >
> > - Demonstrably correct (unit tested!) openmp device runtime library
> >
> > - Running on various nvptx and amdgcn gpus with minimal compiler
> > complexity. Now is a great time to join in as a third accelerator vendor
> >
> > - Support for combining generic implementations under common with target
> > specialised versions
> >
> > Thanks all,
> >
> > Jon
>
> > _______________________________________________
> > Openmp-dev mailing list
> > Openmp-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>
>
> --
>
> Johannes Doerfert
> Researcher
>
> Argonne National Laboratory
> Lemont, IL 60439, USA
>
> jdoerfert at anl.gov
>