[Openmp-dev] Multi arch deviceRTL status

Fri Dec 6 08:44:11 PST 2019

Hello OpenMP dev and AOMP team,

It's been a little while since I last sent an update on the deviceRTL changes,
and we've just passed the point where 50% of the code lives under common/. So
here is the state of play as I see it.

Design premise:

- Provide a single interface.h file declaring everything in the deviceRTL
library

- Write a thin abstraction layer over synchronisation, atomics and
architecture-specific functions. This exists as target_impl.h, implemented
for nvptx and amdgcn

- Provide source under common/, written in terms of said abstraction, which
can optionally be used by targets

- Functions that are not drawn from common/ are implemented under target/

- Provide a test suite written against interface.h
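
As a rough illustration of the premise above, the sketch below shows a tiny
per-target layer and a piece of "common" code written only in terms of it.
Every name here (the impl and common namespaces, lanemask_t, popc, ffs,
rank_in_mask, leader_lane) is hypothetical and chosen for the example; it is
not the contents of target_impl.h. The primitives are given a portable host
implementation so the sketch compiles anywhere, whereas a real target would
presumably map them to its own intrinsics.

```cpp
#include <cstdint>

// Hypothetical stand-in for a per-target header: a small set of primitives.
// Assumption for this sketch: a lane mask fits in 32 bits.
namespace impl {
using lanemask_t = std::uint32_t;

// Count the set bits in the mask (portable fallback, no intrinsics).
inline unsigned popc(lanemask_t m) {
  unsigned n = 0;
  while (m) {
    m &= m - 1; // clear the lowest set bit
    ++n;
  }
  return n;
}

// 1-based index of the lowest set bit, or 0 if the mask is empty.
inline unsigned ffs(lanemask_t m) {
  if (m == 0)
    return 0;
  unsigned i = 1;
  while ((m & 1u) == 0) {
    m >>= 1;
    ++i;
  }
  return i;
}
} // namespace impl

// Hypothetical "common" code: uses only the primitives above, so it does not
// need to know which target it is compiled for.
namespace common {
// Number of active lanes with an index lower than `lane` (lane < 32).
inline unsigned rank_in_mask(impl::lanemask_t active, unsigned lane) {
  impl::lanemask_t below = active & ((impl::lanemask_t(1) << lane) - 1);
  return impl::popc(below);
}

// Index of the lowest active lane, or -1 if no lane is active.
inline int leader_lane(impl::lanemask_t active) {
  return static_cast<int>(impl::ffs(active)) - 1;
}
} // namespace common
```

With the mask 0xB (lanes 0, 1 and 3 active), common::rank_in_mask(0xB, 3)
counts the two active lanes below lane 3, and common::leader_lane(0x8) picks
lane 3. Porting this sketch to another target would mean reimplementing only
the impl namespace.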

Status:

- Interface implemented, looks OK. May want to reconsider how constants are
shared with the compiler

- Abstraction layer sufficient for most of the existing code. Needs an
atomic wrapper and some refactoring

- About half the SLOC is under common/. All of it is used by nvptx, and all
of it will be used by amdgcn once atomics are wrapped

- Some functions are still missing from amdgcn in tree; all are available in
AOMP. WIP

- Test suite is vapourware. I have undocumented plans

Next steps:

- Rename files that don't contain any CUDA to use the .cpp suffix

- Fill in last gaps in target_impl

- Build the testing infra

End goal:

- Demonstrably correct (unit tested!) OpenMP device runtime library

- Running on various nvptx and amdgcn GPUs with minimal compiler
complexity. Now is a great time to join in as a third accelerator vendor

- Support for combining generic implementations under common/ with
target-specialised versions
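
One way to picture the last goal, combining a generic implementation with a
target-specialised one, is sketched below. The email does not say how the
selection is made in the library, so template specialisation is used here
purely for illustration, and all names (generic_tag, pow2_tag, align_ops,
round_up) are hypothetical.

```cpp
#include <cstdint>

// Hypothetical target tags. pow2_tag models a target that guarantees every
// alignment it passes is a power of two.
struct generic_tag {};
struct pow2_tag {};

// Generic version: works for any non-zero alignment.
template <typename Target> struct align_ops {
  static std::uint32_t round_up(std::uint32_t x, std::uint32_t a) {
    return (x + a - 1) / a * a;
  }
};

// Specialised version: valid only when `a` is a power of two, where the
// division can be replaced by a mask.
template <> struct align_ops<pow2_tag> {
  static std::uint32_t round_up(std::uint32_t x, std::uint32_t a) {
    return (x + a - 1) & ~(a - 1);
  }
};
```

A target that supplies no specialisation falls back to the generic body, so
align_ops<generic_tag>::round_up(13, 8) and align_ops<pow2_tag>::round_up(13, 8)
both yield 16, while only the generic version is correct for an alignment
such as 6.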

Thanks all,

Jon