[Openmp-dev] [RFC] Device runtime library (re)design

Mon Aug 5 14:16:35 PDT 2019

>
> Short summary of the phone call we had Wednesday for people that
> couldn't make it, are not aware of the phone calls, or simply forgot:
>   - We do split the requested changes into three different activities,
>     thus separate patch sets:
>     1) splitting common logic and target specific code
>     2) improving documentation
>     3) unifying coding style
>   - We need to improve the testing situation...
>   - We need to agree on a directory structure and build model...
>

Great summary, thanks. Sounds good to me.

> libomptarget* directory structure
> =================================
> * everything currently contained in "openmp/libomptarget/"
>
> Folder and file names (including file extensions) are subject to the
> usual bikeshedding. Let's agree on the idea first and the color of the
> shed later.
>

Yep. I've spent a couple of days staring at the differences between amdgcn
and nvptx and have come up with a few use cases for the dir structure and
build process.

- Totally device independent code. Can compile exactly the same thing for
various targets. Basically not a problem, put in headers or
src/common/foo.x. Hopefully as time goes on we'll have more of this.

- Necessarily device dependent. Access to architectural constants,
performance counters, thread IDs or similar. To the extent they address
common concerns, I'd like these to have a common interface. Or at least,
common to hardware that has said concern. Functions like unsigned
GetWarpId() perhaps. Declared in target_impl.h or equivalent.

- Optionally device dependent. The runtime is performance sensitive and
inevitably the general purpose implementation of functions (in terms of the
target_impl API) will be considered for specialisation. In the extreme case
a vendor may want to implement everything from scratch and use nothing from
common, but I suspect wanting to override particular parts will be the
usual case.

- Vendor extensions. Extra functions, targeted from clang, that encapsulate
some performance sensitive behaviour. Some will end up migrating into
common, but a staging ground is likely to be helpful. Unlikely to be a
problem in this context - it's just code that happens to only exist under
one arch subdir.

> For the device RTL...
> I now assume we will have a common core (probably header only), and
> device specific code that may or may not use common core parts. The
> structure I imagine looks like this:
>   openmp/libomptarget/deviceRTLs/common/include/{queue.h,
> target_impl.h,...}
>   openmp/libomptarget/deviceRTLs/nvptx/{libomptarget.cu
> ,CMakeLists.txt,...,include/target_impl.h,...}
>
> openmp/libomptarget/deviceRTLs/amdgpu/{libomptarget.hip,CMakeLists.txt,...,include/target_impl.h,...}
>
> openmp/libomptarget/deviceRTLs/CPU/{libomptarget.XYZ,CMakeLists.txt,...,include/target_impl.h,...}
>
> openmp/libomptarget/deviceRTLs/.../{libomptarget.XYZ,CMakeLists.txt,...,include/target_impl.h,...}
> I added a CPU version above to ease testing (see below) and to provide a
> target_impl.h declaration for better IDE support.
>

Seems reasonable. CPU offload target is an interesting exercise that risks
derailing this discussion a bit.

> - The common files could be header only (debatable and not strictly
> necessary).
>

I'm less confident that the common code benefits from being header only.
Going with declarations in header + source files that get linked in wins a
degree of encapsulation that I suspect will be beneficial here. Reversible
decision either way.

- All source files...
> - To make...
>     * put implementations in {nvptx,amdgpu,...}/include/target_impl.h.
>

Likewise, doesn't seem to matter if the implementations are in the header
or elsewhere under the architecture subdir.

>     * have a `#include_next "target_impl.h"` in
> common/include/target_impl.h.
>     * have the common/include and one device include folder in the
>       include path (in this order) when we compile any file.
>

#include_next doesn't appear to be necessary. One can either give the
implementation header a different name or let target specific
implementations be in arch/target_impl.x.

I think your proposal (modulo the header-only discussion) covers the case
of mostly common code that calls architecture specific leaf functions well.
Notably that's sufficient for today's amdgcn target. I'd like to keep the
option for architecture specific functions open.

On argument in favour of declarations in one shared header, #included by a
source file that implements them for each architecture, is that it's about
as simple a model as one can get while keeping the arch dependent code
separate.I'd personally prefer we start there.

Thanks all,

Jon
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/openmp-dev/attachments/20190805/071b7fd3/attachment.html>