<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Short summary of the phone call we had Wednesday for people that<br>

couldn't make it, are not aware of the phone calls, or simply forgot:<br>

  - We do split the requested changes into three different activities,<br>

    thus separate patch sets:<br>

    1) splitting common logic and target specific code<br>

    2) improving documentation<br>

    3) unifying coding style<br>

  - We need to improve the testing situation...<br>

  - We need to agree on a directory structure and build model...<br></blockquote><div> </div><div>Great summary, thanks. Sounds good to me.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">libomptarget* directory structure<br>

=================================<br>

* everything currently contained in "openmp/libomptarget/"<br>

<br>

Folder and file names (including file extensions) are subject to the<br>

usual bikeshedding. Let's agree on the idea first and the color of the<br>

shed later.<br></blockquote><div><br></div><div>Yep. I've spent a couple of days staring at the differences between amdgcn and nvptx and have come up with a few use cases for the dir structure and build process.<br><br></div><div>- Totally device independent code. Can compile exactly the same thing for various targets. Basically not a problem, put in headers or src/common/foo.x. Hopefully as time goes on we'll have more of this.<br><br></div><div>- Necessarily device dependent. Access to architectural constants, performance counters, thread IDs or similar. To the extent they address common concerns, I'd like these to have a common interface. Or at least, common to hardware that has said concern. Functions like unsigned GetWarpId() perhaps. Declared in target_impl.h or equivalent.<br><br></div><div>- Optionally device dependent. The runtime is performance sensitive and inevitably the general purpose implementation of functions (in terms of the target_impl API) will be considered for specialisation. In the extreme case a vendor may want to implement everything from scratch and use nothing from common, but I suspect wanting to override particular parts will be the usual case.<br><br></div><div>- Vendor extensions. Extra functions, targeted from clang, that encapsulate some performance sensitive behaviour. Some will end up migrating into common, but a staging ground is likely to be helpful. Unlikely to be a problem in this context - it's just code that happens to only exist under one arch subdir.<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">For the device RTL...<br>

I now assume we will have a common core (probably header only), and<br>

device specific code that may or may not use common core parts. The<br>

structure I imagine looks like this:<br>

  openmp/libomptarget/deviceRTLs/common/include/{queue.h, target_impl.h,...}<br>

  openmp/libomptarget/deviceRTLs/nvptx/{<a href="http://libomptarget.cu" rel="noreferrer" target="_blank">libomptarget.cu</a>,CMakeLists.txt,...,include/target_impl.h,...}<br>

  openmp/libomptarget/deviceRTLs/amdgpu/{libomptarget.hip,CMakeLists.txt,...,include/target_impl.h,...}<br>

  openmp/libomptarget/deviceRTLs/CPU/{libomptarget.XYZ,CMakeLists.txt,...,include/target_impl.h,...}<br>

  openmp/libomptarget/deviceRTLs/.../{libomptarget.XYZ,CMakeLists.txt,...,include/target_impl.h,...}<br>

I added a CPU version above to ease testing (see below) and to provide a<br>

target_impl.h declaration for better IDE support.<br></blockquote><div> </div><div>Seems reasonable. CPU offload target is an interesting exercise that risks derailing this discussion a bit.<br></div> <br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

- The common files could be header only (debatable and not strictly necessary).<br></blockquote><div><div><br>I'm less confident that the common code benefits from being header only. Going with declarations in header + source files that get linked in wins a degree of encapsulation that I suspect will be beneficial here. Reversible decision either way.<br></div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

- All source files...<br>

- To make...<br>

    * put implementations in {nvptx,amdgpu,...}/include/target_impl.h.<br></blockquote><div><br>Likewise, doesn't seem to matter if the implementations are in the header or elsewhere under the architecture subdir.<br> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

    * have a `#include_next "target_impl.h"` in common/include/target_impl.h.<br>

    * have the common/include and one device include folder in the<br>

      include path (in this order) when we compile any file.<br></blockquote><div> </div><div>#include_next doesn't appear to be necessary. One can either give the implementation header a different name or let target specific implementations be in arch/target_impl.x.<br><br></div><div>I think your proposal (modulo the header-only discussion) covers the case of mostly common code that calls architecture specific leaf functions well. Notably that's sufficient for today's amdgcn target. I'd like to keep the option for architecture specific functions open.<br><br></div><div>On argument in favour of declarations in one shared header, #included by a source file that implements them for each architecture, is that it's about as simple a model as one can get while keeping the arch dependent code separate.I'd personally prefer we start there.<br><br></div><div>Thanks all,<br><br></div><div>Jon<br></div><div><br><br><br></div><div> </div></div></div></div></div></div>