[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules

Renato Golin via llvm-dev llvm-dev at lists.llvm.org
Tue Jul 28 16:13:14 PDT 2020


On Tue, 28 Jul 2020 at 21:52, Johannes Doerfert
<johannesdoerfert at gmail.com> wrote:
> Let's take OpenMP.
> The compiler cannot know what your memory actually is, because types
> are, for the most part, just hints. So we need the devices to match the
> host data layout w.r.t. padding, alignment, etc., or we could not copy
> an array of structs from one to the other and expect it to work. CUDA,
> HIP, SYCL, ... should all be the same. I hope someone corrects me if I
> have some misconceptions here :)

All those programming models have already been made to interoperate
with CPUs like that. So, if we make the conscious decision that
accelerator drivers must implement that transparent layer in order
to benefit from LLVM IR's multi-DL support, fine.

I have no stake in any particular accelerator, but we should make it
clear that they must implement that level of transparency to use this
feature of LLVM IR.
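
To make the layout-matching point above concrete, here is a small,
untested sketch using LLVM's DataLayout API. The device DL string is
made up for illustration (f64 with 4-byte ABI alignment); the point is
that the same struct type gets a different field offset and size under
each DL, so a bitwise host<->device copy would scramble the fields:

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/Support/raw_ostream.h"
#include <cstdint>
using namespace llvm;

int main() {
  LLVMContext Ctx;
  // An { i8, double } struct, e.g. one element of a mapped array.
  auto *Pair = StructType::get(
      Ctx, {Type::getInt8Ty(Ctx), Type::getDoubleTy(Ctx)});

  // Typical x86-64 host layout: f64 is ABI-aligned to 8 bytes.
  DataLayout HostDL("e-m:e-i64:64-f80:128-n8:16:32:64-S128");
  // Made-up device layout: f64 is only ABI-aligned to 4 bytes.
  DataLayout DevDL("e-p:32:32-f64:32:64-n32");

  // Same IR type, two different layouts.
  uint64_t HostOff = HostDL.getStructLayout(Pair)->getElementOffset(1);
  uint64_t DevOff = DevDL.getStructLayout(Pair)->getElementOffset(1);
  uint64_t HostSize = HostDL.getTypeAllocSize(Pair);
  uint64_t DevSize = DevDL.getTypeAllocSize(Pair);
  outs() << "host: double at " << HostOff << ", size " << HostSize
         << "\n"; // host: double at 8, size 16
  outs() << "dev:  double at " << DevOff << ", size " << DevSize
         << "\n"; // dev:  double at 4, size 12
  return 0;
}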

> The "important" part is there is no direct call edge between the two
> modules.

Right! This makes it a lot simpler. We just need to annotate each
global symbol with the right DL and trust that the lowering was done
properly.
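
One way to spell that annotation (purely illustrative; the RFC doesn't
prescribe this, and the "hetero.dl" metadata kind is made up) is to
hang the DL string off each global, since GlobalObjects can already
carry attached metadata:

#include "llvm/ADT/StringRef.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

// Tag a global with the data layout it was lowered against, so later
// consumers don't fall back to the module-level DL. Sketch only.
void tagWithDL(GlobalVariable &GV, StringRef DLString) {
  LLVMContext &Ctx = GV.getContext();
  GV.setMetadata("hetero.dl",
                 MDNode::get(Ctx, {MDString::get(Ctx, DLString)}));
}

Whatever encoding wins, the verifier and codegen would then have to
look it up instead of assuming the module-level DL applies everywhere.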

What about optimisation passes? GPU code skips most of the CPU
pipeline so as not to break codegen later on, but AFAIK this is done
by registering a new pass manager.

We'd need to teach passes (or the pass manager) not to throw
accelerator code into the CPU pipeline, and vice versa.
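
Assuming some per-function marker exists (below I reuse the made-up
"hetero.dl" metadata from earlier; the wrapper itself is hypothetical
too), a thin adaptor in the new pass manager could do that filtering:

#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"
#include <utility>
using namespace llvm;

// Run the wrapped pass only on host functions; accelerator code is
// left untouched for its own pipeline. Sketch, not a real LLVM pass.
template <typename PassT>
struct HostOnly : PassInfoMixin<HostOnly<PassT>> {
  PassT Inner;
  explicit HostOnly(PassT P) : Inner(std::move(P)) {}

  PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM) {
    if (F.getMetadata("hetero.dl")) // tagged for a device: skip it
      return PreservedAnalyses::all();
    return Inner.run(F, FAM);
  }
};

The accelerator pipeline would do the inverse check; whether that
logic belongs in each pass or once in the pass manager is exactly the
open question.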

--renato

