[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules
Johannes Doerfert via llvm-dev
llvm-dev at lists.llvm.org
Tue Jul 28 20:26:51 PDT 2020
On 7/28/20 6:13 PM, Renato Golin wrote:
> On Tue, 28 Jul 2020 at 21:52, Johannes Doerfert
> <johannesdoerfert at gmail.com> wrote:
>> Let's take OpenMP.
>> The compiler cannot know what your memory actually is because types are,
>> you know, just hints for the most part. So we need the devices to match
>> the host data layout wrt. padding, alignment, etc. or we could not copy
>> an array of structs from one to the other and expect it to work. CUDA,
>> HIP, SYCL, ... should all be the same. I hope someone corrects me if I
>> have some misconceptions here :)
> All those programming models have already been made to inter-work with
> CPUs like that. So, if we take the conscious decision that
> accelerators' drivers must implement that transparent layer in order
> to benefit from LLVM IR's multi-DL, fine.
>
> I have no stakes in any particular accelerator, but we should make it
> clear that they must implement that level of transparency to use this
> feature of LLVM IR.
Yes. Whatever we do, it should be clear what requirements there are
for you to create a multi-target module. We can probably even verify
some of them, like the requirement that there be no direct call edge
between code compiled for different targets.
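To make "we can probably even verify some of them" concrete, here is a minimal sketch of two such checks on a toy model: no direct call edge between functions of different targets, and matching struct layouts (padding, alignment) on host and device so an array of structs can be copied as-is. The `Function` class, the "host"/"gpu" labels and the alignment tables are all hypothetical illustrations, not LLVM APIs or real data layouts.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of a multi-target module; NOT an LLVM API.
@dataclass
class Function:
    name: str
    target: str                                  # illustrative label, e.g. "host" or "gpu"
    callees: list = field(default_factory=list)  # names of directly called functions


def cross_target_calls(functions):
    """Return every (caller, callee) pair whose targets differ."""
    by_name = {f.name: f for f in functions}
    bad = []
    for f in functions:
        for callee in f.callees:
            g = by_name.get(callee)
            if g is not None and g.target != f.target:
                bad.append((f.name, callee))
    return bad


def struct_layout(field_types, align_of, size_of):
    """Field offsets and total size under C-style natural alignment rules."""
    offset, offsets, max_align = 0, [], 1
    for t in field_types:
        a = align_of[t]
        offset = (offset + a - 1) // a * a  # pad up to the field's alignment
        offsets.append(offset)
        offset += size_of[t]
        max_align = max(max_align, a)
    size = (offset + max_align - 1) // max_align * max_align  # tail padding
    return offsets, size


# Illustrative tables only: same sizes, but the "device" aligns f64 to 4 bytes.
SIZES = {"i8": 1, "i32": 4, "f64": 8}
HOST_ALIGN = {"i8": 1, "i32": 4, "f64": 8}
DEVICE_ALIGN = {"i8": 1, "i32": 4, "f64": 4}


def layouts_match(field_types):
    return (struct_layout(field_types, HOST_ALIGN, SIZES)
            == struct_layout(field_types, DEVICE_ALIGN, SIZES))


module = [
    Function("main", "host", ["launch"]),
    Function("launch", "host"),        # would hand off to the runtime, no direct call
    Function("kernel", "gpu", ["helper"]),
    Function("helper", "gpu"),
]
print(cross_target_calls(module))     # []
print(layouts_match(["i8", "f64"]))   # False: offsets [0, 8] vs [0, 4]
```

A struct like `{ i8, f64 }` gets different offsets and sizes under the two tables, which is exactly the case where copying an array of them from host to device would silently break.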
>> The "important" part is that there is no direct call edge between the two
>> modules.
> Right! This makes it a lot simpler. We just need to annotate each
> global symbol with the right DL and trust that the lowering was done
> properly.
>
> What about optimisation passes? GPU code skips most of the CPU
> pipeline so as not to break codegen later on, but AFAIK, this is done by
> registering a new pass manager.
That is an interesting point. We could arguably teach the (new) PM to run
different pipelines for the different devices. FWIW, I'm not even sure
we do that right now, e.g., for CUDA compilation. [long live uniformity!]
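One way to picture "run different pipelines for the different devices" is to group functions by the architecture in their triple and look up a pipeline per group, refusing unknown architectures rather than falling back to the CPU pipeline. This is only a sketch under assumptions: `PIPELINES`, `plan` and the pass lists are hypothetical placeholders, not LLVM's actual default pipelines or pass manager API.

```python
# Hypothetical pipelines keyed by the architecture component of the triple.
# The pass lists are placeholders, not LLVM's real default pipelines.
PIPELINES = {
    "x86_64": ("inline", "loop-vectorize"),
    "nvptx64": ("inline", "gpu-specific-cleanup"),
    "amdgcn": ("inline", "gpu-specific-cleanup"),
}


def arch_of(triple):
    """The architecture is the first dash-separated component of a triple."""
    return triple.split("-", 1)[0]


def plan(functions):
    """Map each architecture to (pipeline, function names) for (name, triple) pairs.

    Raises KeyError for an architecture with no registered pipeline instead
    of silently pushing that code through the CPU pipeline.
    """
    groups = {}
    for name, triple in functions:
        arch = arch_of(triple)
        if arch not in PIPELINES:
            raise KeyError(f"no pipeline registered for {arch!r} ({name})")
        groups.setdefault(arch, (PIPELINES[arch], []))[1].append(name)
    return groups


functions = [
    ("main", "x86_64-unknown-linux-gnu"),
    ("kernel", "nvptx64-nvidia-cuda"),
    ("helper", "nvptx64-nvidia-cuda"),
]
for arch, (pipeline, names) in plan(functions).items():
    print(arch, "->", " -> ".join(pipeline), names)
```

Keying on the triple means accelerator code never enters the CPU pipeline by accident, which is the concern raised just below.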
> We'd need to teach passes (or the pass manager) to not throw
> accelerator code into the CPU pipeline and vice-versa.
What do you mean by accelerator code? Intrinsics, vector length,
etc. should be controlled by the triple, so that should be handled.
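As a small illustration of "controlled by the triple": LLVM names target-specific intrinsics with a target prefix (for example `llvm.x86.*`, `llvm.nvvm.*`, `llvm.amdgcn.*`), so a per-function triple is enough to flag an intrinsic that belongs to a different target. The check below is a hypothetical sketch; the prefix table is a small illustrative subset, not an exhaustive mapping, and `misplaced_intrinsics` is not an existing LLVM function.

```python
# Hypothetical check, not an LLVM API. The table is an illustrative subset
# of LLVM's target-specific intrinsic prefixes.
INTRINSIC_PREFIX = {
    "x86_64": "llvm.x86.",
    "nvptx64": "llvm.nvvm.",
    "amdgcn": "llvm.amdgcn.",
}


def misplaced_intrinsics(triple, called):
    """Return intrinsics in `called` that are specific to a different target.

    Target-independent intrinsics such as llvm.memcpy.* match no prefix and
    are therefore never flagged.
    """
    arch = triple.split("-", 1)[0]
    foreign = [p for a, p in INTRINSIC_PREFIX.items() if a != arch]
    return [c for c in called if any(c.startswith(p) for p in foreign)]


calls = [
    "llvm.memcpy.p0.p0.i64",
    "llvm.nvvm.read.ptx.sreg.tid.x",
    "llvm.x86.sse2.pause",
]
print(misplaced_intrinsics("nvptx64-nvidia-cuda", calls))  # ['llvm.x86.sse2.pause']
```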
~ Johannes
>
> --renato