[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules

Tue Jul 28 12:28:10 PDT 2020

On 7/28/20 2:25 PM, Mehdi AMINI wrote:
 > On Tue, Jul 28, 2020 at 12:07 PM Johannes Doerfert <
 > johannesdoerfert at gmail.com> wrote:
 >
 >> [I removed all but the data layout question, that is an important topic]
 >> On 7/28/20 1:03 PM, Mehdi AMINI wrote:
 >>  > TL;DR
 >>  >> -----
 >>  >>
 >>  >> Let's allow merging two LLVM-IR modules for different targets (with
 >>  >> compatible data layouts) into a single LLVM-IR module to facilitate
 >>  >> host-device code optimizations.
 >>  >>
 >>  >
 >>  > I think the main question I have is with respect to this limitation on the
 >>  > datalayout: isn't it too limiting in practice?
 >>  > I understand that this is much easier to implement in LLVM today, but it
 >>  > may get us into a fairly limited place in terms of what can be supported in
 >>  > the future.
 >>  > Have you looked into what it would take to have heterogeneous modules that
 >>  > have their own DL?
 >>
 >>
 >> Let me share some thoughts on the data layout situation, not all of which
 >> are fully matured but I guess we have to start somewhere:
 >>
 >> If we look at the host-device interface there has to be some agreement
 >> on parts of the datalayout, namely on what the data the host sends over
 >> and expects back looks like. If I'm not mistaken, GPUs will match the
 >> host in things like padding, endianness, etc. because you cannot
 >> translate things "on the fly". That said, there might be additional
 >> "address spaces" on either side that the other one is not matching/aware
 >> of. Long story short, I think host & device need to, and in practice do,
 >> agree on the data layout of the address space they use to communicate.
 >>
 >> The above is for me a strong hint that we could use address spaces to
 >> identify/distinguish differences when we link the modules. However,
 >> there might be the case that this is not sufficient, e.g., if the
 >> default alloca address space differs. In that case I don't see a reason
 >> to not pull the same "trick" as with the triple. We can specify
 >> additional data layouts, one per device, and if you retrieve the data
 >> layout, or triple, you need to pass a global symbol as an "anchor". For
 >> all intraprocedural passes this should be sufficient as they are only
 >> interested in the DL and triple of the function they look at. For IPOs
 >> we have to distinguish the ones that know about the host-device calls
 >> and the ones that don't. We might have to teach all of them about these
 >> calls but as long as they are callbacks through a driver routine I don't
 >> even think we need to.
 >>
 >> I'm curious if you or others see an immediate problem with both a device
 >> specific DL and triple (optionally) associated with every global symbol.
 >>
 >
 > Having a triple/DL per global symbol would likely solve everything, I
 > didn't get from your original email that this was considered.
 > If I understand correctly what you're describing, the DL on the Module
 > would be a "default" and we'd need to make the DL/triple APIs on the Module
 > "private" to force queries to go through an API on GlobalValue to get the
 > DL/triple?

That is what I tried to describe, yes. The "patch" I posted does this
"conceptually" for the triple. You make them private or require a global
value to be passed as part of the request, same result I guess. The key
is that the DL/triple is a property of the global symbol.
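
To make the "pass a global value as part of the request" variant a bit
more concrete, here is a minimal sketch in plain C++. It is not the
posted patch and it does not use the real LLVM classes: ToyModule,
ToyGlobal, TargetInfo, getTargetInfo and makeExampleModule are
hypothetical names, the numeric device id is an assumption about how a
symbol could be tied to a device, and the data layout strings are
abbreviated placeholders rather than the real target layouts.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a (triple, data layout) pair.
struct TargetInfo {
  std::string Triple;
  std::string DataLayout;
};

// Hypothetical stand-in for a global symbol. A symbol without a device id
// belongs to the host and uses the module-wide default.
class ToyGlobal {
  std::string Name;
  std::optional<unsigned> DeviceId;

public:
  explicit ToyGlobal(std::string Name,
                     std::optional<unsigned> DeviceId = std::nullopt)
      : Name(std::move(Name)), DeviceId(DeviceId) {}

  const std::string &getName() const { return Name; }
  std::optional<unsigned> getDeviceId() const { return DeviceId; }
};

// Hypothetical stand-in for a heterogeneous module. There is deliberately
// no parameterless getter: every query has to name an anchor symbol.
class ToyModule {
  TargetInfo Default;                       // host triple/DL
  std::map<unsigned, TargetInfo> PerDevice; // one extra entry per device

public:
  explicit ToyModule(TargetInfo Host) : Default(std::move(Host)) {}

  void addDeviceTarget(unsigned Id, TargetInfo TI) {
    PerDevice.insert_or_assign(Id, std::move(TI));
  }

  // Returns the triple/DL that applies to the given symbol. Symbols with
  // an unknown device id fall back to the default (an assumption of this
  // sketch; a real implementation might reject such a module instead).
  const TargetInfo &getTargetInfo(const ToyGlobal &Anchor) const {
    if (std::optional<unsigned> Id = Anchor.getDeviceId()) {
      auto It = PerDevice.find(*Id);
      if (It != PerDevice.end())
        return It->second;
    }
    return Default;
  }
};

// Builds a module with one host target and one device target (id 0).
// The device layout uses a different alloca address space (A5), which is
// the case where matching address spaces alone would not be sufficient.
inline ToyModule makeExampleModule() {
  ToyModule M({"x86_64-unknown-linux-gnu", "e-i64:64-n8:16:32:64-S128"});
  M.addDeviceTarget(0, {"amdgcn-amd-amdhsa", "e-p:64:64-i64:64-A5-n32:64"});
  return M;
}
```

An intraprocedural pass would call M.getTargetInfo(F) with the function
it is looking at and never see the other side's DL or triple, which is
why the anchor should be enough for those passes.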

I'll respond to Renato's concerns on this as part of a response to him.

 >>
 >>
 >> ~ Johannes
 >>
 >>
 >