[llvm-dev] [RFC] Heterogeneous LLVM-IR Modules

Johannes Doerfert via llvm-dev llvm-dev at lists.llvm.org
Tue Jul 28 12:05:52 PDT 2020


[I removed everything but the data layout question, which is an important topic.]
On 7/28/20 1:03 PM, Mehdi AMINI wrote:
 >> TL;DR
 >> -----
 >>
 >> Let's allow merging two LLVM-IR modules for different targets (with
 >> compatible data layouts) into a single LLVM-IR module to facilitate
 >> host-device code optimizations.
 >>
 >
 > I think the main question I have is with respect to this limitation on
 > the datalayout: isn't it too limiting in practice?
 > I understand that this is much easier to implement in LLVM today, but it
 > may get us into a fairly limited place in terms of what can be supported
 > in the future.
 > Have you looked into what it would take to have heterogeneous modules
 > that have their own DL?


Let me share some thoughts on the data layout situation. Not all of them
are fully matured, but I guess we have to start somewhere:

If we look at the host-device interface, there has to be some agreement
on parts of the data layout, namely on what the data that the host sends
over and expects back looks like. If I'm not mistaken, GPUs will match
the host in things like padding, endianness, etc. because you cannot
translate things "on the fly". That said, there might be additional
"address spaces" on either side that the other side does not match or is
not aware of. Long story short, I think host & device need to, and in
practice do, agree on the data layout of the address space they use to
communicate.
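
To make the "agree on the communication address space" idea a bit more
concrete, here is a rough sketch in standalone C++. It does not use any
LLVM API; splitSpecs, endianness, pointerSpec, and agreeOnAddressSpace
are names made up for this illustration, and the two layout strings are
shortened, illustrative examples rather than the real strings of any
target. It only looks at the endianness and the explicit pointer
specification of one address space, and it treats "not mentioned" as its
own value instead of applying the defaults, so it is far from a complete
compatibility check.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a data layout string such as "e-p:64:64-i64:64" on '-'.
std::vector<std::string> splitSpecs(const std::string &DL) {
  std::vector<std::string> Specs;
  std::stringstream SS(DL);
  std::string Spec;
  while (std::getline(SS, Spec, '-'))
    if (!Spec.empty())
      Specs.push_back(Spec);
  return Specs;
}

// Return 'e' (little endian), 'E' (big endian), or '\0' if the string
// does not spell out the endianness.
char endianness(const std::string &DL) {
  for (const std::string &S : splitSpecs(DL))
    if (S == "e" || S == "E")
      return S[0];
  return '\0';
}

// Return the text after "p<AS>:" for the given address space, or "" if
// there is no explicit pointer spec for it. A bare "p:" means AS 0.
// Simplification: defaults for unmentioned address spaces are ignored.
std::string pointerSpec(const std::string &DL, unsigned AS) {
  for (const std::string &S : splitSpecs(DL)) {
    if (S[0] != 'p')
      continue;
    std::size_t Colon = S.find(':');
    if (Colon == std::string::npos)
      continue;
    std::string Num = S.substr(1, Colon - 1);
    bool AllDigits = true;
    for (char C : Num)
      if (!std::isdigit(static_cast<unsigned char>(C)))
        AllDigits = false;
    if (!AllDigits)
      continue;
    unsigned SpecAS = Num.empty() ? 0u : static_cast<unsigned>(std::stoul(Num));
    if (SpecAS == AS)
      return S.substr(Colon + 1);
  }
  return "";
}

// Host and device "agree" on AS if endianness and pointer spec match.
bool agreeOnAddressSpace(const std::string &HostDL,
                         const std::string &DeviceDL, unsigned AS) {
  return endianness(HostDL) == endianness(DeviceDL) &&
         pointerSpec(HostDL, AS) == pointerSpec(DeviceDL, AS);
}

// Usage example with illustrative (not real) layout strings.
void exampleCheck() {
  const std::string Host = "e-p:64:64-i64:64-n8:16:32:64-S128";
  const std::string Device = "e-p:64:64-p3:32:32-p5:32:32-i64:64-A5";
  assert(agreeOnAddressSpace(Host, Device, 0));  // shared address space
  assert(!agreeOnAddressSpace(Host, Device, 5)); // device-only address space
}
```

In this sketch the two sides match on address space 0, which they would
use to communicate, while address space 5 only exists on the device
side.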

The above is for me a strong hint that we could use address spaces to
identify/distinguish differences when we link the modules. However,
this might not be sufficient in all cases, e.g., if the default alloca
address space differs. In that case I don't see a reason not to pull the
same "trick" as with the triple: we can specify additional data layouts,
one per device, and if you retrieve the data layout, or triple, you need
to pass a global symbol as an "anchor". For all intraprocedural passes
this should be sufficient as they are only interested in the DL and
triple of the function they look at. For IPOs we have to distinguish the
ones that know about the host-device calls and the ones that don't. We
might have to teach all of them about these calls, but as long as the
calls are callbacks through a driver routine I don't even think we need
to.
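
A rough sketch of what such an anchored lookup could look like, again in
standalone C++ and not as a proposal for the actual interface.
TargetDesc, HeteroModule, addDevice, assignToDevice, targetFor, and
layoutSeenByPass are all hypothetical names, symbols are identified by
plain strings for simplicity, and the triples and layout strings are
illustrative only. The one assumption taken from the idea above is that
the DL and triple queries take a global symbol as an anchor and fall
back to the host description if the symbol is not tied to a device.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical: triple plus data layout string for one target.
struct TargetDesc {
  std::string Triple;
  std::string DataLayout;
};

// Hypothetical stand-in for a heterogeneous module; not an LLVM class.
class HeteroModule {
  TargetDesc Host;                                       // module default
  std::vector<TargetDesc> Devices;                       // one per device
  std::unordered_map<std::string, std::size_t> DeviceOf; // symbol -> device

public:
  explicit HeteroModule(TargetDesc H) : Host(std::move(H)) {}

  // Register an additional triple/DL pair and return its index.
  std::size_t addDevice(TargetDesc D) {
    Devices.push_back(std::move(D));
    return Devices.size() - 1;
  }

  // Tie a global symbol to a previously registered device.
  void assignToDevice(const std::string &Symbol, std::size_t Idx) {
    assert(Idx < Devices.size() && "unknown device index");
    DeviceOf[Symbol] = Idx;
  }

  // Symbols without a device association belong to the host.
  const TargetDesc &targetFor(const std::string &Anchor) const {
    auto It = DeviceOf.find(Anchor);
    return It == DeviceOf.end() ? Host : Devices[It->second];
  }

  // The queries take the anchor symbol instead of being module-wide.
  const std::string &getDataLayout(const std::string &Anchor) const {
    return targetFor(Anchor).DataLayout;
  }
  const std::string &getTargetTriple(const std::string &Anchor) const {
    return targetFor(Anchor).Triple;
  }
};

// A function-local "pass" only asks about the function it looks at.
std::string layoutSeenByPass(const HeteroModule &M, const std::string &Fn) {
  return M.getDataLayout(Fn);
}
```

With a host description and one registered device, a query anchored at
a device function yields the device DL and triple, and a query anchored
at any other symbol yields the host ones, which is all a pass working on
a single function would need.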

I'm curious if you or others see an immediate problem with having both a
device-specific DL and triple (optionally) associated with every global
symbol.


~ Johannes


