[LLVMdev] make DataLayout a mandatory part of Module

Nick Lewycky nlewycky at google.com
Fri Jan 31 17:23:35 PST 2014

On 30 January 2014 09:55, Philip Reames <listmail at philipreames.com> wrote:

> On 1/29/14 3:40 PM, Nick Lewycky wrote:
>> The LLVM Module has an optional target triple and target datalayout.
>> Without them, an llvm::DataLayout can't be constructed with meaningful
>> data. The benefit to making them optional is to permit optimization that
>> would work across all possible DataLayouts, then allow us to commit to a
>> particular one at a later point in time, thereby performing more
>> optimization in advance.
>> This feature is not being used. Instead, every user of LLVM IR in a
>> portability system defines one or more standardized datalayouts for their
>> platform, and shims to place calls with the outside world. The primary
>> reason for this is that independence from DataLayout is not sufficient to
>> achieve portability because it doesn't also represent ABI lowering
>> constraints. If you have a system that attempts to use LLVM IR in a
>> portable fashion and does it without standardizing on a datalayout, please
>> share your experience.
> Nick, I don't have a current system in place, but I do want to put forward
> an alternate perspective.
> We've been looking at doing late insertion of safepoints for garbage
> collection.  One of the properties that we end up needing to preserve
> through all the optimizations which precede our custom rewriting phase is
> that the optimizer has not chosen to "hide" pointers from us by using
> ptrtoint and integer math tricks. Currently, we're simply running a
> verification pass before our rewrite, but I'm very interested long term in
> constructing ways to ensure a "gc safe" set of optimization passes.

As a general rule passes need to support the whole of what the IR can
support. Trying to operate on a subset of IR seems like a losing battle,
unless you can show a mapping from one to the other (e.g., using code
duplication to remove all unnatural loops from IR, or collapsing a function
to having a single exit node).

What language were you planning to do this for? Does the language permit
the user to convert pointers to integers and vice versa? If so, what do you
do if the user program writes a pointer out to a file, reads it back in
later, and uses it?

> One of the ways I've been thinking about - but haven't actually implemented
> yet - is to deny the optimization passes information about pointer sizing.

Right, pointer size (address space size) will become known to all parts of
the compiler. It's not even going to be just the optimizations,
ConstantExpr::get is going to grow smarter because of this, as
lib/Analysis/ConstantFolding.cpp merges into lib/IR/ConstantFold.cpp. That
is one of the major benefits that's driving this. (All parts of the
compiler will also know endian-ness, which means we can constant fold
loads, too.)
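
To make the endianness point concrete, here is a small sketch in plain Python
rather than LLVM code. The helper name fold_load is hypothetical and only
models the idea: the value produced by folding a narrow load from a wider
constant store depends on byte order, so the fold is only possible once the
datalayout is known.

```python
import struct

def fold_load(stored_i32: int, load_bits: int, big_endian: bool) -> int:
    """Model folding an integer load at offset 0 of a constant i32 store.

    Hypothetical helper for illustration; this is not an LLVM API.
    """
    order = ">" if big_endian else "<"
    # Bytes of the stored constant as they would be laid out in memory.
    raw = struct.pack(order + "I", stored_i32)
    # Reinterpret the leading bytes as the narrower integer type.
    fmt = {8: "B", 16: "H", 32: "I"}[load_bits]
    return struct.unpack_from(order + fmt, raw, 0)[0]

# The same store and the same 16-bit load fold to different constants.
little = fold_load(0x11223344, 16, big_endian=False)
big = fold_load(0x11223344, 16, big_endian=True)
print(hex(little), hex(big))
```

Without a known byte order, neither result can be chosen ahead of time, which
is why this fold has to wait for a datalayout.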

> Under the assumption that an opto pass can't insert a ptrtoint cast
> without knowing a safe integer size to use, this seems like it would outlaw
> a class of optimizations we'd be broken by.

Optimization passes generally prefer converting ptrtoint and inttoptr to
GEPs whenever possible. I expect that we'll end up with *fewer* ptr<->int
conversions with this change, because we'll know enough about the target to
convert them into GEPs.
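
One way to read "know enough about the target" is sketched below in plain
Python, not LLVM code. The function names via_int and via_gep are
hypothetical, and the model assumes only that ptrtoint and inttoptr truncate
or zero-extend to the destination width. Under that assumption, an
integer round trip matches pointer arithmetic only when the integer type is at
least as wide as the pointer, which is something the pointer size tells you.

```python
def via_int(addr: int, offset: int, int_bits: int, ptr_bits: int) -> int:
    """Model inttoptr(add(ptrtoint(addr), offset)) using an int_bits integer."""
    int_mask = (1 << int_bits) - 1
    as_int = addr & int_mask                 # ptrtoint: truncate or zero-extend
    summed = (as_int + offset) & int_mask    # wrapping integer add
    return summed & ((1 << ptr_bits) - 1)    # inttoptr: truncate or zero-extend

def via_gep(addr: int, offset: int, ptr_bits: int) -> int:
    """Model a byte-wise GEP: address arithmetic done at pointer width."""
    return (addr + offset) & ((1 << ptr_bits) - 1)

addr = 0x100001000  # an address that needs more than 32 bits

# Integer as wide as the pointer: the two forms agree, so the rewrite is safe.
same_width_ok = via_int(addr, 8, 64, 64) == via_gep(addr, 8, 64)

# Integer narrower than the pointer: the high bits are lost, so they differ.
narrow_ok = via_int(addr, 8, 32, 64) == via_gep(addr, 8, 64)
print(same_width_ok, narrow_ok)
```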

> My understanding is that the only current way to do this would be to not
> specify a DataLayout.  (And hack a few places with built in assumptions.
>  Let's ignore that for the moment.)  With your proposed change, would there
> be a clean way to express something like this?

I think your GC placement algorithm needs to handle inttoptr and ptrtoint,
whichever way this discussion goes. Sorry. I'd be happy to hear others
chime in -- I know I'm not an expert in this area or about GCs -- but I
don't find this rationale compelling.

> p.s. From reading the mailing list a while back, I suspect that the SPIR
> folks might have similar needs.  (i.e. hiding pointer sizes, etc..)  Pure
> speculation on my part though.

The SPIR spec specifies two target datalayouts, one for 32 bits and one for
64 bits.
