[LLVMdev] make DataLayout a mandatory part of Module

Wed Feb 5 09:45:02 PST 2014

On 1/31/14 5:23 PM, Nick Lewycky wrote:
> On 30 January 2014 09:55, Philip Reames <listmail at philipreames.com>
> wrote:
>
>     On 1/29/14 3:40 PM, Nick Lewycky wrote:
>
>         The LLVM Module has an optional target triple and target
>         datalayout. Without them, an llvm::DataLayout can't be
>         constructed with meaningful data. The benefit to making them
>         optional is to permit optimization that would work across all
>         possible DataLayouts, then allow us to commit to a particular
>         one at a later point in time, thereby performing more
>         optimization in advance.
>
>         This feature is not being used. Instead, every user of LLVM IR
>         in a portability system defines one or more standardized
>         datalayouts for their platform, and shims to place calls with
>         the outside world. The primary reason for this is that
>         independence from DataLayout is not sufficient to achieve
>         portability because it doesn't also represent ABI lowering
>         constraints. If you have a system that attempts to use LLVM IR
>         in a portable fashion and does it without standardizing on a
>         datalayout, please share your experience.
>
>     Nick, I don't have a current system in place, but I do want to put
>     forward an alternate perspective.
>
>     We've been looking at doing late insertion of safepoints for
>     garbage collection.  One of the properties that we end up needing
>     to preserve through all the optimizations which precede our custom
>     rewriting phase is that the optimizer has not chosen to "hide"
>     pointers from us by using ptrtoint and integer math tricks.
>     Currently, we're simply running a verification pass before our
>     rewrite, but I'm very interested long term in constructing ways to
>     ensure a "gc safe" set of optimization passes.
>
>
> As a general rule passes need to support the whole of what the IR can 
> support. Trying to operate on a subset of IR seems like a losing 
> battle, unless you can show a mapping from one to the other (i.e., 
> using code duplication to remove all unnatural loops from IR, or 
> collapsing a function to having a single exit node).
>
> What language were you planning to do this for? Does the language 
> permit the user to convert pointers to integers and vice versa? If so, 
> what do you do if the user program writes a pointer out to a file, 
> reads it back in later, and uses it?
Java, which does not permit arbitrary pointer manipulation.  (Well, not 
without resorting to mechanisms like JNI and sun.misc.Unsafe.  Doing so 
would be explicitly undefined behavior though.)  We also use raw pointer 
manipulations in our implementation (which is eventually inlined), but 
this happens after the safepoint insertion rewrite.

We strictly control the input IR.  As a result, I can ensure that the 
initial IR meets our subset requirements.  In practice, all of the 
optimization passes appear to preserve these invariants (i.e., they do 
not introduce inttoptr), but we'd like to justify that a bit more.
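
To make the invariant concrete, here is a minimal sketch of the kind of
check I mean.  It is not our actual verification pass and it does not use
the LLVM API: the function names are hypothetical, and it simply scans
textual IR for the two conversion opcodes.  A real pass would inspect the
instructions and constant expressions directly rather than match text, so
treat this as an illustration of the subset rule only.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM API): returns the 1-based numbers of the
// lines in textual IR that mention a pointer <-> integer conversion.
std::vector<int> findPtrIntConversions(const std::string &irText) {
    std::vector<int> hits;
    std::istringstream in(irText);
    std::string line;
    int lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        // Drop IR comments, which start at ';' and run to the end of the line.
        const std::string code = line.substr(0, line.find(';'));
        // Deliberately conservative: any mention of either opcode is flagged,
        // including uses inside constant expressions.
        if (code.find("ptrtoint") != std::string::npos ||
            code.find("inttoptr") != std::string::npos) {
            hits.push_back(lineNo);
        }
    }
    return hits;
}

// True when the IR stays inside the "no hidden pointers" subset.
bool isGcSafeSubset(const std::string &irText) {
    return findPtrIntConversions(irText).empty();
}

// Mirrors running a verifier before the rewrite: abort if the subset rule
// has been violated by an earlier pass.
void requireGcSafeSubset(const std::string &irText) {
    assert(isGcSafeSubset(irText) && "pointer was converted to or from an integer");
}
```

Because the match is textual, a symbol whose name happens to contain one
of the opcodes would also be flagged.  For a check that must never miss a
conversion, a false positive is the safer failure mode.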
>
>     One of the ways I've been thinking about - but haven't actually
>     implemented yet - is to deny the optimization passes information
>     about pointer sizing.
>
>
> Right, pointer size (address space size) will become known to all 
> parts of the compiler. It's not even going to be just the 
> optimizations, ConstantExpr::get is going to grow smarter because of 
> this, as lib/Analysis/ConstantFolding.cpp merges into 
> lib/IR/ConstantFold.cpp. That is one of the major benefits that's 
> driving this. (All parts of the compiler will also know endian-ness, 
> which means we can constant fold loads, too.)
I would argue that all of the pieces you mentioned are performing 
optimizations.  :)  However, the exact semantics are unimportant for the 
overall discussion.
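
For anyone following along, a small sketch of why folding a load needs
the endianness.  This is a standalone illustration, not code from
ConstantFolding.cpp; the Endian enum and foldLoad32 are made-up names.
The same four initializer bytes fold to different 32-bit constants
depending on the byte order the layout specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the byte order a data layout would specify.
enum class Endian { Little, Big };

// Hypothetical helper: folds a 32-bit load at `offset` from the bytes of a
// constant initializer into a single integer constant.
std::uint32_t foldLoad32(const std::vector<std::uint8_t> &bytes,
                         std::size_t offset, Endian order) {
    assert(offset + 4 <= bytes.size() && "load must stay inside the initializer");
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        // Visit bytes from most significant to least significant. On a
        // little-endian target the most significant byte is stored last.
        const std::size_t idx =
            (order == Endian::Little) ? offset + 3 - i : offset + i;
        value = (value << 8) | bytes[idx];
    }
    return value;
}
```

With the bytes 01 02 03 04, the little-endian fold gives 0x04030201 and
the big-endian fold gives 0x01020304, so without a known byte order the
load cannot be replaced by a constant.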
>
>     Under the assumption that an opto pass can't insert a ptrtoint
>     cast without knowing a safe integer size to use, this seems like
>     it would outlaw a class of optimizations we'd be broken by.
>
>
> Optimization passes generally prefer converting ptrtoint and inttoptr 
> to GEPs whenever possible.
This is good to hear and helps us.
> I expect that we'll end up with *fewer* ptr<->int conversions with 
> this change, because we'll know enough about the target to convert 
> them into GEPs.
Er, I'm confused by this.  Why would not knowing the size of a pointer 
cause a GEP to be converted to a ptr <-> int conversion?

Or do you mean that after the change conversions in the original input 
IR are more likely to be recognized?
>
>     My understanding is that the only current way to do this would be
>     to not specify a DataLayout.  (And hack a few places with built in
>     assumptions.  Let's ignore that for the moment.)  With your
>     proposed change, would there be a clean way to express something
>     like this?
>
>
> I think your GC placement algorithm needs to handle inttoptr and 
> ptrtoint, whichever way this discussion goes. Sorry. I'd be happy to 
> hear others chime in -- I know I'm not an expert in this area or about 
> GCs -- but I don't find this rationale compelling.
The key assumption I didn't initially explain is that the initial IR 
couldn't contain conversions.  With that added, do you still see 
concerns?  I'm fairly sure I don't need to handle general ptr <-> int 
conversions.  If I'm wrong, I'd really like to know it.
>
>     p.s. From reading the mailing list a while back, I suspect that
>     the SPIR folks might have similar needs.  (i.e. hiding pointer
>     sizes, etc.)  Pure speculation on my part though.
>
>
> The SPIR spec specifies two target datalayouts, one for 32 bits and 
> one for 64 bits.
Good to know.  Thanks.
>
> Nick
>
Philip