[LLVMdev] make DataLayout a mandatory part of Module
Philip Reames
listmail at philipreames.com
Fri Feb 14 10:59:38 PST 2014
Nick,
Thanks for writing up the summary of our conversation. I have a couple
of small clarifications to make, but I'm going to move that into a
separate thread since the discussion has largely devolved from the
original topic.
To repeat my comment from last week, I support your proposed change
w.r.t. DataLayout.
Philip
On 02/10/2014 05:25 PM, Nick Lewycky wrote:
> On 5 February 2014 09:45, Philip Reames <listmail at philipreames.com> wrote:
>
> On 1/31/14 5:23 PM, Nick Lewycky wrote:
>> On 30 January 2014 09:55, Philip Reames
>> <listmail at philipreames.com> wrote:
>>
>> On 1/29/14 3:40 PM, Nick Lewycky wrote:
>>
>> The LLVM Module has an optional target triple and target
>> datalayout. Without them, an llvm::DataLayout can't be
>> constructed with meaningful data. The benefit to making
>> them optional is to permit optimization that would work
>> across all possible DataLayouts, then allow us to commit
>> to a particular one at a later point in time, thereby
>> performing more optimization in advance.
>>
>> This feature is not being used. Instead, every user of
>> LLVM IR in a portability system defines one or more
>> standardized datalayouts for their platform, and shims to
>> place calls with the outside world. The primary reason
>> for this is that independence from DataLayout is not
>> sufficient to achieve portability because it doesn't also
>> represent ABI lowering constraints. If you have a system
>> that attempts to use LLVM IR in a portable fashion and
>> does it without standardizing on a datalayout, please
>> share your experience.
>>
>> Nick, I don't have a current system in place, but I do want
>> to put forward an alternate perspective.
>>
>> We've been looking at doing late insertion of safepoints for
>> garbage collection. One of the properties that we end up
>> needing to preserve through all the optimizations which
>> precede our custom rewriting phase is that the optimizer has
>> not chosen to "hide" pointers from us by using ptrtoint and
>> integer math tricks. Currently, we're simply running a
>> verification pass before our rewrite, but I'm very interested
>> long term in constructing ways to ensure a "gc safe" set of
>> optimization passes.
>>
>>
>> As a general rule passes need to support the whole of what the IR
>> can support. Trying to operate on a subset of IR seems like a
>> losing battle, unless you can show a mapping from one to the
>> other (i.e., using code duplication to remove all unnatural loops
>> from IR, or collapsing a function to having a single exit node).
>>
>> What language were you planning to do this for? Does the language
>> permit the user to convert pointers to integers and vice versa?
>> If so, what do you do if the user program writes a pointer out to
>> a file, reads it back in later, and uses it?
> Java - which does not permit arbitrary pointer manipulation.
> (Well, without resorting to mechanisms like JNI and
> sun.misc.Unsafe. Doing so would be explicitly undefined behavior
> though.) We also use raw pointer manipulations in our
> implementation (which is eventually inlined), but this happens
> after the safepoint insertion rewrite.
>
> We strictly control the input IR. As a result, I can ensure that
> the initial IR meets our subset requirements. In practice, all of
> the opto passes appear to preserve these invariants (i.e. not
> introducing inttoptr), but we'd like to justify that a bit more.
>>
>> One of the ways I've been thinking about - but haven't
>> actually implemented yet - is to deny the optimization passes
>> information about pointer sizing.
>>
>>
>> Right, pointer size (address space size) will become known to all
>> parts of the compiler. It's not even going to be just the
>> optimizations, ConstantExpr::get is going to grow smarter because
>> of this, as lib/Analysis/ConstantFolding.cpp merges into
>> lib/IR/ConstantFold.cpp. That is one of the major benefits that's
>> driving this. (All parts of the compiler will also know
>> endian-ness, which means we can constant fold loads, too.)
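
The remark about endianness can be made concrete with a small sketch. This is not LLVM's constant folder; `fold_load` is a hypothetical helper that reads an integer out of the bytes of a constant initializer. It only shows that the folded value depends on byte order, which is part of the datalayout.

```python
def fold_load(initializer: bytes, offset: int, width_bytes: int,
              little_endian: bool) -> int:
    """Return the integer a load of width_bytes at offset would produce.

    Hypothetical helper for illustration; byte order stands in for the
    endianness recorded in the target datalayout.
    """
    chunk = initializer[offset:offset + width_bytes]
    if len(chunk) != width_bytes:
        raise ValueError("load reads past the end of the initializer")
    return int.from_bytes(chunk, "little" if little_endian else "big")


data = bytes([0x01, 0x02, 0x03, 0x04])
print(hex(fold_load(data, 0, 4, little_endian=True)))   # 0x4030201
print(hex(fold_load(data, 0, 4, little_endian=False)))  # 0x1020304
```

The same four bytes fold to two different constants, so without a known byte order the load would have to be left unfolded.
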
> I would argue that all of the pieces you mentioned are performing
> optimizations. :) However, the exact semantics are unimportant
> for the overall discussion.
>>
>> Under the assumption that an opto pass can't insert a
>> ptrtoint cast without knowing a safe integer size to use,
>> this seems like it would outlaw a class of optimizations we'd
>> be broken by.
>>
>>
>> Optimization passes generally prefer converting ptrtoint and
>> inttoptr to GEPs whenever possible.
> This is good to hear and helps us.
>
>> I expect that we'll end up with *fewer* ptr<->int conversions
>> with this change, because we'll know enough about the target to
>> convert them into GEPs.
> Er, I'm confused by this. Why would not knowing the size of a
> pointer cause a GEP to be converted to a ptr <-> int conversion?
>
>
> Having target data means we can convert inttoptr/ptrtoint into GEPs,
> particularly in constant expression folding.
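
A simplified model of why target data helps here, assuming the rule that a ptrtoint followed by an inttoptr is only known to be lossless when the intermediate integer type is at least as wide as the pointer. `can_fold_round_trip` is a hypothetical name, and passing `None` for the pointer width stands in for a Module without a datalayout.

```python
from typing import Optional


def can_fold_round_trip(int_bits: int, pointer_bits: Optional[int]) -> bool:
    """Decide whether inttoptr(ptrtoint(p)) may be replaced by p.

    pointer_bits is None when no datalayout is available (assumption of
    this sketch, not an LLVM API).
    """
    if pointer_bits is None:
        # Unknown pointer width: the ptrtoint may have truncated the
        # address, so the pair has to be left alone.
        return False
    return int_bits >= pointer_bits


print(can_fold_round_trip(64, None))  # False
print(can_fold_round_trip(64, 64))    # True
print(can_fold_round_trip(32, 64))    # False
```

With the pointer width known, the first case becomes decidable, which is the sense in which fewer conversions would survive.
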
>
> Or do you mean that after the change conversions in the original
> input IR are more likely to be recognized?
>
>>
>> My understanding is that the only current way to do this
>> would be to not specify a DataLayout. (And hack a few places
>> with built in assumptions. Let's ignore that for the
>> moment.) With your proposed change, would there be a clean
>> way to express something like this?
>>
>>
>> I think your GC placement algorithm needs to handle inttoptr and
>> ptrtoint, whichever way this discussion goes. Sorry. I'd be happy
>> to hear others chime in -- I know I'm not an expert in this area
>> or about GCs -- but I don't find this rationale compelling.
> The key assumption I didn't initially explain is that the initial
> IR couldn't contain conversions. With that added, do you still
> see concerns? I'm fairly sure I don't need to handle general ptr
> <-> int conversions. If I'm wrong, I'd really like to know it.
>
>
> So we met at the social and talked about this at length. I'll repeat
> most of the conversation so that it's on the mailing list, and also
> I've had some additional thoughts since then.
>
> You're using the llvm type system to detect when something is a
> pointer, and then you rely on knowing what's a pointer to deduce
> garbage collection roots. We're supposed to have the llvm.gcroot
> intrinsic for this purpose, but you note that it prevents gc roots
> from being in registers (they must be in memory somewhere, usually on
> the stack), and that fixing it is more work than is reasonable.
>
> Your IR won't do any shifty pointer-int conversion shenanigans, and
> you want some assurance that an optimization won't introduce them, or
> that if one does then you can call it out as a bug and get it fixed. I
> think that's reasonable, but I also think it's something we need to
> put forth before llvm-dev.
>
> Note that pointer-to-int conversions aren't necessarily just the
> ptrtoint/inttoptr instructions (and constant expressions), there's
> also casting between { i64 }* and { i8* }* and such. Are there
> legitimate reasons an optz'n would introduce a cast? I think that
> anywhere in the mid-optimizer, conflating integers and pointers is
> only going to be bad for both the integer optimizations and the
> pointer optimizations.
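
The kind of check discussed above can be sketched as a crude scan over textual IR. This is not the verification pass mentioned earlier in the thread, which would presumably be written against the LLVM C++ API; `find_pointer_int_casts` is a hypothetical name, and the scan only looks for the `ptrtoint` and `inttoptr` opcodes in instructions and constant expressions.

```python
import re
from typing import List, Tuple

# Match the opcodes as whole words, but not when they are part of a
# global or local name such as @ptrtoint or %inttoptr.
_CAST = re.compile(r"(?<![@%\w.])(ptrtoint|inttoptr)\b")


def find_pointer_int_casts(ir_text: str) -> List[Tuple[int, str]]:
    """Return (line number, opcode) for each pointer/integer cast found."""
    hits = []
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        # Drop trailing IR comments. A ';' inside a string constant would
        # also be cut here; acceptable for a sketch.
        code = line.split(";", 1)[0]
        for match in _CAST.finditer(code):
            hits.append((lineno, match.group(1)))
    return hits


sample = """\
define i64 @f(i8* %p) {
  %a = ptrtoint i8* %p to i64   ; hides the pointer
  %b = add i64 %a, 8
  ret i64 %b
}
"""
print(find_pointer_int_casts(sample))  # [(2, 'ptrtoint')]
```

A scan like this does not catch the other case raised above, casting between pointer types such as { i64 }* and { i8* }*, which would need type information rather than opcode matching.
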
>
> It may make sense as part of lowering -- suppose we find two alloca's,
> one i64 and one i8* and find that their lifetimes are distinct, and
> i64 and i8* are the same size, so we merge them. Because of how this
> would interfere, I don't think this belongs anywhere in the
> mid-optimizer, it would have to happen late, after lowering. That
> suggests that there's a point in the pass pipeline where the IR is
> "canonical enough" that this will actually work.
>
> Is that reasonable? Can we actually guarantee that any pass
> which would break this goes after a common gc-root insertion spot? Do
> we need (want?) to push back and say "no, sorry, make GC roots better
> instead"?
>
> Nick
>
>>
>> p.s. From reading the mailing list a while back, I suspect
>> that the SPIR folks might have similar needs. (i.e. hiding
>> pointer sizes, etc.) Pure speculation on my part though.
>>
>>
>> The SPIR spec specifies two target datalayouts, one for 32 bits
>> and one for 64 bits.
> Good to know. Thanks.
>>
>> Nick
>>
> Philip
>
>