<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Nick,<br>
<br>
Thanks for writing up the summary of our conversation. I have a
couple of small clarifications to make, but I'm going to move that
into a separate thread since the discussion has largely drifted
from the original topic. <br>
<br>
To repeat my comment from last week, I support your proposed change
w.r.t. DataLayout. <br>
<br>
Philip<br>
<br>
<div class="moz-cite-prefix">On 02/10/2014 05:25 PM, Nick Lewycky
wrote:<br>
</div>
<blockquote
cite="mid:CADbEz-gPTzM0saPA5X9_SM0mqTyARaeTE77=a75fm9cSu5J4yw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On 5 February 2014 09:45, Philip
Reames <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>
<div>On 1/31/14 5:23 PM, Nick Lewycky wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">On 30 January 2014 09:55, Philip
Reames <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:listmail@philipreames.com"
target="_blank">listmail@philipreames.com</a>&gt;</span>
wrote:<br>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div>On 1/29/14 3:40 PM, Nick Lewycky wrote:<br>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">The
LLVM Module has an optional target
triple and target datalayout. Without
them, an llvm::DataLayout can't be
constructed with meaningful data. The
benefit to making them optional is to
permit optimization that would work
across all possible DataLayouts, then
allow us to commit to a particular one
at a later point in time, thereby
performing more optimization in advance.<br>
<br>
This feature is not being used. Instead,
every user of LLVM IR in a portability
system defines one or more standardized
datalayouts for their platform, and
shims to place calls with the outside
world. The primary reason for this is
that independence from DataLayout is not
sufficient to achieve portability
because it doesn't also represent ABI
lowering constraints. If you have a
system that attempts to use LLVM IR in a
portable fashion and does it without
standardizing on a datalayout, please
share your experience.<br>
</blockquote>
</div>
Nick, I don't have a current system in
place, but I do want to put forward an
alternate perspective.<br>
<br>
We've been looking at doing late insertion
of safepoints for garbage collection. One
of the properties that we end up needing to
preserve through all the optimizations which
precede our custom rewriting phase is that
the optimizer has not chosen to "hide"
pointers from us by using ptrtoint and
integer math tricks. Currently, we're simply
running a verification pass before our
rewrite, but I'm very interested long term
in constructing ways to ensure a "gc safe"
set of optimization passes.<br>
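<br>
As a rough illustration of the kind of check described above, here is a minimal sketch in Python. It assumes the IR is available in textual form and simply scans it for ptrtoint and inttoptr; the function name find_pointer_int_casts is hypothetical, and this is a naive text scan rather than an LLVM pass or the verification pass actually in use.<br>
<pre>
```python
import re

# Opcodes that convert between pointers and integers in LLVM IR.
_CAST_RE = re.compile(r"\b(ptrtoint|inttoptr)\b")


def find_pointer_int_casts(ir_text):
    """Return (line_number, opcode) pairs for each cast seen in textual IR.

    Assumption: a ';' starts a comment. This is naive, since a ';' inside
    a string constant would also truncate the line.
    """
    hits = []
    for number, line in enumerate(ir_text.splitlines(), start=1):
        code = line.split(";", 1)[0]  # drop the trailing IR comment, if any
        for match in _CAST_RE.finditer(code):
            hits.append((number, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = """define i64 @f(i8* %p) {
  %i = ptrtoint i8* %p to i64 ; hide the pointer
  %q = inttoptr i64 %i to i8*
  ret i64 %i
}
"""
    for number, opcode in find_pointer_int_casts(sample):
        print("line", number, "uses", opcode)
```
</pre>
A scan like this only catches the explicit cast opcodes. It would not notice pointers hidden by casting between pointer types, which would need a type-aware check.<br>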
</blockquote>
<div><br>
</div>
<div>
<div>As a general rule, passes need to
support the whole of what the IR can
support. Trying to operate on a subset of
IR seems like a losing battle, unless you
can show a mapping from one to the other
(e.g., using code duplication to remove all
unnatural loops from IR, or collapsing a
function to have a single exit node).</div>
</div>
<div><br>
</div>
<div>What language were you planning to do
this for? Does the language permit the user
to convert pointers to integers and vice
versa? If so, what do you do if the user
program writes a pointer out to a file,
reads it back in later, and uses it?</div>
</div>
</div>
</div>
</blockquote>
</div>
Java - which does not permit arbitrary pointer
manipulation. (Well, without resorting to mechanisms
like JNI and sun.misc.Unsafe. Doing so would be
explicitly undefined behavior though.) We also use raw
pointer manipulations in our implementation (which is
eventually inlined), but this happens after the
safepoint insertion rewrite.<br>
<br>
We strictly control the input IR. As a result, I can
ensure that the initial IR meets our subset
requirements. In practice, all of the opto passes
appear to preserve these invariants (i.e. not
introducing inttoptr), but we'd like to justify that a
bit more. <br>
<div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">One
of the ways I've been thinking about - but
haven't actually implemented yet - is to
deny the optimization passes information
about pointer sizing.</blockquote>
<div><br>
</div>
<div>Right, pointer size (address space size)
will become known to all parts of the
compiler. It's not even going to be just the
optimizations, ConstantExpr::get is going to
grow smarter because of this, as
lib/Analysis/ConstantFolding.cpp merges into
lib/IR/ConstantFold.cpp. That is one of the
major benefits that's driving this. (All
parts of the compiler will also know
endian-ness, which means we can constant
fold loads, too.)</div>
</div>
</div>
</div>
</blockquote>
</div>
I would argue that all of the pieces you mentioned are
performing optimizations. :) However, the exact
semantics are unimportant for the overall discussion. <br>
<div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Under
the assumption that an opto pass can't
insert a ptrtoint cast without knowing a
safe integer size to use, this seems like it
would outlaw a class of optimizations we'd
be broken by.<br>
</blockquote>
<div><br>
</div>
<div>Optimization passes generally prefer
converting ptrtoint and inttoptr to GEPs
whenever possible. </div>
</div>
</div>
</div>
</blockquote>
</div>
This is good to hear and helps us.
<div><br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div>I expect that we'll end up with *fewer*
ptr&lt;-&gt;int conversions with this
change, because we'll know enough about the
target to convert them into GEPs.</div>
</div>
</div>
</div>
</blockquote>
</div>
Er, I'm confused by this. Why would not knowing the
size of a pointer cause a GEP to be converted to a ptr
&lt;-&gt; int conversion? <br>
</div>
</blockquote>
<div><br>
</div>
<div>Having target data means we can convert
inttoptr/ptrtoint into GEPs, particularly in constant
expression folding.</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">Or do you mean that
after the change conversions in the original input IR
are more likely to be recognized?
<div><br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">My
understanding is that the only current way
to do this would be to not specify a
DataLayout. (And hack a few places with
built in assumptions. Let's ignore that for
the moment.) With your proposed change,
would there be a clean way to express
something like this?<br>
</blockquote>
<div><br>
</div>
<div>I think your GC placement algorithm needs
to handle inttoptr and ptrtoint, whichever
way this discussion goes. Sorry. I'd be
happy to hear others chime in -- I know I'm
not an expert in this area or about GCs --
but I don't find this rationale compelling.</div>
</div>
</div>
</div>
</blockquote>
</div>
The key assumption I didn't initially explain is that
the initial IR couldn't contain conversions. With that
added, do you still see concerns? I'm fairly sure I
don't need to handle general ptr &lt;-&gt; int
conversions. If I'm wrong, I'd really like to know it. </div>
</blockquote>
<div><br>
</div>
<div>So we met at the social and talked about this at
length. I'll repeat most of the conversation so that it's
on the mailing list, and also I've had some additional
thoughts since then.</div>
<div><br>
</div>
<div>You're using the llvm type system to detect when
something is a pointer, and then you rely on knowing
what's a pointer to deduce garbage collection roots. We're
supposed to have the llvm.gcroot intrinsic for this
purpose, but you note that it prevents gc roots from being
in registers (they must be in memory somewhere, usually on
the stack), and that fixing it is more work than is
reasonable.<br>
</div>
<div><br>
</div>
<div>Your IR won't do any shifty pointer-int conversion
shenanigans, and you want some assurance that an
optimization won't introduce them, or that if one does
then you can call it out as a bug and get it fixed. I
think that's reasonable, but I also think it's something
we need to put forth before llvm-dev.</div>
<div><br>
</div>
<div>Note that pointer-to-int conversions aren't necessarily
just the ptrtoint/inttoptr instructions (and constant
expressions); there's also casting between { i64 }* and {
i8* }* and such. Are there legitimate reasons an optz'n
would introduce a cast? I think that anywhere in the
mid-optimizer, conflating integers and pointers is only
going to be bad for both the integer optimizations and the
pointer optimizations.</div>
<div><br>
</div>
<div>It may make sense as part of lowering -- suppose we
find two alloca's, one i64 and one i8* and find that their
lifetimes are distinct, and i64 and i8* are the same size,
so we merge them. Because of how this would interfere, I
don't think this belongs anywhere in the mid-optimizer, it
would have to happen late, after lowering. That suggests
that there's a point in the pass pipeline where the IR is
"canonical enough" that this will actually work.</div>
<div><br>
</div>
<div>Is that reasonable? Can we actually guarantee
that any pass which would break this goes after a common
gc-root insertion spot? Do we need (want?) to push back
and say "no, sorry, make GC roots better instead"?</div>
<div><br>
</div>
<div>Nick<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">p.s.
From reading the mailing list a while back,
I suspect that the SPIR folks might have
similar needs. (i.e. hiding pointer sizes,
etc.) Pure speculation on my part, though.<br>
</blockquote>
<div><br>
</div>
<div>The SPIR spec specifies two target
datalayouts, one for 32 bits and one for 64
bits.</div>
</div>
</div>
</div>
</blockquote>
</div>
Good to know. Thanks.<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>Nick</div>
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
<span><font color="#888888"> Philip<br>
</font></span></div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>