<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Nick,<br>

    <br>

    Thanks for writing up the summary of our conversation.  I have a

    couple of small clarifications to make, but I'm going to move that

    into a separate thread since the discussion has largely drifted

    from the original topic.  <br>

    <br>

    To repeat my comment from last week, I support your proposed change

    w.r.t. DataLayout.  <br>

    <br>

    Philip<br>

    <br>

    <div class="moz-cite-prefix">On 02/10/2014 05:25 PM, Nick Lewycky

      wrote:<br>

    </div>

    <blockquote

cite="mid:CADbEz-gPTzM0saPA5X9_SM0mqTyARaeTE77=a75fm9cSu5J4yw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">On 5 February 2014 09:45, Philip

            Reames <span dir="ltr">&lt;<a moz-do-not-send="true"

                href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>&gt;</span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div>

                  <div>On 1/31/14 5:23 PM, Nick Lewycky wrote:<br>

                  </div>

                  <blockquote type="cite">

                    <div dir="ltr">On 30 January 2014 09:55, Philip

                      Reames <span dir="ltr">&lt;<a

                          moz-do-not-send="true"

                          href="mailto:listmail@philipreames.com"

                          target="_blank">listmail@philipreames.com</a>&gt;</span>

                      wrote:<br>

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <blockquote class="gmail_quote"

                            style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

                            <div>On 1/29/14 3:40 PM, Nick Lewycky wrote:<br>

                              <blockquote class="gmail_quote"

                                style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">The

                                LLVM Module has an optional target

                                triple and target datalayout. Without

                                them, an llvm::DataLayout can't be

                                constructed with meaningful data. The

                                benefit to making them optional is to

                                permit optimization that would work

                                across all possible DataLayouts, then

                                allow us to commit to a particular one

                                at a later point in time, thereby

                                performing more optimization in advance.<br>

                                <br>

                                This feature is not being used. Instead,

                                every user of LLVM IR in a portability

                                system defines one or more standardized

                                datalayouts for their platform, and

                                shims to place calls with the outside

                                world. The primary reason for this is

                                that independence from DataLayout is not

                                sufficient to achieve portability

                                because it doesn't also represent ABI

                                lowering constraints. If you have a

                                system that attempts to use LLVM IR in a

                                portable fashion and does it without

                                standardizing on a datalayout, please

                                share your experience.<br>

                              </blockquote>

                            </div>

                            Nick, I don't have a current system in

                            place, but I do want to put forward an

                            alternate perspective.<br>

                            <br>

                            We've been looking at doing late insertion

                            of safepoints for garbage collection.  One

                            of the properties that we end up needing to

                            preserve through all the optimizations which

                            precede our custom rewriting phase is that

                            the optimizer has not chosen to "hide"

                            pointers from us by using ptrtoint and

                            integer math tricks. Currently, we're simply

                            running a verification pass before our

                            rewrite, but I'm very interested long term

                            in constructing ways to ensure a "gc safe"

                            set of optimization passes.<br>

                          </blockquote>

                          <div><br>

                          </div>

                          <div>

                            <div>As a general rule passes need to

                              support the whole of what the IR can

                              support. Trying to operate on a subset of

                              IR seems like a losing battle, unless you

                              can show a mapping from one to the other

                              (i.e., using code duplication to remove all

                              unnatural loops from IR, or collapsing a

                              function to having a single exit node).</div>

                          </div>

                          <div><br>

                          </div>

                          <div>What language were you planning to do

                            this for? Does the language permit the user

                            to convert pointers to integers and vice

                            versa? If so, what do you do if the user

                            program writes a pointer out to a file,

                            reads it back in later, and uses it?</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                Java - which does not permit arbitrary pointer

                manipulation.  (Well, without resorting to mechanisms

                like JNI and sun.misc.Unsafe.  Doing so would be

                explicitly undefined behavior though.)  We also use raw

                pointer manipulations in our implementation (which is

                eventually inlined), but this happens after the

                safepoint insertion rewrite.<br>

                <br>

                We strictly control the input IR.  As a result, I can

                ensure that the initial IR meets our subset

                requirements.  In practice, all of the opto passes

                appear to preserve these invariants (i.e. not

                introducing inttoptr), but we'd like to justify that a

                bit more.  <br>

                <div>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <div><br>

                          </div>

                          <blockquote class="gmail_quote"

                            style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">One

                            of the ways I've been thinking about - but

                            haven't actually implemented yet - is to

                            deny the optimization passes information

                            about pointer sizing.</blockquote>

                          <div><br>

                          </div>

                          <div>Right, pointer size (address space size)

                            will become known to all parts of the

                            compiler. It's not even going to be just the

                            optimizations, ConstantExpr::get is going to

                            grow smarter because of this, as

                            lib/Analysis/ConstantFolding.cpp merges into

                            lib/IR/ConstantFold.cpp. That is one of the

                            major benefits that's driving this. (All

                            parts of the compiler will also know

                            endian-ness, which means we can constant

                            fold loads, too.)</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                I would argue that all of the pieces you mentioned are

                performing optimizations.  :)  However, the exact

                semantics are unimportant for the overall discussion.  <br>

                <div>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <div><br>

                          </div>

                          <blockquote class="gmail_quote"

                            style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Under

                            the assumption that an opto pass can't

                            insert a ptrtoint cast without knowing a

                            safe integer size to use, this seems like it

                            would outlaw a class of optimizations we'd

                            be broken by.<br>

                          </blockquote>

                          <div><br>

                          </div>

                          <div>Optimization passes generally prefer

                            converting ptrtoint and inttoptr to GEPs

                            whenever possible. </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                This is good to hear and helps us.

                <div><br>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <div>I expect that we'll end up with *fewer*

                            ptr&lt;-&gt;int conversions with this

                            change, because we'll know enough about the

                            target to convert them into GEPs.</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                Er, I'm confused by this.  Why would not knowing the

                size of a pointer cause a GEP to be converted to a ptr

                &lt;-&gt; int conversion?  <br>

              </div>

            </blockquote>

            <div><br>

            </div>

            <div>Having target data means we can convert

              inttoptr/ptrtoint into GEPs, particularly in constant

              expression folding.</div>

            <div><br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">Or do you mean that

                after the change conversions in the original input IR

                are more likely to be recognized?

                <div><br>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <div><br>

                          </div>

                          <blockquote class="gmail_quote"

                            style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">My

                            understanding is that the only current way

                            to do this would be to not specify a

                            DataLayout.  (And hack a few places with

                            built in assumptions.  Let's ignore that for

                            the moment.)  With your proposed change,

                            would there be a clean way to express

                            something like this?<br>

                          </blockquote>

                          <div><br>

                          </div>

                          <div>I think your GC placement algorithm needs

                            to handle inttoptr and ptrtoint, whichever

                            way this discussion goes. Sorry. I'd be

                            happy to hear others chime in -- I know I'm

                            not an expert in this area or about GCs --

                            but I don't find this rationale compelling.</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                The key assumption I didn't initially explain is that

                the initial IR couldn't contain conversions.  With that

                added, do you still see concerns?  I'm fairly sure I

                don't need to handle general ptr &lt;-&gt; int

                conversions.  If I'm wrong, I'd really like to know it. </div>

            </blockquote>

            <div><br>

            </div>

            <div>So we met at the social and talked about this at

              length. I'll repeat most of the conversation so that it's

              on the mailing list, and also I've had some additional

              thoughts since then.</div>

            <div><br>

            </div>

            <div>You're using the llvm type system to detect when

              something is a pointer, and then you rely on knowing

              what's a pointer to deduce garbage collection roots. We're

              supposed to have the llvm.gcroot intrinsic for this

              purpose, but you note that it prevents gc roots from being

              in registers (they must be in memory somewhere, usually on

              the stack), and that fixing it is more work than is

              reasonable.<br>

            </div>

            <div><br>

            </div>

            <div>Your IR won't do any shifty pointer-int conversion

              shenanigans, and you want some assurance that an

              optimization won't introduce them, or that if one does

              then you can call it out as a bug and get it fixed. I

              think that's reasonable, but I also think it's something

              we need to put forth before llvm-dev.</div>

            <div><br>

            </div>

            <div>Note that pointer-to-int conversions aren't necessarily

              just the ptrtoint/inttoptr instructions (and constant

              expressions), there's also casting between { i64 }* and {

              i8* }* and such. Are there legitimate reasons an optimization

              would introduce a cast? I think that anywhere in the

              mid-optimizer, conflating integers and pointers is only

              going to be bad for both the integer optimizations and the

              pointer optimizations.</div>
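            <div><br>
            </div>
            <div>A minimal sketch of the kind of check being discussed,
              assuming the IR is available as text: the Python below flags
              lines that use the <code>ptrtoint</code> or
              <code>inttoptr</code> opcodes, either as instructions or inside
              constant expressions. The function name
              <code>find_pointer_int_casts</code> and the sample IR are
              hypothetical, and a real verifier would walk LLVM's in-memory
              IR rather than split strings. It does not catch the
              struct-pointer casts mentioned above.</div>
<pre>
```python
# Opcodes that convert directly between pointers and integers in LLVM IR.
CAST_OPCODES = ("ptrtoint", "inttoptr")


def find_pointer_int_casts(ir_text):
    """Return (line_number, opcode) pairs for each ptrtoint/inttoptr use.

    Assumption: ir_text is textual LLVM IR. Anything after ';' on a line is
    treated as a comment, which is a simplification (a ';' inside a string
    constant would be misread).
    """
    hits = []
    for number, line in enumerate(ir_text.splitlines(), start=1):
        code = line.split(";", 1)[0]  # drop trailing IR comments
        # Separate punctuation so constant expressions such as
        # "ptrtoint (i8* @g to i64)" are tokenized like instructions.
        for ch in "(),":
            code = code.replace(ch, " ")
        for token in code.split():
            # Exact match only: names like %ptrtoint.tmp are not opcodes.
            if token in CAST_OPCODES:
                hits.append((number, token))
    return hits


# Hypothetical input: one real cast plus a value whose name merely
# contains the opcode text.
EXAMPLE_IR = """\
define i64 @hide(i8* %p) {
  %raw = ptrtoint i8* %p to i64   ; pointer hidden as an integer
  %ptrtoint.tmp = add i64 %raw, 8
  ret i64 %ptrtoint.tmp
}
"""

if __name__ == "__main__":
    for number, opcode in find_pointer_int_casts(EXAMPLE_IR):
        print("line", number, "uses", opcode)
```
</pre>
            <div>An empty result means only that neither opcode appears in
              the text; it says nothing about conversions done through
              memory or through casts between pointer types.</div>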

            <div><br>

            </div>

            <div>It may make sense as part of lowering -- suppose we

              find two allocas, one i64 and one i8* and find that their

              lifetimes are distinct, and i64 and i8* are the same size,

              so we merge them. Because of how this would interfere, I

              don't think this belongs anywhere in the mid-optimizer, it

              would have to happen late, after lowering. That suggests

              that there's a point in the pass pipeline where the IR is

              "canonical enough" that this will actually work.</div>

            <div><br>

            </div>

            <div>Is that reasonable? Can we actually guarantee that,

              that any pass which would break this goes after a common

              gc-root insertion spot? Do we need (want?) to push back

              and say "no, sorry, make GC roots better instead"?</div>

            <div><br>

            </div>

            <div>Nick<br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <div bgcolor="#FFFFFF" text="#000000">

                <div>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <div><br>

                          </div>

                          <blockquote class="gmail_quote"

                            style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">p.s.

                            From reading the mailing list a while back,

                            I suspect that the SPIR folks might have

                            similar needs.  (i.e. hiding pointer sizes,

                            etc.)  Pure speculation on my part, though.<br>

                          </blockquote>

                          <div><br>

                          </div>

                          <div>The SPIR spec specifies two target

                            datalayouts, one for 32 bits and one for 64

                            bits.</div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </div>

                Good to know.  Thanks.<br>

                <blockquote type="cite">

                  <div dir="ltr">

                    <div class="gmail_extra">

                      <div class="gmail_quote">

                        <div><br>

                        </div>

                        <div>Nick</div>

                        <div><br>

                          </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                <span><font color="#888888"> Philip<br>

                  </font></span></div>

            </blockquote>

          </div>

          <br>

        </div>

      </div>

    </blockquote>

    <br>

  </body>

</html>