<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    I'm generally in support of (1), but you should definitely get

    active buy-in from Chandler before moving forward.  He's the most

    knowledgeable on the tradeoffs.<br>

    <br>

    Also, are you willing to commit to the fairly large amount of work

    (1) implies?  I strongly suspect that doing (1) right will be *far*

    more work in the short term than (2).  It may still be the right

    long term answer, but are you ready to see the entire thing

    through?  Getting half way through and stopping would be much worse

    than (2).  <br>

    <br>

    Philip<br>

    <br>

    <div class="moz-cite-prefix">On 03/22/2016 11:32 AM, Ehsan Amiri via

      llvm-dev wrote:<br>

    </div>

    <blockquote

cite="mid:CAJ6ibVc9izT2U1iUe9PNrBDWBmDHhj8YT6_iaOQbC_ZOKAG4Cg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>

          <div>

            <div>Back to the discussion on the RFC, I still see some

              advantage in following the proposed solution. I see two

              paths forward:<br>

              <br>

            </div>

            1- Change canonical form, possibly lower memcpy to

            non-integer load and store in InstCombine. Then teach the

            backends to convert that to integer load and store if that

            is more profitable. Notice that we are talking about loads

            that have no use other than store. So it is a fairly simple

            change in the backends.<br>

            <br>

          </div>

          2- Do not change the canonical form. Then for this bug, we

          need to teach select speculation to see through bitcasts. We

          will probably need to teach other optimizations to see though

          bitcasts in the future as problems are uncovered. That is

          until typeless pointer work is complete. Once the typeless

          pointer work is complete, we have some extra code in each

          optimization for seeing through bitcasts which is possibly no

          longer needed.<br>

          <br>

        </div>

        <div>Based on this I think (1) is the right thing to do. But

          there might be other reasons for the current canonical form

          that I am not aware of. Please let me know what you think.<br>

          <br>

        </div>

        <div>Thanks<br>

        </div>

        Ehsan<br>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Wed, Mar 16, 2016 at 2:13 PM, David

          Blaikie <span dir="ltr"><<a moz-do-not-send="true"

              href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div dir="ltr"><br>

              <div class="gmail_extra"><br>

                <div class="gmail_quote"><span class="">On Wed, Mar 16,

                    2016 at 11:00 AM, Ehsan Amiri <span dir="ltr"><<a

                        moz-do-not-send="true"

                        href="mailto:ehsanamiri@gmail.com"

                        target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:ehsanamiri@gmail.com">ehsanamiri@gmail.com</a></a>></span>

                    wrote:<br>

                    <blockquote class="gmail_quote" style="margin:0 0 0

                      .8ex;border-left:1px #ccc solid;padding-left:1ex">

                      <div dir="ltr">

                        <div>

                          <div>

                            <div>David,<br>

                              <br>

                            </div>

                            Could you give us an update on the status of

                            typeless pointer work? How much work is left

                            and when you think it might be ready?<br>

                          </div>

                        </div>

                      </div>

                    </blockquote>

                    <div><br>

                    </div>

                  </span>

                  <div>It's a bit of an onion peel, really - since it

                    will eventually involve generalizing/fixing every

                    optimization that's currently leaning on typed

                    pointers to keep the performance while removing the

                    crutch they're currently leaning on. (in cases where

                    bitcasts are literally just getting in the way,

                    those won't require cleaning up & should just

                    become "free performance wins" once we remove them,

                    though)<br>

                    <br>

                    At the moment we can roundtrip every LLVM IR test

                    case through bitcode and textual IR (reading/writing

                    both formats) while using only a narrow whitelist of

                    places that request the type of a pointer (things

                    like the verifier, the parser/printer where it

                    actually needs the typed pointer to verify it

                    matches the explicit type, etc).<br>

                    <br>

                    The next thing on the list is probably figuring out

                    the byval/inalloca representation (add an explicit

                    pointee type? just make the number of bytes explicit

                    with no type information?).<br>

                    <br>

                    Then start migrating optimizations over - doing the

                    same sort of testing I did for the IR/bitcode

                    roundtripping - assert that the pointee type is not

                    accessed, whitelist places that need it until the

                    bitcasts go away, fix anything else... it'll still

                    be a fair bit of work & I don't really know how

                    much. It should parallelize pretty well (doing any

                    of this work is really helpful, each optimization is

                    indepednent, etc) if anyone wants to/is able to

                    help.<br>

                    <br>

                    - Dave</div>

                  <div>

                    <div class="h5">

                      <div> </div>

                      <blockquote class="gmail_quote" style="margin:0 0

                        0 .8ex;border-left:1px #ccc

                        solid;padding-left:1ex">

                        <div dir="ltr">

                          <div>

                            <div><br>

                            </div>

                            Thanks<span><font color="#888888"><br>

                              </font></span></div>

                          <span><font color="#888888">Ehsan<br>

                              <br>

                            </font></span></div>

                        <div>

                          <div>

                            <div class="gmail_extra"><br>

                              <div class="gmail_quote">On Wed, Mar 16,

                                2016 at 1:12 PM, David Blaikie <span

                                  dir="ltr"><<a

                                    moz-do-not-send="true"

                                    href="mailto:dblaikie@gmail.com"

                                    target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a></a>></span>

                                wrote:<br>

                                <blockquote class="gmail_quote"

                                  style="margin:0 0 0

                                  .8ex;border-left:1px #ccc

                                  solid;padding-left:1ex">

                                  <div dir="ltr"><br>

                                    <div class="gmail_extra"><br>

                                      <div class="gmail_quote"><span>On

                                          Wed, Mar 16, 2016 at 8:34 AM,

                                          Mehdi Amini via llvm-dev <span

                                            dir="ltr"><<a

                                              moz-do-not-send="true"

                                              href="mailto:llvm-dev@lists.llvm.org"

                                              target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a></a>></span>

                                          wrote:<br>

                                          <blockquote

                                            class="gmail_quote"

                                            style="margin:0 0 0

                                            .8ex;border-left:1px #ccc

                                            solid;padding-left:1ex">

                                            <div

                                              style="word-wrap:break-word">Hi,

                                              <div><br>

                                              </div>

                                              <div>How do it interact

                                                with the "typeless

                                                pointers" work?</div>

                                            </div>

                                          </blockquote>

                                          <div><br>

                                          </div>

                                        </span>

                                        <div>Right - the goal of the

                                          typeless pointer work is to

                                          fix all these bugs related to

                                          "didn't look through bitcasts"

                                          in optimizations. Sometimes

                                          that's going to mean more work

                                          (because the code is leaning

                                          on the absence of bitcasts

                                          & the presence of

                                          convenient (but not

                                          guaranteed) type information

                                          to inform optimization

                                          decisions) but if we remove

                                          typed pointers while keeping

                                          optimization quality in the

                                          cases we have today, then we

                                          should've also fixed the cases

                                          that were broken because the

                                          type information didn't end up

                                          aligning to produce the

                                          optimal output.<br>

                                          <br>

                                          & I know I've been off the

                                          typeless pointer stuff for a

                                          few months working on llvm-dwp

                                          - but happy for any help (the

                                          next immediate piece is

                                          probably figuring out teh

                                          right representation for byval

                                          and inalloca - there were some

                                          code reviews sent out for that

                                          that I'll need to come back

                                          around to - but also any

                                          optimizations people want to

                                          help rework/improve would be

                                          great too & I can provide

                                          some techniques/tools to help

                                          people approach those)<br>

                                          <br>

                                          - Dave</div>

                                        <div>

                                          <div>

                                            <div> </div>

                                            <blockquote

                                              class="gmail_quote"

                                              style="margin:0 0 0

                                              .8ex;border-left:1px #ccc

                                              solid;padding-left:1ex">

                                              <div

                                                style="word-wrap:break-word">

                                                <div><br>

                                                </div>

                                                <div>Thanks,</div>

                                                <div><br>

                                                </div>

                                                <div>-- </div>

                                                <div>Mehdi</div>

                                                <div><br>

                                                  <div>

                                                    <blockquote

                                                      type="cite">

                                                      <div>

                                                        <div>

                                                          <div>On Mar

                                                          16, 2016, at

                                                          6:41 AM, Ehsan

                                                          Amiri via

                                                          llvm-dev <<a

moz-do-not-send="true" href="mailto:llvm-dev@lists.llvm.org"

                                                          target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a></a>>

                                                          wrote:</div>

                                                          <br>

                                                        </div>

                                                      </div>

                                                      <div>

                                                        <div>

                                                          <div>

                                                          <div dir="ltr">===

                                                          PROBLEM ===

                                                          (See this bug

                                                          <a

                                                          moz-do-not-send="true"

href="https://llvm.org/bugs/show_bug.cgi?id=26445" target="_blank"><a class="moz-txt-link-freetext" href="https://llvm.org/bugs/show_bug.cgi?id=26445">https://llvm.org/bugs/show_bug.cgi?id=26445</a></a>)<br>

                                                          <br>

                                                          IR contains

                                                          code for

                                                          loading a

                                                          float from

                                                          float * and

                                                          storing it to

                                                          a float *

                                                          address. After

                                                          canonicalization

                                                          of load in

                                                          InstCombine

                                                          [1], new

                                                          bitcasts are

                                                          added to the

                                                          IR (see bottom

                                                          of the email

                                                          for code

                                                          samples). This

                                                          prevents

                                                          select

                                                          speculation in

                                                          SROA to work.

                                                          Also after

                                                          SROA we have

                                                          bitcasts from

                                                          int32 to

                                                          float.

                                                          (Whereas

                                                          originally

                                                          after

                                                          instCombine,

                                                          bitcasts are

                                                          only done on

                                                          pointer

                                                          types).<br>

                                                          <br>

                                                          === PROPOSED

                                                          SOLUTION===<br>

                                                          <br>

                                                          [1] implies

                                                          that we need

                                                          load

                                                          canonicalization

                                                          when we load a

                                                          value only to

                                                          store it

                                                          again. The

                                                          reason is to

                                                          avoid

                                                          generating

                                                          slightly

                                                          different code

                                                          (due to

                                                          different ways

                                                          of adding

                                                          bitcasts), in

                                                          different

                                                          situations. In

                                                          all examples

                                                          presented in

                                                          [1] there is a

                                                          non-zero

                                                          number of

                                                          bitcasts. I

                                                          think when we

                                                          load a value

                                                          of type T from

                                                          a T* address

                                                          and store it

                                                          as a type T

                                                          value to one

                                                          or more T*

                                                          address (and

                                                          there is no

                                                          other use or

                                                          store), we can

                                                          redefine

                                                          canonical form

                                                          to mean there

                                                          should not be

                                                          any bitcasts.

                                                          So we still

                                                          have a

                                                          canonical

                                                          form, but its

                                                          definition is

                                                          slightly

                                                          different.<br>

                                                          <br>

                                                          === REASONS

                                                          FOR /

                                                          AGAINST===<br>

                                                          <br>

                                                          Hal Finkel

                                                          warns that

                                                          while this may

                                                          be useful for

                                                          power pc, this

                                                          may hurt more

                                                          than one other

                                                          platform and

                                                          become a very

                                                          large project.

                                                          Despite this

                                                          he is fine

                                                          with bringing

                                                          up the issue

                                                          to the mailing

                                                          list to get

                                                          feedback,

                                                          mostly because

                                                          this seems

                                                          inline with

                                                          our future

                                                          direction of

                                                          having a

                                                          unique type

                                                          for all

                                                          pointers. 

                                                          (Hal please

                                                          correct me if

                                                          I

                                                          misunderstood

                                                          your comment)<br>

                                                          <br>

                                                          This is a much

                                                          simpler fix

                                                          compared to

                                                          alternatives.

                                                          (ignoring

                                                          potential

                                                          regressions)<br>

                                                          <br>

                                                          ===

                                                          ALTERNATIVE

                                                          SOLUTION ===<br>

                                                          <br>

                                                          Fix select

                                                          speculation in

                                                          SROA to see

                                                          through

                                                          bitcasts.

                                                          Handle

                                                          remaining

                                                          bitcasts

                                                          during code

                                                          gen. Other

                                                          alternative

                                                          solutions are

                                                          welcome.<br>

                                                          <br>

                                                          Should I

                                                          implement the

                                                          proposed

                                                          solution or is

                                                          it too risky?

                                                          I understand

                                                          that we may

                                                          need to undo

                                                          it if it

                                                          breaks too

                                                          many things.

                                                          Comments are

                                                          welcome.<br>

                                                          <br>

                                                          <br>

                                                          [1] <a

                                                          moz-do-not-send="true"

href="http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html"

target="_blank"><a class="moz-txt-link-freetext" href="http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html">http://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html</a></a> 

                                                          r226781  git

                                                          commit id:

                                                          b778cbc0c8<br>

                                                          <br>

                                                          <br>

                                                          <br>

                                                          Code Samples

                                                          (only relevant

                                                          part is

                                                          copied):<br>

                                                          <br>

                                                          -------------------- 

                                                          Before

                                                          Canonicalization

                                                          (contains call

                                                          to std::max):

                                                          -------------------- 

                                                          <br>

                                                          entry:<br>

                                                            %max_value =

                                                          alloca float,

                                                          align 4<br>

                                                            %1 = load

                                                          float, float*

                                                          %input, align

                                                          4, !tbaa !1<br>

                                                            store float

                                                          %1, float*

                                                          %max_value,

                                                          align 4, !tbaa

                                                          !1<br>

                                                          <br>

                                                          for.body:<br>

                                                            %call = call

                                                          dereferenceable(4)

                                                          float*

                                                          @_ZSt3maxIfERKT_S2_S2_(float*

                                                          dereferenceable(4)

                                                          %max_value,

                                                          float*

                                                          dereferenceable(4)

                                                          %arrayidx1)<br>

                                                            %3 = load

                                                          float, float*

                                                          %call, align

                                                          4, !tbaa !1<br>

                                                            store float

                                                          %3, float*

                                                          %max_value,

                                                          align 4, !tbaa

                                                          !1<br>

                                                          <br>

                                                          -------------------- 

                                                          After

                                                          Canonicalization

                                                          (contains call

                                                          to

                                                          std::max):-------------------- 

                                                          <br>

                                                          <br>

                                                          entry:<br>

                                                            %max_value =

                                                          alloca float,

                                                          align 4<br>

                                                            %1 = bitcast

                                                          float* %input

                                                          to i32*<br>

                                                            %2 = load

                                                          i32, i32* %1,

                                                          align 4, !tbaa

                                                          !1<br>

                                                            %3 = bitcast

                                                          float*

                                                          %max_value to

                                                          i32*<br>

                                                            store i32

                                                          %2, i32* %3,

                                                          align 4, !tbaa

                                                          !1<br>

                                                          <br>

                                                          for.body:<br>

                                                            %call = call

                                                          dereferenceable(4)

                                                          float*

                                                          @_ZSt3maxIfERKT_S2_S2_(float*

                                                          nonnull

                                                          dereferenceable(4)

                                                          %max_value,

                                                          float*

                                                          dereferenceable(4)

                                                          %arrayidx1)<br>

                                                            %5 = bitcast

                                                          float* %call

                                                          to i32*<br>

                                                            %6 = load

                                                          i32, i32* %5,

                                                          align 4, !tbaa

                                                          !1<br>

                                                            %7 = bitcast

                                                          float*

                                                          %max_value to

                                                          i32*<br>

                                                            store i32

                                                          %6, i32* %7,

                                                          align 4, !tbaa

                                                          !1<br>

                                                          <br>

                                                          --------------------

                                                          After SROA

                                                          (the call to

                                                          std::max is

                                                          inlined

                                                          now):-------------------- 

                                                          <br>

                                                          entry:<br>

                                                          %max_value.sroa.0

                                                          = alloca i32<br>

                                                            %0 = bitcast

                                                          float* %input

                                                          to i32*<br>

                                                            %1 = load

                                                          i32, i32* %0,

                                                          align 4, !tbaa

                                                          !1<br>

                                                            store i32

                                                          %1, i32*

                                                          %max_value.sroa.0<br>

                                                          <br>

                                                          for.body: <br>

                                                          %max_value.sroa.0.0.max_value.sroa.0.0.6

                                                          = load i32,

                                                          i32*

                                                          %max_value.sroa.0<br>

                                                            %3 = bitcast

                                                          i32

                                                          %max_value.sroa.0.0.max_value.sroa.0.0.6

                                                          to float<br>

                                                          %max_value.sroa.0.0.max_value.sroa_cast8

                                                          = bitcast i32*

                                                          %max_value.sroa.0

                                                          to float*<br>

                                                            %__b.__a.i =

                                                          select i1

                                                          %cmp.i, float*

                                                          %arrayidx1,

                                                          float*

                                                          %max_value.sroa.0.0.max_value.sroa_cast8<br>

                                                            %5 = bitcast

                                                          float*

                                                          %__b.__a.i to

                                                          i32*<br>

                                                            %6 = load

                                                          i32, i32* %5,

                                                          align 4, !tbaa

                                                          !1<br>

                                                            store i32

                                                          %6, i32*

                                                          %max_value.sroa.0<br>

                                                          <br>

                                                          --------------------

                                                          After SROA

                                                          when

                                                          Canonicalization

                                                          is turned

                                                          off-------------------- 

                                                          <br>

                                                          entry:<br>

                                                            %0 = load

                                                          float, float*

                                                          %input, align

                                                          4, !tbaa !1<br>

                                                          <br>

                                                          for.cond:                                        

                                                          ; preds =

                                                          %for.body,

                                                          %entry<br>

                                                            %max_value.0

                                                          = phi float [

                                                          %0, %entry ],

                                                          [

                                                          %.sroa.speculated,

                                                          %for.body ]<br>

                                                          <br>

                                                          for.body:<br>

                                                            %1 = load

                                                          float, float*

                                                          %arrayidx1,

                                                          align 4, !tbaa

                                                          !1<br>

                                                            %cmp.i =

                                                          fcmp olt float

                                                          %max_value.0,

                                                          %1<br>

                                                          %.sroa.speculate.load.true

                                                          = load float,

                                                          float*

                                                          %arrayidx1,

                                                          align 4, !tbaa

                                                          !1<br>

                                                          %.sroa.speculated

                                                          = select i1

                                                          %cmp.i, float

                                                          %.sroa.speculate.load.true,

                                                          float

                                                          %max_value.0<br>

                                                          </div>

                                                          </div>

                                                        </div>

_______________________________________________<br>

                                                        LLVM Developers

                                                        mailing list<br>

                                                        <a

                                                          moz-do-not-send="true"

href="mailto:llvm-dev@lists.llvm.org" target="_blank"><a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a></a><br>

                                                        <a

                                                          moz-do-not-send="true"

href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

                                                          target="_blank"><a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></a><br>

                                                      </div>

                                                    </blockquote>

                                                  </div>

                                                  <br>

                                                </div>

                                              </div>

                                              <br>

_______________________________________________<br>

                                              LLVM Developers mailing

                                              list<br>

                                              <a moz-do-not-send="true"

href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

                                              <a moz-do-not-send="true"

href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

                                                rel="noreferrer"

                                                target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

                                              <br>

                                            </blockquote>

                                          </div>

                                        </div>

                                      </div>

                                      <br>

                                    </div>

                                  </div>

                                </blockquote>

                              </div>

                              <br>

                            </div>

                          </div>

                        </div>

                      </blockquote>

                    </div>

                  </div>

                </div>

                <br>

              </div>

            </div>

          </blockquote>

        </div>

        <br>

      </div>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

LLVM Developers mailing list

<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>

<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>