<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    This explanation makes a lot more sense to me.  I think it would
    make sense to document this mental model, but I agree that this
    interpretation does not seem to require changes to the IR
    semantics.  <br>
    <br>
    Just to check, this implies that DSE *is* legal, right?  <br>
    <br>
    Philip<br>
    <br>
    <div class="moz-cite-prefix">On 01/14/2016 05:48 AM, James Molloy
      wrote:<br>
    </div>
    <blockquote
cite="mid:CALCTSA0qScDU30UkXtNaH49=HP3k0ZXGHj5oW_j7A8UyT0qfdA@mail.gmail.com"
      type="cite">
      <div dir="ltr">Hi,
        <div><br>
        </div>
        <div>I've given a bit of misinformation here and have caused
          some confusion. After talking with Tim and Mehdi last night on
          IRC, I need to correct what I said above to fall more in line
          with what Daniel is saying. <span style="line-height:1.5">If
            any of the below contradicts what I've said already, please
            accept my apologies. This version should be right.</span></div>
        <div><br>
        </div>
        <div>The behaviour of the code generator for big-endian NEON and
          MIPS is derived from the fact that we did not want to change
          IR semantics at all. A fundamental property that we do not
          want to break is memory round-tripping:</div>
        <div><br>
        </div>
        <div>%1 = load <4 x i32>, <4 x i32>* %p32</div>
        <div>%2 = bitcast <4 x i32> %1 to <2 x i64></div>
        <div>%p64 = bitcast <4 x i32>* %p32 to <2 x i64>*</div>
        <div>store <2 x i64> %2, <2 x i64>* %p64</div>
        <div><br>
        </div>
        <div>The contents of memory MUST NOT change across the store
          (contrary to what I said in an earlier post, I know).</div>
        <div><br>
        </div>
        <div>So in fact everything you can do in IR is valid. There are
          no changes to IR semantics in the slightest. However, when it
          comes to generating code from the IR, there are new rules:</div>
        <div>  1) Loads and stores are selected to be special loads and
          stores that do some transform from a canonical form in memory
          to a type-specific form in register.</div>
        <div>  2) Because bitcasts are semantically store/load pairs,
          they must behave as if a store and then a load had been done.
          Specifically, (bitcast TyA to TyB) must transform TyA ->
          canonical form -> TyB, as a store followed by a load would.
          Bitcasts are therefore not no-ops during code generation (*but
          they behave as if they are from an IR perspective!*).</div>
        <div><br>
        </div>
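        <div>As a rough illustration of rules 1 and 2 above, here is a
          toy Python model. It is not LLVM code: the function names and
          the lane-reversing register layout are assumptions made only
          for this sketch. A load turns canonical memory bytes into a
          type-specific register image, a store inverts that, and a
          register-level bitcast is defined as a store followed by a
          load.</div>
<pre>
```python
# Hypothetical model of a big-endian vector target; all names are illustrative.

def load(memory, lane_bytes):
    """Special load: canonical memory bytes to a type-specific register image."""
    lanes = [memory[i:i + lane_bytes] for i in range(0, len(memory), lane_bytes)]
    # Assumed layout: lane 0 lands at the least significant end of the register.
    return b"".join(reversed(lanes))

def store(register, lane_bytes):
    """Special store: the inverse of load, back to the canonical memory form."""
    lanes = [register[i:i + lane_bytes] for i in range(0, len(register), lane_bytes)]
    return b"".join(reversed(lanes))

def bitcast(register, from_lane_bytes, to_lane_bytes):
    """Register-level bitcast: behaves as a store followed by a load."""
    return load(store(register, from_lane_bytes), to_lane_bytes)

memory_before = bytes(range(16))
reg_v4i32 = load(memory_before, 4)       # register image of the 4 x i32 value
reg_v2i64 = bitcast(reg_v4i32, 4, 8)     # register image of the 2 x i64 value
memory_after = store(reg_v2i64, 8)

print(memory_after == memory_before)     # does the round trip preserve memory?
print(reg_v2i64 == reg_v4i32)            # is the register-level bitcast a no-op?
```
</pre>
        <div>In this model the memory round trip is preserved even
          though the two register images differ, which is why the
          bitcast cannot simply be dropped during code generation.</div>
        <div><br>
        </div>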
        <div>The reason this works neatly in IR is the IR's type
          system. To change a value's type, you must insert a cast or
          go through a memory round trip; there is no other way. In
          SDAG, however, things break down a bit. SDAG is more weakly
          typed, and bitconverts are often simply removed. We need
          that not to happen: bitconverts are not no-ops.</div>
        <div><br>
        </div>
        <div>Daniel's explanation of physical register mapping was
          excellent so I'm not going to repeat that.</div>
        <div><br>
        </div>
        <div>I apologise for the confusion and misinformation. This is
          quite a complex topic and takes a bit of mind bending for me
          to understand, and it was a long time ago.</div>
        <div><br>
        </div>
        <div>James</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr">On Thu, 14 Jan 2016 at 13:17 Daniel Sanders <<a
            moz-do-not-send="true"
            href="mailto:Daniel.Sanders@imgtec.com">Daniel.Sanders@imgtec.com</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0 0 0
          .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:windowtext">>
                </span>Ok.  Then we need to change the LangRef as
                suggested.  Given this is a rather important semantic
                change, I think you need to send a top level RFC to the
                list. </p>
              <p class="MsoNormal"> </p>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal">FWIW, I don't think this is a
                semantic change to LLVM-IR itself. I think it's more
                clearing up the misconception that LLVM-IR semantics
                also apply to SelectionDAG's operations. That said, I do
                think it's important to mention this in LangRef since
                it's very easy to make this mistake and very few targets
                need to worry about the distinction.</p>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal">To explain why I don't think this is
                a semantic change to LLVM-IR, let's consider this
                example from earlier:</p>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal">    %0 = load <4 x i32>, <4 x i32>* %x<br>
                    %1 = bitcast <4 x i32> %0 to <2 x i64><br>
              </p>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal">
                    store <2 x i64> %1, <2 x i64>* %y</p>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal">In LLVM-IR terms, if the value of %0
                is:</p>
              <p class="MsoNormal">    %0 =
                0x00112233_44556677_8899aabb_ccddeeff</p>
              <p class="MsoNormal">then the value of %1 is:</p>
              <p class="MsoNormal">    %1 =
                0x0011223344556677_8899aabbccddeeff</p>
              <p class="MsoNormal">which agrees with the store/load and
                the 'no bits change' statements in LangRef.</p>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal">However, the mapping of these bits to
                physical register bits is not consistent between types:</p>
              <p class="MsoNormal">    Physreg(%0) =
                0xccddeeff_8899aabb_44556677_00112233</p>
              <p class="MsoNormal">    Physreg(%1) =
                0x8899aabbccddeeff_0011223344556677</p>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal">Essentially, I'm saying that
                BitCastInst and ISD::BITCAST have slightly different
                semantics because of their different domains. The former
                is working on an abstract representation of the values
                where both statements in LangRef are true, but the
                latter is closer to the target where the 'no bits
                change' statement ceases to be true in some cases.</p>
            </div>
          </div>
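          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal">The mapping above can be reproduced
                with a small Python sketch. The helper names are
                hypothetical, and the layout rule (lane 0 occupies the
                least significant bits of the physical register) is an
                assumption inferred from the two Physreg values quoted
                above, not taken from any target description.</p>
<pre>
```python
# Toy model of the register mapping described above; helper names are hypothetical.

MEMORY = bytes.fromhex("00112233445566778899aabbccddeeff")

def ir_lanes(memory, lane_bytes):
    """Abstract IR value: the list of lanes, lane 0 first, big-endian lanes."""
    return [int.from_bytes(memory[i:i + lane_bytes], "big")
            for i in range(0, len(memory), lane_bytes)]

def physreg(lanes, lane_bytes):
    """Assumed physical layout: lane 0 occupies the least significant bits."""
    value = 0
    for index, lane in enumerate(lanes):
        value += lane * 256 ** (lane_bytes * index)
    return value

v4i32 = ir_lanes(MEMORY, 4)   # the IR value %0
v2i64 = ir_lanes(MEMORY, 8)   # the IR value %1, same bits regrouped

print(format(physreg(v4i32, 4), "032x"))
print(format(physreg(v2i64, 8), "032x"))
```
</pre>
              <p class="MsoNormal">The two printed values correspond to
                Physreg(%0) and Physreg(%1) above: the abstract bits are
                identical, but the register images are not.</p>
            </div>
          </div>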
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal">> A couple of points that will
                need to be clarified:<br>
                > - Does this only apply to vector types?  It
                definitely doesn't apply between pointer types today. 
                What about integer, floating point, and FCAs?</p>
              <p class="MsoNormal"> </p>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal">I've only seen it for vector types so
                far but in theory it could happen for other types. I'd
                expect FCAs to encounter it since the physical registers
                may contain padding that isn't present in the LLVM-IR
                representation and the placement and amount of padding
                will depend on the exact FCA. </p>
              <p class="MsoNormal">I can think of cases where address
                space casts can encounter the same problem but that's
                already been covered in LangRef ("It can be a no-op cast
                or a complex value modification, depending on the target
                and the address space pair.").</p>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal">Does anyone use FCAs directly? Most
                targets seem to convert them to same-sized integers or
                bitcast an FCA* to i8*.</p>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal"><br>
                > - Is combining two casts into one a legal
                operation?  I think it is so far, but we need to
                explicitly state that.</p>
              <p class="MsoNormal"> </p>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal">Yes, A->B->C and A->C are
                equivalent.</p>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal"><br>
                > - Do we have a predicate for identifying no-op
                casts that can be freely removed/combined?</p>
              <p class="MsoNormal"> </p>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal">James mentioned one in CGP but I
                haven't been able to find it. I don't think it's
                necessary to have one at the LLVM-IR level but we do
                need one in the backends. I remember adding one to the
                backend but I can't find that either so I think I'm
                remembering one of my patches from before I split MSA's
                registers into type-specific classes.</p>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal"><br>
                > - Is coercing a load to the type it's immediately
                bitcast to legal under this model? </p>
              <p class="MsoNormal"> </p>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <p class="MsoNormal">Yes.<span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:windowtext"></span></p>
              <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:windowtext"> </span></p>
              <div style="border:none;border-left:solid blue
                1.5pt;padding:0cm 0cm 0cm 4.0pt">
                <div>
                  <div style="border:none;border-top:solid #b5c4df
                    1.0pt;padding:3.0pt 0cm 0cm 0cm">
                    <p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext"
                          lang="EN-US">From:</span></b><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext"
                        lang="EN-US"> llvm-dev [mailto:<a
                          moz-do-not-send="true"
                          href="mailto:llvm-dev-bounces@lists.llvm.org"
                          target="_blank">llvm-dev-bounces@lists.llvm.org</a>]
                        <b>On Behalf Of </b>Philip Reames via llvm-dev<br>
                        <b>Sent:</b> 13 January 2016 20:31<br>
                        <b>To:</b> James Molloy; Hal Finkel<br>
                        <b>Cc:</b> llvm-dev</span></p>
                  </div>
                </div>
              </div>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <div style="border:none;border-left:solid blue
                1.5pt;padding:0cm 0cm 0cm 4.0pt">
                <div>
                  <div style="border:none;border-top:solid #b5c4df
                    1.0pt;padding:3.0pt 0cm 0cm 0cm">
                    <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext"
                        lang="EN-US"><br>
                        <b>Subject:</b> Re: [llvm-dev] [GlobalISel] A
                        Proposal for global instruction selection</span></p>
                  </div>
                </div>
              </div>
            </div>
          </div>
          <div bgcolor="white" link="blue" vlink="purple" lang="EN-GB">
            <div>
              <div style="border:none;border-left:solid blue
                1.5pt;padding:0cm 0cm 0cm 4.0pt">
                <p class="MsoNormal"> </p>
                <p class="MsoNormal" style="margin-bottom:12.0pt"> </p>
                <div>
                  <p class="MsoNormal">On 01/13/2016 12:20 PM, James
                    Molloy wrote:</p>
                </div>
                <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                  <div>
                    <p class="MsoNormal">>  (Right?) </p>
                    <div>
                      <p class="MsoNormal"> </p>
                    </div>
                    <div>
                      <p class="MsoNormal">Uh no, the register content
                        explicitly does change :( We insert REV
                        instructions (byteswap) on each bitcast.
                        Bitcasts can be merged and elided etc, but
                        conceptually there's a register content change
                        on every bitcast.</p>
                    </div>
                  </div>
                </blockquote>
                <p class="MsoNormal">Ok.  Then we need to change the
                  LangRef as suggested.  Given this is a rather
                  important semantic change, I think you need to send a
                  top level RFC to the list. 
                  <br>
                  <br>
                  A couple of points that will need to be clarified:<br>
                  - Does this only apply to vector types?  It definitely
                  doesn't apply between pointer types today.  What about
                  integer, floating point, and FCAs?<br>
                  - Is combining two casts into one a legal operation? 
                  I think it is so far, but we need to explicitly state
                  that.
                  <br>
                  - Do we have a predicate for identifying no-op casts
                  that can be freely removed/combined?<br>
                  - Is coercing a load to the type it's immediately
                  bitcast to legal under this model? 
                  <br>
                  <br>
                </p>
                <div>
                  <div>
                    <p class="MsoNormal"> </p>
                  </div>
                  <div>
                    <p class="MsoNormal">James</p>
                  </div>
                </div>
                <p class="MsoNormal"> </p>
                <div>
                  <div>
                    <p class="MsoNormal">On Wed, 13 Jan 2016 at 18:09
                      Philip Reames <<a moz-do-not-send="true"
                        href="mailto:listmail@philipreames.com"
                        target="_blank">listmail@philipreames.com</a>>
                      wrote:</p>
                  </div>
                  <blockquote style="border:none;border-left:solid
                    #cccccc 1.0pt;padding:0cm 0cm 0cm
                    6.0pt;margin-left:4.8pt;margin-right:0cm">
                    <p class="MsoNormal" style="margin-bottom:12.0pt"><br>
                      <br>
                      On 01/13/2016 08:01 AM, Hal Finkel via llvm-dev
                      wrote:<br>
                      > ----- Original Message -----<br>
                      >> From: "James Molloy" <<a
                        moz-do-not-send="true"
                        href="mailto:james@jamesmolloy.co.uk"
                        target="_blank">james@jamesmolloy.co.uk</a>><br>
                      >> To: "Hal Finkel" <<a
                        moz-do-not-send="true"
                        href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>><br>
                      >> Cc: "llvm-dev" <<a
                        moz-do-not-send="true"
                        href="mailto:llvm-dev@lists.llvm.org"
                        target="_blank">llvm-dev@lists.llvm.org</a>>,
                      "Quentin Colombet" <<a moz-do-not-send="true"
                        href="mailto:qcolombet@apple.com"
                        target="_blank">qcolombet@apple.com</a>><br>
                      >> Sent: Wednesday, January 13, 2016 9:54:26
                      AM<br>
                      >> Subject: Re: [llvm-dev] [GlobalISel] A
                      Proposal for global instruction selection<br>
                      >><br>
                      >><br>
                      >>> I think that teaching the optimizer
                      about big-Endian lane ordering<br>
                      >>> would have been better.<br>
                      >><br>
                      >> It's certainly arguable. Even in
                      hindsight I'm glad we didn't -<br>
                      >> that's the approach GCC took and they've
                      been fixing subtle bugs in<br>
                      >> their vectorizer ever since.<br>
                      >><br>
                      >><br>
                      >>> Inserting the REV after every LDR<br>
                      >><br>
                      >> We only do this conceptually. In most
                      cases REVs cancel out, and we<br>
                      >> have the LD1 instruction which is
                      LDR+REV. With enough peepholes<br>
                      >> there's really no need for code to run
                      slower.<br>
                      >><br>
                      >><br>
                      >>> Given what's been done, should we
                      update the LangRef?<br>
                      >><br>
                      >> Potentially, yes. I hadn't realised quite
                      how strongly worded it was<br>
                      >> with respect to this.<br>
                      >><br>
                      > Please do ;)<br>
                      I'm not sure changing bitcast is the right place. 
                      Since the bitcast is<br>
                      representing the in-register value (which doesn't
                      change), maybe we<br>
                      should define it as part of the load/store
                      instead?  That's essentially<br>
                      what's going on; we're converting from a canonical
                      register form to a<br>
                      variety of memory forms.  (Right?)<br>
                      ><br>
                      >   -Hal<br>
                      ><br>
                      >> James<br>
                      >><br>
                      >><br>
                      >> On Wed, 13 Jan 2016 at 14:39 Hal Finkel
                      < <a moz-do-not-send="true"
                        href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>
                      > wrote:<br>
                      >><br>
                      >><br>
                      >><br>
                      >><br>
                      >> [resending so the message is smaller]<br>
                      >><br>
                      >><br>
                      >><br>
                      >><br>
                      >><br>
                      >><br>
                      >> From: "James Molloy via llvm-dev" < <a
                        moz-do-not-send="true"
                        href="mailto:llvm-dev@lists.llvm.org"
                        target="_blank">llvm-dev@lists.llvm.org</a> ><br>
                      >> To: "Quentin Colombet" < <a
                        moz-do-not-send="true"
                        href="mailto:qcolombet@apple.com"
                        target="_blank">qcolombet@apple.com</a> ><br>
                      >> Cc: "llvm-dev" < <a
                        moz-do-not-send="true"
                        href="mailto:llvm-dev@lists.llvm.org"
                        target="_blank">llvm-dev@lists.llvm.org</a> ><br>
                      >> Sent: Wednesday, January 13, 2016 2:35:32
                      AM<br>
                      >> Subject: Re: [llvm-dev] [GlobalISel] A
                      Proposal for global<br>
                      >> instruction selection<br>
                      >><br>
                      >> Hi Philip,<br>
                      >><br>
                      >><br>
                      >><br>
                      >><br>
                      >><br>
                      >> store <2 x i64> %1, <2 x
                      i64>* %y<br>
                      >><br>
                      >> Yes. The memory pattern differs. This is
                      the first diagram on the<br>
                      >> right at: <a moz-do-not-send="true"
                        href="http://llvm.org/docs/BigEndianNEON.html#bitconverts"
                        target="_blank">
http://llvm.org/docs/BigEndianNEON.html#bitconverts</a><br>
                      >><br>
                      >><br>
                      >> I think that teaching the optimizer about
                      big-Endian lane ordering<br>
                      >> would have been better. Inserting the REV
                      after every LDR sounds<br>
                      >> very similar to what we do for VSX on
                      little-Endian PowerPC systems<br>
                      >> (PowerPC may have a slight advantage here
                      in that we don't need to<br>
                      >> do insertelement / extractelement /
                      shufflevector through memory on<br>
                      >> systems where little-Endian mode is
                      relevant, see<br>
                      >> <a moz-do-not-send="true"
href="http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf"
                        target="_blank">
http://llvm.org/devmtg/2014-10/Slides/Schmidt-SupportingVectorProgramming.pdf</a><br>
                      >> ).<br>
                      >><br>
                      >> Given what's been done, should we update
                      the LangRef? It currently<br>
                      >> reads, "The ‘bitcast’ instruction
                      converts value to type ty2. It<br>
                      >> is always a no-op cast because no bits
                      change with this conversion.<br>
                      >> The conversion is done as if the value
                      had been stored to memory and<br>
                      >> read back as type ty2." But this is now,
                      at the least, misleading,<br>
                      >> because this process of storing the value
                      as one type and reading it<br>
                      >> back in as another does, in fact, change
                      the bits. We need to make<br>
                      >> clear that this might change the bits
                      (perhaps specifically by<br>
                      >> calling out this case of vector bitcasts
                      on big-Endian systems?).<br>
                      >><br>
                      >><br>
                      >><br>
                      >> Also, regarding this, "Most operating
                      systems however do not run<br>
                      >> with alignment faults enabled, so this is
                      often not an issue." Are<br>
                      >> you saying that the processor does the
                      correct thing in this case<br>
                      >> (if alignment faults are not enabled,
                      then it performs a proper<br>
                      >> unaligned load), or that the
                      operating-system trap handler emulates<br>
                      >> the unaligned load should one occur?<br>
                      >><br>
                      >> Thanks again,<br>
                      >> Hal<br>
                      >><br>
                      >><br>
                      >> --<br>
                      >> Hal Finkel<br>
                      >> Assistant Computational Scientist<br>
                      >> Leadership Computing Facility<br>
                      >> Argonne National Laboratory<br>
                      >></p>
                  </blockquote>
                </div>
                <p class="MsoNormal"> </p>
              </div>
            </div>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <br>
  </body>
</html>