<div dir="ltr"><div class="uyb8Gf"><div class="F3hlO"><div text="#000000" bgcolor="#FFFFFF">Hi Philip,</div><div text="#000000" bgcolor="#FFFFFF"><br></div><div text="#000000" bgcolor="#FFFFFF">> I think after reading your link I'm actually more confused.  This might just be a wording problem, but let me ask a couple of clarifying questions.<br><br>Sorry about that :( Every time I explain this I get slightly more embarrassed because it is indeed weird and ugly (but was certainly the least ugly solution).<br><br>> 1) After compiling the code sequence below (from that page), does the in-memory bit pattern differ?  The page seemed to contradict itself.  <br><pre>> %0 = load <4 x i32> %x
> %1 = bitcast <4 x i32> %0 to <2 x i64>
>      store <2 x i64> %1, <2 x i64>* %y

</pre>Yes. The memory pattern differs. This is the first diagram on the right at: <a href="http://llvm.org/docs/BigEndianNEON.html#bitconverts">http://llvm.org/docs/BigEndianNEON.html#bitconverts</a></div><div text="#000000" bgcolor="#FFFFFF"><br></div><div text="#000000" bgcolor="#FFFFFF">> 2) If so, does this mean that performing dead-store-elimination is illegal for ARM?<br><br>Yes, for vector stores whose type differs from that of the corresponding load. </div><div text="#000000" bgcolor="#FFFFFF"><br></div><div text="#000000" bgcolor="#FFFFFF">%0 = load <4 x i32> %x</div><div text="#000000" bgcolor="#FFFFFF">store <4 x i32> %0, <4 x i32>* %x</div><div text="#000000" bgcolor="#FFFFFF"><br></div><div text="#000000" bgcolor="#FFFFFF">is still fine. I should go and check that DSE doesn't do bad things for big-endian NEON actually...</div><div text="#000000" bgcolor="#FFFFFF"><br>> 3) Are loads and stores ever allowed to fault based on the in-memory representation?  <br><br>No (thank goodness!)</div><div text="#000000" bgcolor="#FFFFFF"><br>> 4) What happens if we have a load of <2xi64> following the store above and we do DSE the store before forwarding its value?</div></div></div><div class="uyb8Gf"><br></div>The store can't be DSE'd as above. But value forwarding is fine. It's fine because the IR is strongly typed - there's no way to remove that bitcast and still have the IR correctly formed. However, folding bitcasts into memory operands is explicitly illegal:<br><br><br>%1 = bitcast <4 x i32> %x to <2 x i64><br>store <2 x i64> %1, <2 x i64>* %y<br>  =><br>store <4 x i32> %x, (bitcast <2 x i64>* %y to <4 x i32>*) ; ILLEGAL!<div><br></div><div>There's a hook somewhere in CGP that disables an optimization that tries to do this.</div><div><br></div><div>So in IR, because it's strongly typed, there aren't really many special cases or things to worry about. But in SDAG things get more difficult. 
SDAG is weakly typed, so bitconverts would otherwise just get blasted into oblivion; while SDAG can merge bitconverts, (bitconvert (bitconvert %x)) -> (bitconvert %x), it mustn't remove them completely.</div><div><br></div><div>I hope I've explained that OK. CCing Tim who can hopefully pick more holes in the explanation.</div><div><br></div><div>Also, could you please point me to where the documentation seems contradictory? Then I'll fix it. I wrote it for exactly this scenario!</div><div><br></div><div>Cheers,</div><div><br></div><div>James</div><br><div class="gmail_quote"><div dir="ltr">On Wed, 13 Jan 2016 at 00:42 Quentin Colombet <<a href="mailto:qcolombet@apple.com">qcolombet@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>Hi James,</div><div><br></div>I am also confused!<div><br></div><div><div></div></div></div><div style="word-wrap:break-word"><div><div><blockquote type="cite"><div>On Jan 12, 2016, at 4:11 PM, Philip Reames <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>> wrote:</div><br><div>
  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    I think after reading your link I'm actually more confused.  This
    might just be a wording problem, but let me ask a couple of
    clarifying questions.<br>
    <br>
    1) After compiling the code sequence below (from that page), does
    the in-memory bit pattern differ?  The page seemed to contradict
    itself.  <br></div></div></blockquote><div><br></div></div></div></div><div style="word-wrap:break-word"><div><div>+1</div><div><br></div><div>Thanks,</div><div>Q.</div></div></div><div style="word-wrap:break-word"><div><div><br><blockquote type="cite"><div><div text="#000000" bgcolor="#FFFFFF">
    <pre>%0 = load <4 x i32> %x
%1 = bitcast <4 x i32> %0 to <2 x i64>
     store <2 x i64> %1, <2 x i64>* %y

</pre>
    2) If so, does this mean that performing dead-store-elimination is
    illegal for ARM?<br>
    <br>
    3) Are loads and stores ever allowed to fault based on the in-memory
    representation?  <br>
    <br>
    4) What happens if we have a load of <2xi64> following the
    store above and we do DSE the store before forwarding its value?<br>
    <br>
    Philip<br>
    <br>
    <br>
    <div>On 01/12/2016 05:55 AM, James Molloy
      via llvm-dev wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">Hi,
        <div><br>
        </div>
        <div>
          <div>
            <div>
              <div link="blue" vlink="purple" lang="EN-GB"><p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif">>
                    I found this thinking quite difficult to explain.
                    Does it make sense?</span></p>
                <div><span style="font-size:11pt;font-family:Calibri,sans-serif">It
                    might help to link to the documentation on why
                    bitcasts are weird on big-endian NEON: </span><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px"><a href="http://llvm.org/docs/BigEndianNEON.html#bitconverts" target="_blank"></a><a href="http://llvm.org/docs/BigEndianNEON.html#bitconverts" target="_blank">http://llvm.org/docs/BigEndianNEON.html#bitconverts</a></span></font></div>
                <div><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px"><br>
                    </span></font></div>
                <div><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px">Cheers,</span></font></div>
                <div><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px"><br>
                    </span></font></div>
                <div><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px">James</span></font></div>
              </div>
            </div>
          </div>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr">On Tue, 12 Jan 2016 at 13:23 Daniel Sanders via
          llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div link="blue" vlink="purple" lang="EN-GB">
            <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Hi,</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I
                  haven't found much time to look into the LLVM-IR-level
                  optimizations yet so I'm not sure how they handle
                  bitcasts. With that disclaimer in mind, I expect it's
                  fine for the LLVM-IR level optimizations to handle
                  them using either definition since they are equivalent
                  at the LLVM-IR level. My thinking is that LLVM-IR is
                  consistent about how virtual bits are assigned to
                  types and that no-ops needing a non-zero number of instructions arise when
                  there is inconsistency.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">At
                  the LLVM-IR level, bits 0-127 of <4 x i32> map
                  directly onto bits 0-127 of <2 x i64> using the
                  identity map. It's therefore ok to interpret such
                  bitcasts as zero-instruction no-ops. As far as I can
                  tell, LLVM-IR has been defined such that the identity
                  map can be used for bitcasts between all same-sized
                  types, and also such that bitcasting between
                  different-sized types is invalid.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Similarly,
                  most targets have a single mapping of virtual bit
                  numbers to physical bit numbers for each size that is
                  applied consistently when mapping a type to memory.
                  For example 32-bits map like so:</span></p><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Little
                  Endian Targets: virtual register bits
                  {0..7,8..15,16..23,24..31} map to physical memory bits
                  {0..7,8..15,16..23,24..31}</span></p><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Big
                  Endian Targets: virtual register bits
                  {0..7,8..15,16..23,24..31} map to physical memory bits
                  {24..31,16..23,8..15,0..7}</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">regardless
                  of whether it's a float, or an i32. We therefore need
                  zero instructions to re-map physical memory bits for
                  one type onto another type.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">The
                  same idea holds for physical register classes. There's
                  a single consistent mapping from physical memory bits
                  to physical register bits that applies for all types
                  that can be stored in that class. As long as this is
                  the case the load/store and zero-instruction
                  interpretation of bitcasts are equivalent.</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">In
                  the case of big-endian MSA and NEON, there isn't a
                  single consistent mapping from physical memory bits to
                  physical register bits so the equivalence in the two
                  definitions breaks down:</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">               
                  i128: virtual register bits {0..31, 32..63, 64..95,
                  96..127} map to physical memory bits {96..127,
                  64..95, 32..63, 0..31}</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">               
                  <4 x i32>: virtual register bits {0..31, 32..63,
                  64..95, 96..127} map to physical memory bits {0..31,
                  32..63, 64..95, 96..127}</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">               
                  <2 x i64>: virtual register bits {0..31, 32..63,
                  64..95, 96..127} map to physical memory bits {32..63,
                  0..31, 96..127, 64..95}</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">with
                  these inconsistent mappings we require instructions to
                  bitcast between the types.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I
                  found this thinking quite difficult to explain. Does
                  it make sense?</span></p>
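<div>To make the mappings above concrete, here is a minimal Python sketch (not LLVM code). It assumes each list gives, for every 32-bit register chunk, the 32-bit memory chunk it maps to, exactly as in the three lines for i128, <4 x i32> and <2 x i64>. The names TO_MEMORY, invert, bitcast_shuffle and is_free are made up for illustration. It composes "store as the source type" with "load as the destination type" to get the register shuffle a bitcast would need.</div>

```python
# Chunk-level model of the big-endian MSA/NEON mappings listed above.
# Assumption: entry i is the 32-bit memory chunk that register chunk i maps to.
TO_MEMORY = {
    "i128":      [3, 2, 1, 0],
    "<4 x i32>": [0, 1, 2, 3],
    "<2 x i64>": [1, 0, 3, 2],
}


def invert(perm):
    """Return the inverse permutation (memory chunk -> register chunk)."""
    inverse = [0] * len(perm)
    for source, destination in enumerate(perm):
        inverse[destination] = source
    return inverse


def bitcast_shuffle(src_ty, dst_ty):
    """Register shuffle equivalent to: store as src_ty, reload as dst_ty.

    result[i] is the destination register chunk that receives source chunk i.
    """
    to_memory = TO_MEMORY[src_ty]
    from_memory = invert(TO_MEMORY[dst_ty])
    return [from_memory[to_memory[i]] for i in range(len(to_memory))]


def is_free(src_ty, dst_ty):
    """True when the bitcast needs zero instructions (identity shuffle)."""
    shuffle = bitcast_shuffle(src_ty, dst_ty)
    return shuffle == list(range(len(shuffle)))


if __name__ == "__main__":
    for src in TO_MEMORY:
        for dst in TO_MEMORY:
            print(src, "->", dst, bitcast_shuffle(src, dst), is_free(src, dst))
```

<div>Under these assumptions only a bitcast between identical types comes out as the identity shuffle; <4 x i32> to <2 x i64> swaps adjacent 32-bit chunks, which is the "instructions required" case described above.</div>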
            </div>
          </div>
          <div link="blue" vlink="purple" lang="EN-GB">
            <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">>
                </span>I am fine with treating bit casts as equivalent
                store/load pairs in GISel, I just want to be sure we do
                not have a semantic gap between the LLVM-IR and the
                backend if we do.</p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
            </div>
          </div>
          <div link="blue" vlink="purple" lang="EN-GB">
            <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I
                  think a gap would arise from not having a GISel
                  equivalent to ISD::BITCAST (gBITCAST?) available when
                  it's necessary for correctness. However, I agree that
                  GISel should delete bitcasts for the common case where
                  the store/load and zero-instruction definitions are
                  equivalent.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
              <div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
                <div>
                  <div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm"><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"> Quentin Colombet [mailto:<a href="mailto:qcolombet@apple.com" target="_blank"></a><a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>]
                        <br>
                        <b>Sent:</b> 11 January 2016 17:23<br>
                        <b>To:</b> Daniel Sanders<br>
                        <b>Cc:</b> Tim Northover (<a href="mailto:t.p.northover@gmail.com" target="_blank"></a><a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a>);
                        llvm-dev</span></p>
                  </div>
                </div>
              </div>
            </div>
          </div>
          <div link="blue" vlink="purple" lang="EN-GB">
            <div>
              <div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
                <div>
                  <div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm"><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"><br>
                        <b>Subject:</b> Re: [llvm-dev] [GlobalISel] A
                        Proposal for global instruction selection</span></p>
                  </div>
                </div>
              </div>
            </div>
          </div>
          <div link="blue" vlink="purple" lang="EN-GB">
            <div>
              <div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt"><div> <br></div><p class="MsoNormal">Hi Daniel,</p>
                <div><div> <br></div>
                </div>
                <div><p class="MsoNormal">Thanks for the pointers, I wasn’t
                    aware of the second thread you’ve mentioned.</p>
                </div>
                <div><div> <br></div>
                </div>
                <div><p class="MsoNormal">I may be wrong but I think
                    LLVM-IR optimizations really treat bitcasts as
                    no-op casts, in the sense that no instructions are
                    required.</p>
                </div>
                <div><div> <br></div>
                </div>
                <div><p class="MsoNormal">Is there anyone that could chime
                    in on that?</p>
                </div>
                <div><div> <br></div>
                </div>
                <div><p class="MsoNormal">However, it seems SelectionDAG
                    sticks to the load/store semantics:</p>
                </div>
                <div><p class="MsoNormal"><span>"BITCAST
                      - This operator converts between integer, vector
                      and FP values, as if the value was
                      <b>stored to memory with one type and loaded from
                        the same address with the other type</b> (or
                      equivalently for vector format conversions, etc)."</span></p>
                </div>
                <div><div> <br></div>
                </div>
                <div><p class="MsoNormal">I am fine with treating bit casts
                    as equivalent store/load pairs in GISel, I just want
                    to be sure we do not have a semantic gap between the
                    LLVM-IR and the backend if we do.</p>
                </div>
                <div><div> <br></div>
                </div>
                <div><p class="MsoNormal">Thanks,</p>
                </div>
                <div><p class="MsoNormal">-Quentin</p>
                </div>
                <div><div> <br></div>
                  <div>
                    <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                      <div><p class="MsoNormal">On Jan 11, 2016, at 7:43
                          AM, Daniel Sanders <<a href="mailto:Daniel.Sanders@imgtec.com" target="_blank"></a><a href="mailto:Daniel.Sanders@imgtec.com" target="_blank">Daniel.Sanders@imgtec.com</a>>
                          wrote:</p>
                      </div><div> <br></div>
                      <div>
                        <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Hi,</span></p>
                        </div>
                        <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
                        </div>
                        <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">It
                              was a comment by Tim that first made me
                              aware of it (see<span> </span><a href="http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html" target="_blank"><span style="color:purple"></span></a><a href="http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html" target="_blank">http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html</a></span><span> </span>but

                              I think he commented on one of my patches
                              before that).</p>
                        </div>
                        <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
                        </div>
                        <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I
                              asked about it on llvm-dev a couple of weeks
                              later (<a href="http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html" target="_blank"><span style="color:purple">http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html</span></a>)
                              highlighting the contradiction and was
                              told that 'no-op cast' referred to the
                              lack of math rather than a requirement
                              that zero instructions are used. It's
                              therefore my understanding that shuffling
                              the bits to preserve the load/store based
                              definition isn't considered to be changing
                              the bits.</span></p>
                        </div>
                        <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
                        </div>
                        <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I
                              think the main thing the current
                              definition is unclear on is whether it
                              refers to the bits in a physical machine
                              register or the bits in the LLVM-IR
                              virtual register. Most of the time these
                              two views are the same but this doesn't
                              quite work for big-endian MSA/NEON. For
                              example:</span></p>
                        </div>
                        <div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">%0
                              = bitcast <4 x i32> <i32 1, i32
                              2, i32 3, i32 4> to <2 x i64></span></p>
                        </div>
                        <div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">%0
                              = <2 x i64> <i64 (1 << 32)
                              | 2, i64 (3 << 32) | 4></span></p>
                        </div>
                        <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">are
                              equivalent to each other in LLVM-IR terms
                              but the constants are physically laid out
                              in MSA registers as:</span></p>
                        </div>
                        <div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">0x00000004000000030000000200000001
                              # <4 x i32> <i32 1, i32 2, i32 3,
                              i32 4></span></p>
                        </div>
                        <div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">0x00000003000000040000000100000002
                              # <2 x i64> <i64 (1 << 32)
                              | 2, i64 (3 << 32) | 4></span></p>
                        </div>
                        <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">and
                              we must therefore shuffle the bits to
                              preserve LLVM-IR's point of view.</span></p>
                        </div>
                        <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
                        </div>
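<div>The two register images above can be reproduced with a short Python sketch. It assumes lane 0 sits in the least-significant bits of the 128-bit register, which is what the hex values quoted above imply; register_image and swap_adjacent_words are hypothetical helper names, not LLVM or MSA APIs.</div>

```python
def register_image(lanes, lane_bits):
    """Pack lanes into one integer, lane 0 in the least-significant bits."""
    image = 0
    for index, value in enumerate(lanes):
        image |= value << (index * lane_bits)
    return image


def swap_adjacent_words(image):
    """Swap the two 32-bit words inside each 64-bit half of a 128-bit image."""
    low_words = 0x00000000FFFFFFFF00000000FFFFFFFF
    return ((image & low_words) << 32) | ((image >> 32) & low_words)


# The two constants that LLVM-IR treats as equivalent.
v4i32 = register_image([1, 2, 3, 4], 32)
v2i64 = register_image([(1 << 32) | 2, (3 << 32) | 4], 64)

if __name__ == "__main__":
    print(f"{v4i32:#034x}  # <4 x i32>")
    print(f"{v2i64:#034x}  # <2 x i64>")
    # The shuffle that turns one register image into the other.
    print(swap_adjacent_words(v4i32) == v2i64)
```

<div>The two images differ, and in this model the difference is exactly a swap of the 32-bit words within each 64-bit half, so the bitcast cannot be a zero-instruction no-op here.</div>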
                        <div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
                          <div>
                            <div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm">
                              <div><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">From:</span></b><span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"> </span></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">Quentin Colombet [<a href="mailto:qcolombet@apple.com" target="_blank"></a><a href="mailto:qcolombet@apple.com" target="_blank">mailto:qcolombet@apple.com</a>]<span> </span><br>
                                    <b>Sent:</b><span> </span>07 January
                                    2016 19:58<br>
                                    <b>To:</b><span> </span>Daniel
                                    Sanders<br>
                                    <b>Cc:</b><span> </span>llvm-dev<br>
                                    <b>Subject:</b><span> </span>Re:
                                    [llvm-dev] [GlobalISel] A Proposal
                                    for global instruction selection</span></p>
                              </div>
                            </div>
                          </div>
                          <div><div> <br></div>
                          </div>
                          <div><p class="MsoNormal">Hi Daniel,</p>
                          </div>
                          <div>
                            <div><div> <br></div>
                            </div>
                          </div>
                          <div>
                            <div><p class="MsoNormal">I had a quick look at
                                the language reference for bitcast and I
                                have a different reading than what you
                                were pointing out.</p>
                            </div>
                          </div>
                          <div>
                            <div><p class="MsoNormal">Indeed, my takeaway
                                is:</p>
                            </div>
                          </div>
                          <div>
                            <div><p class="MsoNormal"><span>"It
                                  is<span> </span><b>always a </b></span><em><b><span>no-op

                                      cast</span></b></em><span> because
                                  no bits change with this conversion."</span></p>
                            </div>
                          </div>
                          <div>
                            <div><div> <br></div>
                            </div>
                          </div>
                          <div>
                            <div><p class="MsoNormal">In other words,
                                deleting all bitcast instructions should
                                be fine.</p>
                            </div>
                          </div>
                          <div>
                            <div><div> <br></div>
                            </div>
                          </div>
                          <div>
                            <div><p class="MsoNormal">My understanding of
                                the quote you’ve highlighted is that it
                                tells C programmers that this is like a
                                memcpy, not a cast :).</p>
                            </div>
                          </div>
                          <div>
                            <div><div> <br></div>
                            </div>
                          </div>
                          <div>
                            <div><p class="MsoNormal">Cheers,</p>
                            </div>
                          </div>
                          <div>
                            <div><p class="MsoNormal">-Quentin</p>
                            </div>
                            <div>
                              <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
                                <div>
                                  <div><p class="MsoNormal">On Nov 20,
                                      2015, at 6:53 AM, Daniel Sanders
                                      <<a href="mailto:Daniel.Sanders@imgtec.com" target="_blank"><span style="color:purple">Daniel.Sanders@imgtec.com</span></a>>
                                      wrote:</p>
                                  </div>
                                </div>
                                <div><div> <br></div>
                                </div>
                                <div>
                                  <div>
                                    <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Hi,</span></p>
                                    </div>
                                  </div>
                                  <div>
                                    <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
                                    </div>
                                  </div>
                                  <div>
                                    <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I
                                          haven't had a chance to read all
                                          of this yet, but one minor
                                          thing occurred to me during
                                          your presentation that I want
                                          to mention. At one point you
                                          mentioned deleting all the
                                          bitcast instructions since
                                          they're equivalent to nops but
                                          this isn't always true.</span></p>
                                    </div>
                                  </div>
                                  <div>
                                    <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
                                    </div>
                                  </div>
                                  <div>
                                    <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">The<span> </span><a href="http://llvm.org/docs/LangRef.html" target="_blank"><span style="color:purple"></span></a><a href="http://llvm.org/docs/LangRef.html" target="_blank">http://llvm.org/docs/LangRef.html</a></span><span> </span>definition

                                          of the bitcast instruction
                                          includes this sentence:</p>
                                    </div>
                                  </div>
                                  <div>
                                    <div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">The
                                          conversion is done as if the
                                          value had been stored to
                                          memory and read back as type
                                          ty2.</span></p>
                                    </div>
                                  </div>
                                  <div>
                                    <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">For
                                          big-endian MSA, this is
                                          equivalent to a shuffling of
                                          the bits in the register
                                          because endianness only
                                          changes the byte order within
                                          each element. The order of the
                                          elements is unaffected by
                                          endianness. IIRC, big-endian
                                          NEON is the same way.</span></p>
                                    </div>
                                  </div>
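<div>As a rough illustration of the "stored to memory and read back" wording, the following Python sketch uses the standard struct module to model memory. It assumes lanes are laid out in ascending address order and that endianness only affects the bytes inside each lane, as described above; bitcast_via_memory is a made-up helper name.</div>

```python
import struct


def bitcast_via_memory(lanes, src_fmt, dst_fmt, byte_order):
    """Store lanes with one element type, then reload with another.

    byte_order is ">" (big-endian) or "<" (little-endian); src_fmt and
    dst_fmt are struct format strings such as "4I" (4 x u32) or "2Q" (2 x u64).
    """
    memory = struct.pack(byte_order + src_fmt, *lanes)
    return struct.unpack(byte_order + dst_fmt, memory)


big = bitcast_via_memory([1, 2, 3, 4], "4I", "2Q", ">")
little = bitcast_via_memory([1, 2, 3, 4], "4I", "2Q", "<")

if __name__ == "__main__":
    print([hex(lane) for lane in big])
    print([hex(lane) for lane in little])
```

<div>With these assumptions the big-endian result pairs the lanes as (1 << 32) | 2 and (3 << 32) | 4, while the little-endian result pairs them the other way round, so the value a bitcast produces depends on the byte order of the target.</div>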
                                  <div>
                                    <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>
                                    </div>
                                  </div>
                                  <div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
                                    <div>
                                      <div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm">
                                        <div>
                                          <div><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">From:</span></b><span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"> </span></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">llvm-dev [<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank"><span style="color:purple"></span></a><a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">mailto:llvm-dev-bounces@lists.llvm.org</a></span>]<span> </span><b>On
                                                  Behalf Of<span> </span></b>Quentin
                                                Colombet via llvm-dev<br>
                                                <b>Sent:</b><span> </span>18
                                                November 2015 19:27<br>
                                                <b>To:</b><span> </span>llvm-dev<br>
                                                <b>Subject:</b><span> </span>[llvm-dev]
                                                [GlobalISel] A Proposal
                                                for global instruction
                                                selection</p>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                    <div>
                                      <div><div> <br></div>
                                      </div>
                                    </div>
                                    <div>
                                      <div>
                                        <div>
                                          <div><p class="MsoNormal">Hi,<br>
                                              <span style="color:#12c00e"><br>
                                              </span>With this email, I
                                              would like to kick off the
                                              development of the next
                                              instruction selector that
                                              I described during the
                                              last LLVM Developers’ Meeting.<br>
                                              For the motivation, see
                                              Jakob’s proposal (<a href="http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html</a>);
                                              for the proposal itself, see
                                              the slides (Keynote: <a href="http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co" target="_blank">http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co</a> or
                                              PDF: <a href="http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co" target="_blank">http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co</a>)
                                              or the talk (<a href="https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2" target="_blank">https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2</a>).</p>
                                          </div>
                                        </div>
                                      </div>
                                      <div>
                                        <div>
                                          <div><p class="MsoNormal"><br>
                                              TL;DR: This is happening
                                              now, feedback invited!<br>
                                              <br>
                                              *** Context ***<br>
                                              <span style="color:#12c00e"><br>
                                              </span>During the last
                                              LLVM Developers’ Meeting, I
                                              presented a proposal for
                                              the next instruction
                                              selector, GlobalISel. The
                                              proposal is summarized
                                              in “High Level
                                              Prototype Design” and
                                              “Roadmap”. (If you want
                                              further details, feel free
                                              to reach out to me.)<br>
                                              <span style="color:#00afcd"><br>
                                              </span>The first step of
                                              the development plan is to
                                              prototype the new
                                              framework in open source.
                                              The idea is to <b>start
                                                prototyping now(!)</b> and
                                              have the discussion
                                              ongoing in parallel. The
                                              reason for this approach is
                                              to have code that can be
                                              used to inform those
                                              discussions, e.g., by
                                              collecting data and trying
                                              different design
                                              approaches. Regarding the
                                              discussion, I have listed
                                              a few points where your
                                              feedback would be
                                              particularly appreciated
                                              (see Feedback Invite).</p>
                                          </div>
                                        </div>
                                      </div>
                                      <div>
                                        <div>
                                          <div><p class="MsoNormal"><span style="color:#00afcd"><br>
                                              </span>Also, as I
                                              mentioned in my talk, some
                                              issues are controversial,
                                              but I expect them to be
                                              resolved during prototype
                                              development. Specifically,
                                              these concern aspects of
                                              legalization (should parts
                                              of it be done at the LLVM
                                              IR level or all at the MI
                                              level?) and code reuse
                                              for the instruction combiner.
                                              Please feel free to bring
                                              up your specific concerns
                                              as I move along with the
                                              development plan.<br>
                                              <span style="color:#00afcd"><br>
                                              </span>I expect the design
                                              to evolve with our
                                              experimental findings and
                                              your feedback
                                              and contributions.<br>
                                              Nonetheless, we expect to
                                              nail down some design
                                              decisions once and for all
                                              as the prototype
                                              progresses. I have
                                              highlighted them with
                                              the following pattern <b>[final]</b>.<br>
                                              <span style="color:#12c00e"><br>
                                                <br>
                                                <br>
                                              </span>*** Feedback Invite
                                              ***<br>
                                              <span style="color:#00afcd"><br>
                                              </span>If you follow and
                                              support this work, you need
                                              to be aware of three
                                              things, and I am eager to
                                              hear your feedback and
                                              thoughts about them: the
                                              overall goals of Global
                                              ISel, the goals of the
                                              prototype, and the impact
                                              of the prototype work on
                                              backend design. <br>
                                              <span style="color:#00afcd"><br>
                                              </span>In the section
                                              “Goals”, I defined
                                              (repeated for people who
                                              saw the talk) the goals
                                              for the Global ISel
                                              design.<br>
                                              - Do you see anything
                                              missing?<br>
                                              - Do you see something
                                              that should not be there? <br>
                                              <span style="color:#00afcd"><br>
                                              </span>The prototype will
                                              answer critical design
                                              questions (see “Design
                                              Questions the Prototype
                                              Addresses at the End of
                                              M1” for examples) before
                                              the actual design of Global
                                              ISel is finalized, but it
                                              cannot cover everything.<br>
                                              Specifically, we will <b>*not*</b> look
                                              into improving TableGen or
                                              reusing InstCombine (see
                                              “Proposed Approach” for the
                                              rationale). Please let me
                                              know if you see any issue
                                              with that.<br>
                                              <span style="color:#00afcd"><br>
                                              </span>There is also basic
                                              groundwork needed to
                                              prepare for Global ISel,
                                              and I need to extend the
                                              core MachineInstr-level
                                              APIs as explained during
                                              the talk. For this, I
                                              prepared sketches of
                                              patches to illustrate the
                                              changes and describe the details
                                              in the “Implications”
                                              section below. Please have
                                              a look at the patches to
                                              get a better idea of the
                                              expected impact.<br>
                                              <span style="color:#00afcd"><br>
                                              </span>If there is
                                              anything else you want to
                                              discuss related to Global
                                              ISel, feel free to reach out to
                                              me. In particular, several
                                              people expressed
                                              interest during the LLVM
                                              Dev Meeting in
                                              contributing to the
                                              project. Let me know what
                                              your area of interest is,
                                              so that we can coordinate
                                              our efforts.<br>
                                              Anyhow, please add
                                              [GlobalISel] in the
                                              subject line to help
                                              categorize the emails.<br>
                                              <span style="color:#00afcd"><br>
                                                <br>
                                                <br>
                                              </span>*** Goals ***<br>
                                              <span style="color:#12c00e"><br>
                                              </span>The high-level
                                              goals of the new
                                              instruction selector are:<br>
                                              - Global instruction
                                              selector.<br>
                                              - Fast instruction
                                              selector.<br>
                                              - Shared code path for
                                              fast and good instruction
                                              selection.<br>
                                              - IR that represents ISA
                                              concepts better.<br>
                                              - More flexible
                                              instruction selector.<br>
                                              - Easier to
                                              maintain/understand
                                              framework, in particular
                                              legalization.<br>
                                              - Self-contained machine
                                              representation, no back
                                              links to LLVM IR.<br>
                                              - No change to LLVM IR.<br>
                                              <span style="color:#5856d6"><br>
                                              </span>Note: The goals
                                              are common to all targets.
                                              In particular, we do not
                                              intend to work on
                                              target-specific features for the
                                              prototype.<br>
                                              The bottom line: please
                                              make sure those goals are
                                              compatible with what you
                                              want to achieve for your
                                              target, even if your
                                              requirements are not
                                              listed here.<br>
                                              <br>
                                              <span style="color:#12c00e"><br>
                                                <br>
                                              </span>*** Proposed
                                              Approach ***<br>
                                              <span style="color:#12c00e"><br>
                                              </span>In this section, I
                                              describe the approach I
                                              plan to pursue in the
                                              prototype and the roadmap
                                              to get there. The final
                                              design will flow out of
                                              it.<br>
                                              <span style="color:#12c00e"><br>
                                              </span>For this prototype,
                                              we purposely exclude any
                                              work to improve or use
                                              TableGen or InstCombine <b>[final].</b> We
                                              will keep in mind however,
                                              that some of the C++ code
                                              we write will be
                                              table-generated at some
                                              point.<br>
                                              The rationale is that we do
                                              not want to lay down a new
                                              TableGen/InstCombine
                                              infrastructure before
                                              being able to work on the
                                              ISel framework itself.<br>
                                              <span style="color:#12c00e"><br>
                                              </span>The prototype
                                              vehicle will be <b>AArch64</b>.
                                              None of the changes for
                                              GlobalISel will negatively
                                              impact the existing ISel.<br>
                                              <span style="color:#12c00e"><br>
                                                <br>
                                              </span>** High Level
                                              Prototype Design **<br>
                                              <span style="color:#12c00e"><br>
                                              </span>As shown in the
                                              talk, the expected
                                              pipeline for the prototype
                                              is:<br>
                                              <b>LLVM IR </b>->
                                              IRTranslator -> <b>Generic (G)
                                                MachineInstr</b> ->
                                              Legalizer ->
                                              RegBankSelect -> Select
                                              -> <b>MachineInstr</b><br>
                                              <span style="color:#12c00e"><br>
                                              </span>Where:<br>
                                              - Terms in <b>bold</b> are
                                              intermediate
                                              representations.<br>
                                              -  Generic MachineInstrs
                                              are machine instructions
                                              with a generic opcode,
                                              e.g., ADD, COPY.</p>
                                          </div>
                                        </div>
                                      </div>
                                      <div>
                                        <div>
                                          <div><p class="MsoNormal">-
                                              IRTranslator: Translate
                                              LLVM IR to (G)
                                              MachineInstr.<br>
                                              - Legalizer: Legalize
                                              illegal (G) MachineInstr
                                              to legal (G) MachineInstr.<br>
                                              - RegBankSelect: Map each
                                              virtual register that has only a size
                                              to a virtual register with a
                                              register bank.<br>
                                              - Select: Translate the
                                              remaining (G) MachineInstr
                                              to MachineInstr.<br>
                                              <br>
                                              <span style="color:#00afcd"><br>
                                                <br>
                                              </span>** Implications **<br>
                                              <span style="color:#00afcd"><br>
                                              </span>As part of the
                                              bring-up of the prototype,
                                              we need to extend some of
                                              the core
                                              MachineInstr-level APIs:<br>
                                                - Need to remember
                                              FastMath flags for each
                                              MachineInstr.<br>
                                                - Need to know the type
                                              of each MachineInstr. We
                                              don’t want ADD8, ADD16,
                                              etc.<br>
                                                - Extend the
                                              MachineRegisterInfo to
                                              support size as well as
                                              register classes for
                                              virtual registers.<br>
                                              <span style="color:#00afcd"><br>
                                              </span>I have sketched the
                                              changes in the attached
                                              patches to help picture
                                              how the changes would
                                              impact the existing APIs.</p>
                                          </div>
                                        </div>
                                      </div>
                                      <div>
                                        <div>
                                          <div><div> <br></div>
                                          </div>
                                        </div>
                                      </div>
                                      <div>
                                        <div>
                                          <div><p class="MsoNormal">Note: I
                                              do not intend to commit
                                              those changes as they are.
                                              They will go through the usual
                                              review process in due
                                              time.</p>
                                          </div>
                                        </div>
                                      </div>
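                                      <div><p class="MsoNormal">As a rough illustration of the pipeline and of the MachineInstr-level extensions listed under Implications, here is a toy C++ model. Everything in it is an assumption made for illustration and none of it comes from the attached patches or from LLVM's API: the struct and function names are hypothetical, the legalization rule (widen anything narrower than 32 bits to 32 bits) is invented, and the bank name "GPR" and the class names "GPR32"/"GPR64" are placeholders. What it tries to show is that the opcode stays generic while the type is carried separately (so no ADD8 or ADD16), and that a virtual register starts with only a size, gains a register bank in RegBankSelect, and gains a register class in Select.</p></div>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model; none of these names exist in LLVM.
enum class GenericOpcode { Add, Copy };

struct VirtualRegister {
  unsigned sizeInBits;   // known as soon as the register is created
  std::string bank;      // empty until RegBankSelect runs
  std::string regClass;  // empty until Select runs
};

struct GenericInstr {
  GenericOpcode opcode;  // generic: the width is not encoded in the opcode
  unsigned typeInBits;   // the type travels with the instruction instead
  unsigned def;          // index of the defined virtual register
};

struct ToyFunction {
  std::vector<VirtualRegister> vregs;
  std::vector<GenericInstr> instrs;
  std::vector<std::string> passesRun;  // records the pipeline order
};

// IRTranslator: LLVM IR -> (G) MachineInstr. Here: a single add of some width.
ToyFunction runIRTranslator(unsigned widthInBits) {
  ToyFunction f;
  f.vregs.push_back({widthInBits, "", ""});
  f.instrs.push_back({GenericOpcode::Add, widthInBits, 0});
  f.passesRun.push_back("IRTranslator");
  return f;
}

// Legalizer: illegal (G) MachineInstr -> legal (G) MachineInstr.
// Assumed rule for this sketch only: widen types below 32 bits to 32 bits.
void runLegalizer(ToyFunction &f) {
  for (GenericInstr &mi : f.instrs) {
    if (mi.typeInBits < 32) {
      mi.typeInBits = 32;
      f.vregs[mi.def].sizeInBits = 32;
    }
  }
  f.passesRun.push_back("Legalizer");
}

// RegBankSelect: a register with only a size gets a register bank.
void runRegBankSelect(ToyFunction &f) {
  for (VirtualRegister &vr : f.vregs)
    vr.bank = "GPR";  // placeholder bank name
  f.passesRun.push_back("RegBankSelect");
}

// Select: remaining generic instructions become target instructions, so each
// register now needs a concrete register class.
void runSelect(ToyFunction &f) {
  for (VirtualRegister &vr : f.vregs)
    vr.regClass = (vr.sizeInBits <= 32) ? "GPR32" : "GPR64";
  f.passesRun.push_back("Select");
}

ToyFunction runPipeline(unsigned widthInBits) {
  ToyFunction f = runIRTranslator(widthInBits);
  runLegalizer(f);
  runRegBankSelect(f);
  runSelect(f);
  return f;
}

// Usage: an 8-bit add is widened, then banked, then given a class.
void checkPipelineExample() {
  const ToyFunction f = runPipeline(8);
  assert(f.passesRun.size() == 4);
  assert(f.instrs[0].opcode == GenericOpcode::Add);
  assert(f.instrs[0].typeInBits == 32);
  assert(f.vregs[0].bank == "GPR");
  assert(f.vregs[0].regClass == "GPR32");
}
```
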
                                      <div>
                                        <div>
                                          <div><p class="MsoNormal"><br>
                                              The patches contain “//
                                              ***”-like comments that
                                              give a rough explanation
                                              of why those changes are
                                              needed w.r.t. the goals.<br>
                                              The order of the patches
                                              could be modified since
                                              the dependencies between
                                              them are not sequential.
                                              Anyhow, here are the
                                              patches:<br>
                                              1. Introduce (some of) the
                                              generic opcodes.<br>
                                              2. Make MachineFunction
                                              more independent of LLVM
                                              IR to eventually be able
                                              to delete the LLVM IR
                                              instance from the memory.<br>
                                              3. Extend MachineInstr to
                                              represent additional
                                              information attached to
                                              generic opcodes.<br>
                                              4. Teach
                                              MachineRegisterInfo about
                                              size for virtual
                                              registers.<br>
                                              5. Introduce a helper
                                              class to build
                                              MachineInstr-related
                                              objects.<br>
                                              6. Add new target hooks to
                                              lower the ABI directly to
                                              MachineInstr.<br>
                                              7. Introduce the
                                              IRTranslator pass.<br>
                                              <br>
                                              <span style="color:#12c00e"><br>
                                              </span>** Roadmap for the
                                              Prototype **<br>
                                              <span style="color:#00afcd"><br>
                                              </span>We plan to split
                                              the prototype into three
                                              main milestones:<br>
                                              1. Translation: LLVM IR to
                                              (G) MachineInstr
                                              translation.<br>
                                              2. Basic selector: Legal
                                              LLVM IR to target-specific
                                              MachineInstr.<br>
                                              3. Simple legalization:
                                              Support scalar type
                                              legalization and some
                                              vector instructions.<br>
                                              <span style="color:#00afcd"><br>
                                              </span>Notes:<br>
                                              - For #1, we will not
                                              support any fancy
                                              instructions like landing
                                              pad or switch.<br>
                                              - Each milestone should
                                              take about 3-4 months.</p>
                                          </div>
                                        </div>
                                      </div>
                                      <div>
                                        <div>
                                          <div><p class="MsoNormal">- At
                                              the end of #2, we would
                                              have a FastISel-like
                                              selector.<br>
                                              <span style="color:#00afcd"><br>
                                              </span>Each milestone will
                                              be detailed right before
                                              starting it. The rationale
                                              is that we want the plan
                                              for each milestone to
                                              take into account what we
                                              learned from the previous
                                              prototype. In other words,
                                              in this email, <b>I only
                                                describe the first
                                                milestone</b> in detail
                                              and I will give more
                                              details on the next
                                              milestone shortly before
                                              we start it, and so on. For
                                              your information, here is
                                              the remainder of the
                                              intended roadmap for the <b>full</b> project:<br>
                                              4. Productization: Clean
                                              up implementation,
                                              stabilize the APIs.<br>
                                              5. Complex legalization:
                                              Extend legalization
                                              support to everything
                                              missing.<br>
                                              6. Completeness: Fill the
                                              blanks, e.g., landing pad.<br>
                                              7. Clean-up and
                                              performance: Add the
                                              necessary bits to be at
                                              parity with, or beat,
                                              SelectionDAG-generated
                                              code.<br>
                                              8. Transition: Document
                                              how to switch, provide
                                              tools to help.<br>
                                              <span style="color:#00afcd"><br>
                                                <br>
                                              </span>** Milestone 1 **<br>
                                              <span style="color:#12c00e"><br>
                                              </span>The first phase is
                                              focused on the
                                              IRTranslator pass.<br>
                                              <span style="color:#12c00e"><br>
                                              </span>The IRTranslator is
                                              responsible for
                                              translating the LLVM IR
                                              into Generic MachineInstr.
                                              The IRTranslator pass uses
                                              some target hooks
                                              to perform the ABI
                                              lowering. We can either
                                              define a new API for them,
                                              e.g., ABILoweringInfo, or
                                              extend the existing
                                              TargetLowering.<br>
                                              Moreover, the prototype
                                              will focus on simple
                                              instructions, i.e., we will
                                              not support switch or
                                              landing pad for this
                                              iteration.<br>
                                              <span style="color:#12c00e"><br>
                                              </span>At the end of M1,
                                              the prototype will not be
                                              able to produce code,
                                              since we will only have
                                              the beginning of the
                                              Global ISel pipeline.
                                              Instead, we will test the
                                              IRTranslator by checking
                                              the generic output that it
                                              produces for the input
                                              IR.<br>
                                              <span style="color:#12c00e"><br>
                                              </span>* Design Decisions
                                              *<br>
                                              <span style="color:#12c00e"><br>
                                              </span>- The IRTranslator
                                              is a final class. Its
                                              purpose is to move from
                                              the LLVM IR world to the
                                              MachineInstr world <b>[final]</b>.<br>
                                              - Lower the ABI as part of
                                              the translation process <b>[final]</b>.<br>
                                              <span style="color:#12c00e"><br>
                                              </span>* Design Questions
                                              the Prototype Addresses at
                                              the End of M1 *<br>
                                              <span style="color:#12c00e"><br>
                                              </span>- Handling of
                                              aggregate types during the
                                              translation.<br>
                                              - Lowering of switches.<br>
                                              - What about Module-level
                                              passes for Machine passes?<br>
                                              - Introduce new APIs to
                                              have a clearer separation
                                              between:<br>
                                                - Legalization
                                              (setOperationAction, etc.)<br>
                                                - Cost/Combine related
                                              (isXXXFree, etc.)<br>
                                                - Lowering related
                                              (LowerFormal, etc.)<br>
                                              - What is the contract
                                              with the backends? Is it
                                              still “should be able to
                                              select any valid LLVM IR”?<br>
                                              <span style="color:#00afcd"><br>
                                              </span>Thanks,</p>
                                          </div>
                                        </div>
                                        <div>
                                          <div>
                                            <div>
                                              <div>
                                                <div>
                                                  <div>
                                                    <div>
                                                      <div>
                                                        <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div>
                                                          <div><p class="MsoNormal">-Quentin</p>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                          </div>
                                                        </div>
                                                      </div>
                                                    </div>
                                                  </div>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </blockquote>
                            </div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div><div> <br></div>
                </div>
              </div>
            </div>
          </div>
          _______________________________________________<br>
          LLVM Developers mailing list<br>
          <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
          <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
        </blockquote>
      </div>
      <br>
    </blockquote>
    <br>
  </div>

</div></blockquote></div></div></div></blockquote></div></div>