<div dir="ltr"><div class="uyb8Gf"><div class="F3hlO"><div text="#000000" bgcolor="#FFFFFF">Hi Philip,</div><div text="#000000" bgcolor="#FFFFFF"><br></div><div text="#000000" bgcolor="#FFFFFF">> I think after reading your link I'm actually more confused.  This might just be a wording problem, but let me ask a couple of clarifying questions.<br><br>Sorry about that :( Every time I explain this I get slightly more embarrassed because it is indeed weird and ugly (but was certainly the least ugly solution).<br><br>> 1) After compiling the code sequence below (from that page), does the in memory bit pattern differ?  The page seemed to contradict itself.  <br><pre>> %0 = load <4 x i32> %x

> %1 = bitcast <4 x i32> %0 to <2 x i64>

>      store <2 x i64> %1, <2 x i64>* %y

</pre>Yes. The memory pattern differs. This is the first diagram on the right at: <a href="http://llvm.org/docs/BigEndianNEON.html#bitconverts">http://llvm.org/docs/BigEndianNEON.html#bitconverts</a></div><div text="#000000" bgcolor="#FFFFFF"><br></div><div text="#000000" bgcolor="#FFFFFF">> 2) If so, does this mean that performing dead-store-elimination is illegal for ARM?<br><br>Yes, for vector stores whose type differs from the type of the corresponding load. A round trip through a single type, such as</div><div text="#000000" bgcolor="#FFFFFF"><br></div><div text="#000000" bgcolor="#FFFFFF">%0 = load <4 x i32> %x</div><div text="#000000" bgcolor="#FFFFFF">store <4 x i32> %0, <4 x i32>* %x</div><div text="#000000" bgcolor="#FFFFFF"><br></div><div text="#000000" bgcolor="#FFFFFF">is still fine. I should go and check that DSE doesn't do bad things for big-endian NEON actually...</div><div text="#000000" bgcolor="#FFFFFF"><br>> 3) Are loads and stores ever allowed to fault based on the in memory representation?  <br><br>No (thank goodness!)</div><div text="#000000" bgcolor="#FFFFFF"><br>> 4) What happens if we have a load of <2xi64> following the store above and we do DSE the store before forwarding its value?</div></div></div><div class="uyb8Gf"><br></div>The store can't be DSE'd, as above. But value forwarding is fine. It's fine because the IR is strongly typed: there's no way to remove that bitcast and still have well-formed IR. However, folding bitcasts into memory operands is explicitly illegal:<br><br><br>%1 = bitcast <4 x i32> %x to <2 x i64><br>store <2 x i64> %1, <2 x i64>* %y<br>  =><br>store <4 x i32> %x, <4 x i32>* (bitcast <2 x i64>* %y to <4 x i32>*) ; ILLEGAL!<div><br></div><div>There's a hook somewhere in CGP that disables an optimization that tries to do this.</div><div><br></div><div>So in IR, because it's strongly typed, there aren't really many special cases or things to worry about. But in SDAG things get more difficult. 
SDAG is weakly typed and all bitconverts will just get blasted into oblivion, so while SDAG can merge bitconverts (bitconvert (bitconvert %x)) -> (bitconvert %x), it mustn't remove them completely.</div><div><br></div><div>I hope I've explained that OK. CCing Tim who can hopefully pick more holes in the explanation.</div><div><br></div><div>Also, could you please point me to where the documentation seems contradictory? then I'll fix it. I wrote it for exactly this scenario!</div><div><br></div><div>Cheers,</div><div><br></div><div>James</div><br><div class="gmail_quote"><div dir="ltr">On Wed, 13 Jan 2016 at 00:42 Quentin Colombet <<a href="mailto:qcolombet@apple.com">qcolombet@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>Hi James,</div><div><br></div>I am also confused!<div><br></div><div><div></div></div></div><div style="word-wrap:break-word"><div><div><blockquote type="cite"><div>On Jan 12, 2016, at 4:11 PM, Philip Reames <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>> wrote:</div><br><div>

  <div text="#000000" bgcolor="#FFFFFF">

    I think after reading your link I'm actually more confused.  This

    might just be a wording problem, but let me ask a couple of

    clarifying questions.<br>

    <br>

    1) After compiling the code sequence below (from that page), does

    the in memory bit pattern differ?  The page seemed to contradict

    itself.  <br></div></div></blockquote><div><br></div></div></div></div><div style="word-wrap:break-word"><div><div>+1</div><div><br></div><div>Thanks,</div><div>Q.</div></div></div><div style="word-wrap:break-word"><div><div><br><blockquote type="cite"><div><div text="#000000" bgcolor="#FFFFFF">

    <pre>%0 = load <4 x i32> %x

%1 = bitcast <4 x i32> %0 to <2 x i64>

     store <2 x i64> %1, <2 x i64>* %y

</pre>

    2) If so, does this mean that performing dead-store-elimination is

    illegal for ARM?<br>

    <br>

    3) Are loads and stores ever allowed to fault based on the in memory

    representation?  <br>

    <br>

    4) What happens if we have a load of <2xi64> following the

    store above and we do DSE the store before forwarding its value?<br>

    <br>

    Philip<br>

    <br>

    <br>

    <div>On 01/12/2016 05:55 AM, James Molloy

      via llvm-dev wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">Hi,

        <div><br>

        </div>

        <div>

          <div>

            <div>

              <div link="blue" vlink="purple" lang="EN-GB"><p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif">>

                    I found this thinking quite difficult to explain.

                    Does it make sense?</span></p>

                <div><span style="font-size:11pt;font-family:Calibri,sans-serif">It

                    might help to link to the documentation on why

                    bitcasts are weird on big-endian NEON: </span><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px"><a href="http://llvm.org/docs/BigEndianNEON.html#bitconverts" target="_blank"></a><a href="http://llvm.org/docs/BigEndianNEON.html#bitconverts" target="_blank">http://llvm.org/docs/BigEndianNEON.html#bitconverts</a></span></font></div>

                <div><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px"><br>

                    </span></font></div>

                <div><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px">Cheers,</span></font></div>

                <div><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px"><br>

                    </span></font></div>

                <div><font face="Calibri, sans-serif"><span style="font-size:14.6667px;line-height:22px">James</span></font></div>

              </div>

            </div>

          </div>

        </div>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr">On Tue, 12 Jan 2016 at 13:23 Daniel Sanders via

          llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>

          wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

          <div link="blue" vlink="purple" lang="EN-GB">

            <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Hi,</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I

                  haven't found much time to look into the LLVM-IR-level

                  optimizations yet so I'm not sure how they handle

                  bitcasts. With that disclaimer in mind, I expect it's

                  fine for the LLVM-IR level optimizations to handle

                  them using either definition since they are equivalent

                  at the LLVM-IR level. My thinking is that LLVM-IR is

                  consistent about how virtual bits are assigned to

                  types and that non-zero-instruction no-ops arise when

                  there is inconsistency.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">At

                  the LLVM-IR level, bits 0-127 of <4 x i32> map

                  directly onto bits 0-127 of <2 x i64> using the

                  identity map. It's therefore ok to interpret such

                  bitcasts as zero-instruction no-ops. As far as I can

                  tell, LLVM-IR has been defined such that the identity

                  map can be used for bitcasts between all same-sized

                  types, and also such that bitcasting between

                  different-sized types is invalid.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Similarly,

                  most targets have a single mapping of virtual bit

                  numbers to physical bit numbers for each size that is

                  applied consistently when mapping a type to memory.

                  For example 32-bits map like so:</span></p><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Little

                  Endian Targets: virtual register bits

                  {0..7,8..15,16..23,24..31} map to physical memory bits

                  {0..7,8..15,16..23,24..31}</span></p><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Big

                  Endian Targets: virtual register bits

                  {0..7,8..15,16..23,24..31} map to physical memory bits

                  {24..31,16..23,8..15,0..7}</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">regardless

                  of whether it's a float, or an i32. We therefore need

                  zero instructions to re-map physical memory bits for

                  one type onto another type.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">The

                  same idea holds for physical register classes. There's

                  a single consistent mapping from physical memory bits

                  to physical register bits that applies for all types

                  that can be stored in that class. As long as this is

                  the case the load/store and zero-instruction

                  interpretation of bitcasts are equivalent.</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">In

                  the case of big-endian MSA and NEON, there isn't a

                  single consistent mapping from physical memory bits to

                  physical register bits so the equivalence in the two

                  definitions breaks down:</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">               

                  i128: virtual register bits {0..31, 32..63, 64..95,

                  96..127} map to physical memory bits {96..127,

                  64..95, 32..63, 0..31}</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">               

                  <4 x i32>: virtual register bits {0..31, 32..63,

                  64..95, 96..127} map to physical memory bits {0..31,

                  32..63, 64..95, 96..127}</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">               

                  <2 x i64>: virtual register bits {0..31, 32..63,

                  64..95, 96..127} map to physical memory bits {32..63,

                  0..31, 96..127, 64..95}</span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">with

                  these inconsistent mappings we require instructions to

                  bitcast between the types.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I

                  found this thinking quite difficult to explain. Does

                  it make sense?</span></p>

            </div>

          </div>

          <div link="blue" vlink="purple" lang="EN-GB">

            <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">>

                </span>I am fine with treating bit casts as equivalent

                store/load pairs in GISel, I just want to be sure we do

                not have a semantic gap between the LLVM-IR and the

                backend if we do.</p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>

            </div>

          </div>

          <div link="blue" vlink="purple" lang="EN-GB">

            <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I

                  think a gap would arise from not having a GISel

                  equivalent to ISD::BITCAST (gBITCAST?) available when

                  it's necessary for correctness. However, I agree that

                  GISel should delete bitcasts for the common case where

                  the store/load and zero-instruction definitions are

                  equivalent.</span></p><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>

              <div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">

                <div>

                  <div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm"><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"> Quentin Colombet [mailto:<a href="mailto:qcolombet@apple.com" target="_blank"></a><a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>]

                        <br>

                        <b>Sent:</b> 11 January 2016 17:23<br>

                        <b>To:</b> Daniel Sanders<br>

                        <b>Cc:</b> Tim Northover (<a href="mailto:t.p.northover@gmail.com" target="_blank"></a><a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a>);

                        llvm-dev</span></p>

                  </div>

                </div>

              </div>

            </div>

          </div>

          <div link="blue" vlink="purple" lang="EN-GB">

            <div>

              <div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">

                <div>

                  <div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm"><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"><br>

                        <b>Subject:</b> Re: [llvm-dev] [GlobalISel] A

                        Proposal for global instruction selection</span></p>

                  </div>

                </div>

              </div>

            </div>

          </div>

          <div link="blue" vlink="purple" lang="EN-GB">

            <div>

              <div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt"><div> <br></div><p class="MsoNormal">Hi Daniel,</p>

                <div><div> <br></div>

                </div>

                <div><p class="MsoNormal">Thanks for the pointers, I wasn’t

                    aware of the second thread you’ve mentioned.</p>

                </div>

                <div><div> <br></div>

                </div>

                <div><p class="MsoNormal">I may be wrong but I think

                    LLVM-IR optimizations really treat bitcasts as

                    no-op casts, in the sense that no instructions are

                    required.</p>

                </div>

                <div><div> <br></div>

                </div>

                <div><p class="MsoNormal">Is there anyone that could chime

                    in on that?</p>

                </div>

                <div><div> <br></div>

                </div>

                <div><p class="MsoNormal">However, it seems SelectionDAG

                    sticks to the load/store semantics:</p>

                </div>

                <div><p class="MsoNormal"><span>"BITCAST

                      - This operator converts between integer, vector

                      and FP values, as if the value was

                      <b>stored to memory with one type and loaded from

                        the same address with the other type</b> (or

                      equivalently for vector format conversions, etc)."</span></p>

                </div>

                <div><div> <br></div>

                </div>

                <div><p class="MsoNormal">I am fine with treating bit casts

                    as equivalent store/load pairs in GISel, I just want

                    to be sure we do not have a semantic gap between the

                    LLVM-IR and the backend if we do.</p>

                </div>

                <div><div> <br></div>

                </div>

                <div><p class="MsoNormal">Thanks,</p>

                </div>

                <div><p class="MsoNormal">-Quentin</p>

                </div>

                <div><div> <br></div>

                  <div>

                    <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">

                      <div><p class="MsoNormal">On Jan 11, 2016, at 7:43

                          AM, Daniel Sanders <<a href="mailto:Daniel.Sanders@imgtec.com" target="_blank"></a><a href="mailto:Daniel.Sanders@imgtec.com" target="_blank">Daniel.Sanders@imgtec.com</a>>

                          wrote:</p>

                      </div><div> <br></div>

                      <div>

                        <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Hi,</span></p>

                        </div>

                        <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>

                        </div>

                        <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">It

                              was a comment by Tim that first made me

                              aware of it (see<span> </span><a href="http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html" target="_blank"><span style="color:purple"></span></a><a href="http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html" target="_blank">http://lists.llvm.org/pipermail/llvm-dev/2013-August/064714.html</a></span><span> </span>but

                              I think he commented on one of my patches

                              before that).</p>

                        </div>

                        <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>

                        </div>

                        <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I

                              asked about it on llvm-dev a couple weeks

                              later (<a href="http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html" target="_blank"><span style="color:purple">http://lists.llvm.org/pipermail/llvm-dev/2013-August/064919.html</span></a>)

                              highlighting the contradiction and was

                              told that 'no-op cast' referred to the

                              lack of math rather than a requirement

                              that zero instructions are used. It's

                              therefore my understanding that shuffling

                              the bits to preserve the load/store based

                              definition isn't considered to be changing

                              the bits.</span></p>

                        </div>

                        <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>

                        </div>

                        <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I

                              think the main thing the current

                              definition is unclear on is whether it

                              refers to the bits in a physical machine

                              register or the bits in the LLVM-IR

                              virtual register. Most of the time these

                              two views are the same but this doesn't

                              quite work for big-endian MSA/NEON. For

                              example:</span></p>

                        </div>

                        <div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">%0

                              = bitcast <4 x i32> <i32 1, i32

                              2, i32 3, i32 4> to <2 x i64></span></p>

                        </div>

                        <div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">%0

                              = <2 x i64> <i64 (1 << 32)

                              | 2, i64 (3 << 32) | 4></span></p>

                        </div>

                        <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">are

                              equivalent to each other in LLVM-IR terms

                              but the constants are physically laid out

                              in MSA registers as:</span></p>

                        </div>

                        <div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">0x00000004000000030000000200000001

                              # <4 x i32> <i32 1, i32 2, i32 3,

                              i32 4></span></p>

                        </div>

                        <div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">0x00000003000000040000000100000002

                              # <2 x i64> <i64 (1 << 32)

                              | 2, i64 (3 << 32) | 4></span></p>

                        </div>

                        <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">and

                              we must therefore shuffle the bits to

                              preserve LLVM-IR's point of view.</span></p>

                        </div>
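                        <div><p class="MsoNormal">The example above can be reproduced with a minimal Python sketch. It assumes the store/load definition of bitcast and a simplified register model in which lane 0 occupies the least significant bits; the helper names (store_vector, load_vector, bitcast, register_image) are made up for illustration and are not LLVM APIs.</p>
<pre>
```python
def store_vector(lanes, lane_bytes, byteorder):
    """Write each lane in turn; endianness only affects bytes within a lane."""
    return b"".join(lane.to_bytes(lane_bytes, byteorder) for lane in lanes)


def load_vector(memory, lane_bytes, byteorder):
    """Read consecutive lanes of lane_bytes each back out of memory."""
    return [
        int.from_bytes(memory[i:i + lane_bytes], byteorder)
        for i in range(0, len(memory), lane_bytes)
    ]


def bitcast(lanes, src_bytes, dst_bytes, byteorder):
    """Store/load definition: store with one lane size, load with the other."""
    memory = store_vector(lanes, src_bytes, byteorder)
    return load_vector(memory, dst_bytes, byteorder)


def register_image(lanes, lane_bits):
    """Assumed register model: lane 0 sits in the least significant bits."""
    return sum(lane * 2 ** (index * lane_bits) for index, lane in enumerate(lanes))


v4i32 = [1, 2, 3, 4]
big = bitcast(v4i32, 4, 8, "big")        # four i32 lanes to two i64 lanes
little = bitcast(v4i32, 4, 8, "little")

print([hex(lane) for lane in big])
print([hex(lane) for lane in little])
print(format(register_image(v4i32, 32), "032x"))
print(format(register_image(big, 64), "032x"))
print(format(register_image(little, 64), "032x"))
```
</pre>
                        <p class="MsoNormal">Under these assumptions the little-endian register image is identical before and after the cast, so no instruction is needed, while the big-endian image changes, which corresponds to the shuffle described above.</p></div>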

                        <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>

                        </div>

                        <div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">

                          <div>

                            <div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm">

                              <div><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">From:</span></b><span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"> </span></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">Quentin Colombet [<a href="mailto:qcolombet@apple.com" target="_blank"></a><a href="mailto:qcolombet@apple.com" target="_blank">mailto:qcolombet@apple.com</a>]<span> </span><br>

                                    <b>Sent:</b><span> </span>07 January

                                    2016 19:58<br>

                                    <b>To:</b><span> </span>Daniel

                                    Sanders<br>

                                    <b>Cc:</b><span> </span>llvm-dev<br>

                                    <b>Subject:</b><span> </span>Re:

                                    [llvm-dev] [GlobalISel] A Proposal

                                    for global instruction selection</span></p>

                              </div>

                            </div>

                          </div>

                          <div><div> <br></div>

                          </div>

                          <div><p class="MsoNormal">Hi Daniel,</p>

                          </div>

                          <div>

                            <div><div> <br></div>

                            </div>

                          </div>

                          <div>

                            <div><p class="MsoNormal">I had a quick look at

                                the language reference for bitcast and I

                                have a different reading than what you

                                were pointing out.</p>

                            </div>

                          </div>

                          <div>

                            <div><p class="MsoNormal">Indeed, my takeaway

                                is:</p>

                            </div>

                          </div>

                          <div>

                            <div><p class="MsoNormal"><span>"It

                                  is<span> </span><b>always a </b></span><em><b><span>no-op

                                      cast</span></b></em><span> because

                                  no bits change with this conversion."</span></p>

                            </div>

                          </div>

                          <div>

                            <div><div> <br></div>

                            </div>

                          </div>

                          <div>

                            <div><p class="MsoNormal">In other words,

                                deleting all bitcast instructions should

                                be fine.</p>

                            </div>

                          </div>

                          <div>

                            <div><div> <br></div>

                            </div>

                          </div>

                          <div>

                            <div><p class="MsoNormal">My understanding of

                                the quote you’ve highlighted is that it

                                tells C programmers that this is like a

                                memcpy, not a cast :).</p>

                            </div>

                          </div>

                          <div>

                            <div><div> <br></div>

                            </div>

                          </div>

                          <div>

                            <div><p class="MsoNormal">Cheers,</p>

                            </div>

                          </div>

                          <div>

                            <div><p class="MsoNormal">-Quentin</p>

                            </div>

                            <div>

                              <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">

                                <div>

                                  <div><p class="MsoNormal">On Nov 20,

                                      2015, at 6:53 AM, Daniel Sanders

                                      <<a href="mailto:Daniel.Sanders@imgtec.com" target="_blank"><span style="color:purple">Daniel.Sanders@imgtec.com</span></a>>

                                      wrote:</p>

                                  </div>

                                </div>

                                <div><div> <br></div>

                                </div>

                                <div>

                                  <div>

                                    <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">Hi,</span></p>

                                    </div>

                                  </div>

                                  <div>

                                    <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>

                                    </div>

                                  </div>

                                  <div>

                                    <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">I

                                          haven't had a chance to read all

                                          of this yet, but one minor

                                          thing occurred to me during

                                          your presentation that I want

                                          to mention. At one point you

                                          mentioned deleting all the

                                          bitcast instructions since

                                          they're equivalent to nops but

                                          this isn't always true.</span></p>

                                    </div>

                                  </div>

                                  <div>

                                    <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>

                                    </div>

                                  </div>

                                  <div>

                                    <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">The<span> </span><a href="http://llvm.org/docs/LangRef.html" target="_blank"><span style="color:purple"></span></a><a href="http://llvm.org/docs/LangRef.html" target="_blank">http://llvm.org/docs/LangRef.html</a></span><span> </span>definition

                                          of the bitcast instruction

                                          includes this sentence:</p>

                                    </div>

                                  </div>

                                  <div>

                                    <div><p class="MsoNormal" style="text-indent:36.0pt"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">The

                                          conversion is done as if the

                                          value had been stored to

                                          memory and read back as type

                                          ty2.</span></p>

                                    </div>

                                  </div>

                                  <div>

                                    <div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">For

                                          big-endian MSA, this is

                                          equivalent to a shuffling of

                                          the bits in the register

                                          because endianness only

                                          changes the byte order within

                                          each element. The order of the

                                          elements is unaffected by

                                          endianness. IIRC, big-endian

                                          NEON is the same way.</span></p>

                                    </div>

                                  </div>
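                                  <div>
                                    <div><p class="MsoNormal">As a rough illustration of the LangRef sentence quoted above, the following Python sketch models a bitcast from &lt;4 x i32&gt; to &lt;2 x i64&gt; as a store followed by a reload with a different element size. The helper name bitcast_via_memory and the sample values are made up for this example, and it models only the memory rule, not how MSA or NEON arrange elements in a register.</p>
<pre>
```python
def bitcast_via_memory(elems, src_bits, dst_bits, byteorder):
    """Store elems as src_bits-wide integers, then reload as dst_bits-wide ones."""
    # "Store": lay the elements out in memory, lowest-numbered element first.
    raw = b"".join(e.to_bytes(src_bits // 8, byteorder) for e in elems)
    # "Load": reinterpret the same bytes using the new element size.
    step = dst_bits // 8
    return [int.from_bytes(raw[i:i + step], byteorder)
            for i in range(0, len(raw), step)]


v4i32 = [0x00010203, 0x04050607, 0x08090A0B, 0x0C0D0E0F]
as_le = bitcast_via_memory(v4i32, 32, 64, "little")
as_be = bitcast_via_memory(v4i32, 32, 64, "big")
print([hex(x) for x in as_le])  # ['0x405060700010203', '0xc0d0e0f08090a0b']
print([hex(x) for x in as_be])  # ['0x1020304050607', '0x8090a0b0c0d0e0f']
```
</pre>
                                    <p class="MsoNormal">The two byte orders give different i64 values for the same input, so the result of the bitcast depends on the target's endianness.</p>
                                    </div>
                                  </div>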

                                  <div>

                                    <div><div><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> </span><br></div>

                                    </div>

                                  </div>

                                  <div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">

                                    <div>

                                      <div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm">

                                        <div>

                                          <div><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">From:</span></b><span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"> </span></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">llvm-dev [<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">mailto:llvm-dev-bounces@lists.llvm.org</a></span>]<span> </span><b>On

                                                  Behalf Of<span> </span></b>Quentin

                                                Colombet via llvm-dev<br>

                                                <b>Sent:</b><span> </span>18

                                                November 2015 19:27<br>

                                                <b>To:</b><span> </span>llvm-dev<br>

                                                <b>Subject:</b><span> </span>[llvm-dev]

                                                [GlobalISel] A Proposal

                                                for global instruction

                                                selection</p>

                                          </div>

                                        </div>

                                      </div>

                                    </div>

                                    <div>

                                      <div><div> <br></div>

                                      </div>

                                    </div>

                                    <div>

                                      <div>

                                        <div>

                                          <div><p class="MsoNormal">Hi,<br>

                                              <span style="color:#12c00e"><br>

                                              </span>With this email, I

                                              would like to kick-off the

                                              development for the next

                                              instruction selector that

                                              I described during the

                                              last LLVM Dev’ Meeting.<br>

                                              For the motivations, see

                                              Jakob’s proposal (<a href="http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html</a>)

                                              and for the proposal, see

                                              the slides (Keynote: <a href="http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co" target="_blank">http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.key?view=co</a> or

                                              PDF: <a href="http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co" target="_blank">http://llvm.org/viewvc/llvm-project/www/trunk/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf?revision=252430&view=co</a>)

                                              or the talk (<a href="https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2" target="_blank">https://www.youtube.com/watch?v=F6GGbYtae3g&list=PL_R5A0lGi1AA4Lv2bBFSwhgDaHvvpVU21&index=2</a>).</p>

                                          </div>

                                        </div>

                                      </div>

                                      <div>

                                        <div>

                                          <div><p class="MsoNormal"><br>

                                              TL;DR This is happening

                                              now, feedback invited!<br>

                                              <br>

                                              *** Context ***<br>

                                              <span style="color:#12c00e"><br>

                                              </span>During the last

                                              LLVM Dev’ Meeting, I

                                              presented a proposal for

                                              the next instruction

                                              selector, GlobalISel. The

                                              proposal is basically

                                              summarized in “High Level

                                              Prototype Design” and

                                              “Roadmap”. (If you want

                                              further details, feel free

                                              to reach out to me.)<br>

                                              <span style="color:#00afcd"><br>

                                              </span>The first step of

                                              the development plan is to

                                              prototype the new

                                              framework on open source.

                                              The idea is to <b>start

                                                prototyping now(!)</b> and

                                              have the discussion

                                              ongoing in parallel. The

                                              reason for this approach is

                                              to have code that can be

                                              used to inform those

                                              discussions, e.g., by

                                              collecting data and trying

                                              different design

                                              approaches. Regarding the

                                              discussion, I have listed

                                              a few points where your

                                              feedback would be

                                              particularly appreciated

                                              (see Feedback Invite).</p>

                                          </div>

                                        </div>

                                      </div>

                                      <div>

                                        <div>

                                          <div><p class="MsoNormal"><span style="color:#00afcd"><br>

                                              </span>Also, as I have

                                              mentioned in my talk, some

                                              issues are controversial

                                              but I expect them to be

                                              resolved during prototype

                                              development. Specifically,

                                              these concern aspects of

                                              legalization (should parts

                                              of it be done at the LLVM

                                              IR level or all at the MI

                                              level?) and code re-use

                                              for the instruction combiner.

                                              Please feel free to bring

                                              up your specific concerns

                                              as I move along with the

                                              development plan.<br>

                                              <span style="color:#00afcd"><br>

                                              </span>I expect the design

                                              to evolve with our

                                              experimental findings and

                                              your feedback

                                              and contributions.<br>

                                              Nonetheless, we expect to

                                              nail down some design

                                              decisions once and for all

                                              as the prototype

                                              progresses. I have

                                              highlighted them with

                                              the following pattern <b>[final]</b>.<br>

                                              <span style="color:#12c00e"><br>

                                                <br>

                                                <br>

                                              </span>*** Feedback Invite

                                              ***<br>

                                              <span style="color:#00afcd"><br>

                                              </span>If you follow and

                                              support this work you need

                                              to be aware of three

                                              things and I am eager to

                                              hear your feedback and

                                              thoughts about them: the

                                              overall goals of Global

                                              ISel, the goals of the

                                              prototype, and the impact

                                              of the prototype work on

                                              backend design. <br>

                                              <span style="color:#00afcd"><br>

                                              </span>In the section

                                              “Goals”, I defined

                                              (repeated for people that

                                              saw the talk) the goals

                                              for the Global ISel

                                              design.<br>

                                              - Do you see anything

                                              missing?<br>

                                              - Do you see something

                                              that should not be there? <br>

                                              <span style="color:#00afcd"><br>

                                              </span>The prototype will

                                              answer critical design

                                              questions (see “Design

                                              Questions the Prototype

                                              Addresses at the End of

                                              M1” for examples) before

                                              the actual design of Global

                                              ISel is finalized, but it

                                              cannot cover everything.<br>

                                              Specifically we will <b>*not*</b> look

                                              into improving TableGen or

                                              reusing InstCombine (see

                                              “Proposed Approach” for the

                                              rationale). Please let me

                                              know if you see any issue

                                              with that.<br>

                                              <span style="color:#00afcd"><br>

                                              </span>There is also basic

                                              ground work needed to

                                              prepare for Global ISel

                                              and I need to extend the

                                              core MachineInstr-level

                                              APIs as explained during

                                              the talk. For this, I

                                              prepared sketches of

                                              patches to illustrate them

                                              and describe the details

                                              in the “Implications”

                                              section below. Please have

                                              a look at the patches to

                                              have a better idea of the

                                              expected impact.<br>

                                              <span style="color:#00afcd"><br>

                                              </span>If there is

                                              anything else you want to

                                              discuss related to Global

                                              ISel, feel free to reach out to

                                              me. In particular, several

                                              people expressed

                                              interest during the LLVM

                                              Dev Meeting in

                                              contributing to the

                                              project. Let me know what

                                              your area of interest is,

                                              so that we can coordinate

                                              our efforts.<br>

                                              Anyhow, please add

                                              [GlobalISel] in the

                                              subject line to help

                                              categorize the emails.<br>

                                              <span style="color:#00afcd"><br>

                                                <br>

                                                <br>

                                              </span>*** Goals ***<br>

                                              <span style="color:#12c00e"><br>

                                              </span>The high level

                                              goals of the new

                                              instruction selector are:<br>

                                              - Global instruction

                                              selector.<br>

                                              - Fast instruction

                                              selector.<br>

                                              - Shared code path for

                                              fast and good instruction

                                              selection.<br>

                                              - IR that represents ISA

                                              concepts better.<br>

                                              - More flexible

                                              instruction selector.<br>

                                              - Easier to

                                              maintain/understand

                                              framework, in particular

                                              legalization.<br>

                                              - Self contained machine

                                              representation, no back

                                              links to LLVM IR.<br>

                                              - No change to LLVM IR.<br>

                                              <span style="color:#5856d6"><br>

                                              </span>Note:  The goals

                                              are common to all targets.

                                              In particular, we do not

                                              intend to work on

                                              target-specific features for the

                                              prototype.<br>

                                              The bottom line is please

                                              make sure those goals are

                                              compatible with what you

                                              want to achieve for your

                                              target, even if your

                                              requirement does not get

                                              listed here.<br>

                                              <br>

                                              <span style="color:#12c00e"><br>

                                                <br>

                                              </span>*** Proposed

                                              Approach ***<br>

                                              <span style="color:#12c00e"><br>

                                              </span>In this section, I

                                              describe the approach I

                                              plan to pursue in the

                                              prototype and the roadmap

                                              to get there. The final

                                              design will flow out of

                                              it.<br>

                                              <span style="color:#12c00e"><br>

                                              </span>For this prototype,

                                              we purposely exclude any

                                              work to improve or use

                                              TableGen or InstCombine <b>[final].</b> We

                                              will keep in mind however,

                                              that some of the C++ code

                                              we write will be

                                              table-generated at some

                                              point.<br>

                                              The rationale is that we do

                                              not want to lay down a new

                                              TableGen/InstCombine

                                              infrastructure before

                                              being able to work on the

                                              ISel framework itself.<br>

                                              <span style="color:#12c00e"><br>

                                              </span>The prototype

                                              vehicle will be <b>AArch64</b>.

                                              None of the changes for

                                              GlobalISel will negatively

                                              impact the existing ISel.<br>

                                              <span style="color:#12c00e"><br>

                                                <br>

                                              </span>** High Level

                                              Prototype Design **<br>

                                              <span style="color:#12c00e"><br>

                                              </span>As shown in the

                                              talk, the expected

                                              pipeline for the prototype

                                              is:<br>

                                              <b>LLVM IR </b>->

                                              IRTranslator -> <b>Generic (G)

                                                MachineInstr</b> ->

                                              Legalizer ->

                                              RegBankSelect -> Select

                                              -> <b>MachineInstr</b><br>

                                              <span style="color:#12c00e"><br>

                                              </span>Where:<br>

                                              - Terms in <b>bold</b> are

                                              intermediate

                                              representations.<br>

                                              -  Generic MachineInstrs

                                              are machine instructions

                                              with a generic opcode,

                                              e.g., ADD, COPY.</p>

                                          </div>

                                        </div>

                                      </div>

                                      <div>

                                        <div>

                                          <div><p class="MsoNormal">-

                                              IRTranslator: Translate

                                              LLVM IR to (G)

                                              MachineInstr.<br>

                                              - Legalizer: Legalize

                                              illegal (G) MachineInstr

                                              to legal (G) MachineInstr.<br>

                                              - RegBankSelect: Map each

                                              virtual register that only has a size

                                              to a virtual register with a

                                              register bank.<br>

                                              - Select: Translate the

                                              remaining (G) MachineInstr

                                              to MachineInstr.<br>

                                              <br>

                                              <span style="color:#00afcd"><br>

                                                <br>

                                              </span>** Implications **<br>

                                              <span style="color:#00afcd"><br>

                                              </span>As part of the

                                              bring-up of the prototype,

                                              we need to extend some of

                                              the core

                                              MachineInstr-level APIs:<br>

                                                - Need to remember

                                              FastMath flags for each

                                              MachineInstr.<br>

                                                - Need to know the type

                                              of each MachineInstr. We

                                              don’t want ADD8, ADD16,

                                              etc.<br>

                                                - Extend the

                                              MachineRegisterInfo to

                                              support size as well as

                                              register classes for

                                              virtual registers.<br>

                                              <span style="color:#00afcd"><br>

                                              </span>I have sketched the

                                              changes in the attached

                                              patches to help picture

                                              how the changes would

                                              impact the existing APIs.</p>

                                          </div>

                                        </div>

                                      </div>
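                                      <div>
                                        <div><p class="MsoNormal">To make the pass pipeline and the typed generic opcodes above a little more concrete, here is a toy Python model of the four passes. Everything in it is an assumption made for illustration: the class and function names, the G_ prefix for generic opcodes, the choice of 32 and 64 bits as the only legal sizes, and the single GPR bank. It is not the code from the attached patches; ADDWrr and ADDXrr are used only as familiar AArch64 opcode names.</p>
<pre>
```python
from dataclasses import dataclass


@dataclass
class VReg:
    name: str
    size: int         # size in bits, known right after translation
    bank: str = None  # register bank, filled in later by RegBankSelect


@dataclass
class Instr:
    opcode: str       # generic (for example "G_ADD") or target-specific
    bits: int         # the type lives on the instruction: no ADD8, ADD16, ...
    dst: VReg
    srcs: list


LEGAL_SIZES = (32, 64)  # assumption: the toy target only adds 32 or 64 bits
SELECT_TABLE = {("G_ADD", 32): "ADDWrr", ("G_ADD", 64): "ADDXrr"}


def ir_translate(ir):
    """IRTranslator: (op, bits, dst, srcs) tuples to generic instructions."""
    vregs = {}

    def vreg(name, bits):
        return vregs.setdefault(name, VReg(name, bits))

    return [Instr("G_" + op.upper(), bits, vreg(dst, bits),
                  [vreg(s, bits) for s in srcs])
            for op, bits, dst, srcs in ir]


def legalize(instrs):
    """Legalizer: widen each instruction to the smallest legal size.

    A real legalizer would also insert extensions and truncations.
    """
    for inst in instrs:
        legal = min(s for s in LEGAL_SIZES if s >= inst.bits)
        inst.bits = legal
        for reg in [inst.dst] + inst.srcs:
            reg.size = max(reg.size, legal)
    return instrs


def reg_bank_select(instrs):
    """RegBankSelect: give every sized virtual register a register bank."""
    for inst in instrs:
        for reg in [inst.dst] + inst.srcs:
            reg.bank = "GPR"
    return instrs


def select(instrs):
    """Select: replace each generic opcode with a target opcode."""
    for inst in instrs:
        inst.opcode = SELECT_TABLE[(inst.opcode, inst.bits)]
    return instrs


def run_pipeline(ir):
    return select(reg_bank_select(legalize(ir_translate(ir))))


out = run_pipeline([("add", 8, "%c", ["%a", "%b"]),
                    ("add", 64, "%z", ["%x", "%y"])])
print([i.opcode for i in out])  # ['ADDWrr', 'ADDXrr']
```
</pre>
                                        <p class="MsoNormal">The point of the sketch is the ordering: sizes are attached to virtual registers at translation time, banks only appear after RegBankSelect, and target opcodes only appear after Select.</p>
                                        </div>
                                      </div>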

                                      <div>

                                        <div>

                                          <div><div> <br></div>

                                          </div>

                                        </div>

                                      </div>

                                      <div>

                                        <div>

                                          <div><p class="MsoNormal">Note: I

                                              do not intend to commit

                                              those changes as they are.

                                              They will go through the usual

                                              review process in due

                                              time.</p>

                                          </div>

                                        </div>

                                      </div>

                                      <div>

                                        <div>

                                          <div><p class="MsoNormal"><br>

                                              The patches contain “//

                                              ***”-like comments that

                                              give a rough explanation

                                              of why those changes are

                                              needed w.r.t. the goals.<br>

                                              The order of the patches

                                              could be modified since

                                              the dependencies between

                                              those are not sequential.

                                              Anyhow, here are the

                                              patches:<br>

                                              1. Introduce (some of) the

                                              generic opcodes.<br>

                                              2. Make MachineFunction

                                              more independent of LLVM

                                              IR to eventually be able

                                              to delete the LLVM IR

                                              instance from the memory.<br>

                                              3. Extend MachineInstr to

                                              represent additional

                                              information attached to

                                              generic opcodes.<br>

                                              4. Teach

                                              MachineRegisterInfo about

                                              size for virtual

                                              registers.<br>

                                              5. Introduce a helper

                                              class to build

                                              MachineInstr related

                                              objects.<br>

                                              6. Add new target hooks to

                                              lower the ABI directly to

                                              MachineInstr.<br>

                                              7. Introduce the

                                              IRTranslator pass.<br>

                                              <br>

                                              <span style="color:#12c00e"><br>

                                              </span>** Roadmap for the

                                              Prototype **<br>

                                              <span style="color:#00afcd"><br>

                                              </span>We plan to split

                                              the prototype into three

                                              main milestones:<br>

                                              1. Translation: LLVM IR to

                                              (G) MachineInstr

                                              translation.<br>

                                              2. Basic selector: Legal

                                              LLVM IR to target-specific

                                              MachineInstr.<br>

                                              3. Simple legalization:

                                              Support scalar type

                                              legalization and some

                                              vector instructions.<br>

                                              <span style="color:#00afcd"><br>

                                              </span>Notes:<br>

                                              - For #1, we will not

                                              support any fancy

                                              instructions like landing

                                              pad or switch.<br>

                                              - Each milestone should

                                              take about 3-4 months.</p>

                                          </div>

                                        </div>

                                      </div>

                                      <div>

                                        <div>

                                          <div><p class="MsoNormal">- At

                                              the end of #2, we will
                                              have a FastISel-like
                                              selector.<br>

                                              <span style="color:#00afcd"><br>

                                              </span>Each milestone will

                                              be detailed right before

                                              starting it. The rationale
                                              is that we want to
                                              incorporate what we
                                              discover with the
                                              prototype into the next
                                              milestone. In other words,

                                              in this email, <b>I only

                                                describe the first

                                                milestone</b> in detail

                                              and I will give more

                                              details on the next

                                              milestone shortly before

                                              we start it, and so on. For
                                              your information, here is
                                              the remainder of the
                                              intended roadmap for the <b>full</b> project:<br>

                                              4. Productization: Clean

                                              up implementation,

                                              stabilize the APIs.<br>

                                              5. Complex legalization:

                                              Extend legalization

                                              support to everything

                                              missing.<br>

                                              6. Completeness: Fill in the
                                              blanks, e.g., landing pads.<br>

                                              7. Clean-up and

                                              performance: Add the
                                              necessary bits to reach
                                              parity with, or beat,
                                              SelectionDAG-generated
                                              code.<br>

                                              8. Transition: Document

                                              how to switch, provide

                                              tools to help.<br>

                                              <span style="color:#00afcd"><br>

                                                <br>

                                              </span>** Milestone 1 **<br>

                                              <span style="color:#12c00e"><br>

                                              </span>The first phase is

                                              focused on the

                                              IRTranslator pass.<br>

                                              <span style="color:#12c00e"><br>

                                              </span>The IRTranslator is

                                              responsible for

                                              translating the LLVM IR

                                              into Generic MachineInstr.

                                              The IRTranslator pass uses

                                              some target hooks

                                              to perform the ABI

                                              lowering. We can either

                                              define a new API for them,

                                              e.g., ABILoweringInfo, or

                                              extend the existing

                                              TargetLowering.<br>

                                              Moreover, the prototype

                                              will focus on simple
                                              instructions, i.e., we will

                                              not support switch or

                                              landing pad for this

                                              iteration.<br>
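                                              <br>
                                              To make the translation step concrete, here is a minimal sketch in Python rather than LLVM C++. Everything in it is a hypothetical stand-in: IRInst, GenericMI, ToyABILowering, the G_-prefixed opcode names and the r0-r3 argument registers are assumptions made for illustration, not existing or proposed LLVM APIs. It shows a translator that maps simple IR instructions one-to-one onto generic instructions, delegates argument and return handling to an ABI hook, and rejects instructions such as switch that are out of scope for this iteration.<br>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IRInst:
    """Toy stand-in for an LLVM IR instruction (hypothetical, not an LLVM class)."""
    opcode: str              # e.g. "add", "ret", "switch"
    result: Optional[str]    # name of the IR value defined, if any
    operands: List[str]      # names of the IR values used


@dataclass
class GenericMI:
    """Toy stand-in for a generic (target-independent) machine instruction."""
    opcode: str
    defs: List[str]
    uses: List[str]


class ToyABILowering:
    """Hypothetical ABI hook: decides where arguments and return values live.

    Assumption for illustration: at most four arguments, passed in r0-r3,
    with the return value in r0.
    """
    arg_regs = ["r0", "r1", "r2", "r3"]
    ret_reg = "r0"

    def lower_formal_arguments(self, args, vreg_for):
        # Copy each incoming physical register into a fresh virtual register.
        return [GenericMI("COPY", [vreg_for(a)], [self.arg_regs[i]])
                for i, a in enumerate(args)]

    def lower_return(self, value_vreg):
        # Move the returned value into the ABI return register, then return.
        return [GenericMI("COPY", [self.ret_reg], [value_vreg]),
                GenericMI("RET", [], [self.ret_reg])]


# Simple instructions map one-to-one onto generic opcodes (names are illustrative).
GENERIC_OPCODES = {"add": "G_ADD", "sub": "G_SUB", "mul": "G_MUL"}


def translate(args, body, abi):
    """Translate a straight-line toy function into a list of GenericMI."""
    vregs = {}

    def vreg_for(value):
        # Each IR value gets exactly one virtual register, created on first use.
        if value not in vregs:
            vregs[value] = f"%vreg{len(vregs)}"
        return vregs[value]

    out = abi.lower_formal_arguments(args, vreg_for)
    for inst in body:
        if inst.opcode == "ret":
            out += abi.lower_return(vreg_for(inst.operands[0]))
        elif inst.opcode in GENERIC_OPCODES:
            uses = [vreg_for(op) for op in inst.operands]
            defs = [vreg_for(inst.result)]
            out.append(GenericMI(GENERIC_OPCODES[inst.opcode], defs, uses))
        else:
            # switch, landing pads, etc. are deliberately unsupported here.
            raise NotImplementedError(inst.opcode)
    return out
```

                                              For example, translating a function that adds its two arguments and returns the sum yields two argument COPYs, a G_ADD, a COPY into the return register, and a RET. Keeping the ABI logic behind a separate object mirrors the ABILoweringInfo option above; folding the same hooks into an existing class would correspond to the TargetLowering option.<br>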

                                              <span style="color:#12c00e"><br>

                                              </span>At the end of M1,

                                              the prototype will not be

                                              able to produce code,

                                              since we will only have
                                              the beginning of the
                                              Global ISel pipeline.
                                              Instead, we will test the
                                              IRTranslator by checking
                                              the generic output it
                                              produces from the input
                                              IR.<br>

                                              <span style="color:#12c00e"><br>

                                              </span>* Design Decisions

                                              *<br>

                                              <span style="color:#12c00e"><br>

                                              </span>- The IRTranslator

                                              is a final class. Its
                                              purpose is to move from
                                              the LLVM IR world to the
                                              MachineInstr world <b>[final]</b>.<br>

                                              - Lower the ABI as part of

                                              the translation process <b>[final]</b>.<br>

                                              <span style="color:#12c00e"><br>

                                              </span>* Design Questions

                                              the Prototype Addresses at

                                              the End of M1 *<br>

                                              <span style="color:#12c00e"><br>

                                              </span>- Handling of

                                              aggregate types during the

                                              translation.<br>

                                              - Lowering of switches.<br>

                                              - What about Module passes
                                              for Machine passes?<br>

                                              - Introduce new APIs to

                                              have a clearer separation

                                              between:<br>

                                                - Legalization

                                              (setOperationAction, etc.)<br>

                                                - Cost/Combine related

                                              (isXXXFree, etc.)<br>

                                                - Lowering related

                                              (LowerFormal, etc.)<br>

                                              - What is the contract

                                              with the backends? Is it

                                              still “should be able to

                                              select any valid LLVM IR”?<br>

                                              <span style="color:#00afcd"><br>

                                              </span>Thanks,</p>

                                          </div>

                                        </div>

                                        <div>

                                          <div>

                                            <div>

                                              <div>

                                                <div>

                                                  <div>

                                                    <div>

                                                      <div>

                                                        <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div>

                                                          <div><p class="MsoNormal">-Quentin</p>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                          </div>

                                                        </div>

                                                      </div>

                                                    </div>

                                                  </div>

                                                </div>

                                              </div>

                                            </div>

                                          </div>

                                        </div>

                                      </div>

                                    </div>

                                  </div>

                                </div>

                              </blockquote>

                            </div>

                          </div>

                        </div>

                      </div>

                    </blockquote>

                  </div><div> <br></div>

                </div>

              </div>

            </div>

          </div>

          _______________________________________________<br>

          LLVM Developers mailing list<br>

          <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

          <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

        </blockquote>

      </div>


    </blockquote>

    <br>

  </div>

</div></blockquote></div></div></div></blockquote></div></div>