<div dir="ltr"><div>You can call DataLayout::isLegalInteger() to verify the type you're shrinking to is reasonable (InstCombine already does this, e.g. InstCombiner::shouldChangeType()).</div><div>This isn't quite the same thing as querying TLI, but...</div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 7, 2017 at 9:36 AM, Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF"><div><div class="gmail-h5">
    <div class="gmail-m_-3688459371949455432moz-cite-prefix">On 3/7/2017 9:15 AM, Wei Mi wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Mon, Mar 6, 2017 at 8:50 PM, Wei
            Mi <span dir="ltr"><<a href="mailto:wmi@google.com" target="_blank">wmi@google.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <div dir="ltr"><br>
                <div class="gmail_extra"><br>
                  <div class="gmail_quote"><span>On Mon, Mar 6,
                      2017 at 4:36 PM, Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span>
                      wrote:<br>
                    </span><span>
                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-m_-3688459371949455432m_-8464126786741311821gmail-">On
                          3/6/2017 11:06 AM, Wei Mi wrote:<br>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            After looking at the benchmark motivating
                            the test, I find it is much harder for
                            codegenprepare to catch all the shrinking
                            opportunities compared with instcombine,
                            because of two things:<br>
                            <br>
                            * loadpre may merge the original load in a
                            shrinking pattern with other load in a very
                            early position, and there maybe other stores
                            in between. The safety check for the
                            codegenprepare now simply scan
                            mayWriteToMemory insns between load and
                            store in the candidate shrinking pattern,
                            and it will fail because of those stores in
                            between.<br>
                          </blockquote>
                          <br>
                        </span>
                        It would be easy to make the check a lot more
                        powerful using MemorySSA; I guess you don't have
                        access to that in the passes where you're doing
                        the transform?</blockquote>
                      <div><br>
                      </div>
                    </span>
                    <div>Yes, if MemorySSA + alias query can be used, it
                      should solve the problem. I didn't find existing
                      usage of MemorySSA in instcombine or CGP so I am
                      not sure whether it can be used. </div>
                    <span>
                      <div> </div>
                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-m_-3688459371949455432m_-8464126786741311821gmail-"><br>
                          <br>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                            * we may see the following pattern. a.f1 is
                            a bitfield. Ideally, we should do the
                            shrinking for all the three stores. But
                            codegenprepare works in top-down order.
                            After the first store is shrinked, it is
                            hard for the second and the third to know
                            %bf.set is the same as the value loaded from
                            %0. If we do this in instcombine, it is much
                            easier because before load/store
                            elimination, %bf.set will be loaded from %0
                            respecitively in if.then9 and in if.end41.<br>
                          </blockquote>
                          <br>
                        </span>
                        I'm not following how it helps to do this in
                        instcombine?  Normally, EarlyCSE runs first.  (A
                        complete C testcase might be useful here.)</blockquote>
                      <div><br>
                      </div>
                    </span>
                    <div>It is not earlycse. It is also loadpre to cause
                      the load to be replaced by %bf.set. </div>
                    <div><br>
                    </div>
                    <div>
                      <div style="font-size:12.8px">Before global value
                        numbering:</div>
                      <span>
                        <div style="font-size:12.8px"><br class="gmail-m_-3688459371949455432m_-8464126786741311821gmail-Apple-interchange-newline">
                          entry:</div>
                        <div style="font-size:12.8px">  %0 = bitcast
                          %class.B* %this to i64*</div>
                        <div style="font-size:12.8px">  %bf.load = load
                          i64, i64* %0, align 8</div>
                        <div style="font-size:12.8px">
                          <div>  %dec = add i64 %bf.load, 65535</div>
                          <div>  %bf.value = and i64 %dec, 65535</div>
                          <div>  %bf.clear5 = and i64 %bf.load, -65536</div>
                          <div>  %bf.set = or i64 %bf.value, %bf.clear5</div>
                          <div>  store i64 %bf.set, i64* %0, align 8</div>
                        </div>
                        <div style="font-size:12.8px">  ...</div>
                        <div style="font-size:12.8px">if.then9:</div>
                      </span>
                      <div style="font-size:12.8px">  <span style="font-size:12.8px">%bf.load1 = load i64,
                          i64* %0, align 8                      //
                          %bf.load1 is helpful for the patch to
                          recognize the load-and-or-store pattern.</span></div>
                      <div style="font-size:12.8px">
                        <div>  %inc79 = add i64 %bf.load1,
                          281474976710656       </div>
                        <span>
                          <div>  %bf.shl = and i64 %inc79,
                            71776119061217280</div>
                        </span>
                        <div>  %bf.clear18 = and i64 %bf.load1,
                          -71776119061217281</div>
                        <span>
                          <div>  %bf.set19 = or i64 %bf.shl, %bf.clear18</div>
                          <div>  store i64 %bf.set19, i64* %0, align 8</div>
                        </span></div>
                      <div style="font-size:12.8px">  ...</div>
                      <div style="font-size:12.8px">if.end41:</div>
                      <div style="font-size:12.8px">  <span style="font-size:12.8px">%bf.load2 = load i64,
                          i64* %0, align 8                      </span><span style="font-size:12.8px">// %bf.load2 is
                          helpful for the patch to recognize the
                          load-and-or-store pattern.</span></div>
                      <div style="font-size:12.8px">
                        <div>  %inc4578 = add i64 %bf.load2, 65536      
                                          </div>
                        <span>
                          <div>  %bf.shl48 = and i64 %inc4578,
                            4294901760</div>
                        </span>
                        <div>  %bf.clear49 = and i64 %bf.load2,
                          -4294901761</div>
                        <span>
                          <div>  %bf.set50 = or i64 %bf.shl48,
                            %bf.clear49</div>
                          <div>  store i64 %bf.set50, i64* %0, align 8</div>
                        </span></div>
                    </div>
                    <div> </div>
                    <div>After global value numbering:</div>
                    <span>
                      <div><br>
                      </div>
                      <div>
                        <div style="font-size:12.8px">entry:</div>
                        <div style="font-size:12.8px">  %0 = bitcast
                          %class.B* %this to i64*</div>
                        <div style="font-size:12.8px">  %bf.load = load
                          i64, i64* %0, align 8</div>
                        <div style="font-size:12.8px">
                          <div>  %dec = add i64 %bf.load, 65535</div>
                          <div>  %bf.value = and i64 %dec, 65535</div>
                          <div>  %bf.clear5 = and i64 %bf.load, -65536</div>
                          <div>  %bf.set = or i64 %bf.value, %bf.clear5</div>
                          <div>  store i64 %bf.set, i64* %0, align 8</div>
                        </div>
                        <div style="font-size:12.8px">  ...</div>
                        <div style="font-size:12.8px">if.then9:</div>
                        <div style="font-size:12.8px">
                          <div>  %inc79 = add i64 %bf.set,
                            281474976710656       // we hope to know
                            %bf.set is the same as load i64, i64* %0,
                            align 8.</div>
                          <div>  %bf.shl = and i64 %inc79,
                            71776119061217280</div>
                          <div>  %bf.clear18 = and i64 %bf.set,
                            -71776119061217281</div>
                          <div>  %bf.set19 = or i64 %bf.shl, %bf.clear18</div>
                          <div>  store i64 %bf.set19, i64* %0, align 8</div>
                        </div>
                        <div style="font-size:12.8px">  ...</div>
                        <div style="font-size:12.8px">if.end41:</div>
                        <div style="font-size:12.8px">
                          <div>  %inc4578 = add i64 %bf.set, 65536      
                                             // we hope to know %bf.set
                            is the same as load i64, i64* %0, align 8.</div>
                          <div>  %bf.shl48 = and i64 %inc4578,
                            4294901760</div>
                          <div>  %bf.clear49 = and i64 %bf.set,
                            -4294901761</div>
                          <div>  %bf.set50 = or i64 %bf.shl48,
                            %bf.clear49</div>
                          <div>  store i64 %bf.set50, i64* %0, align 8</div>
                        </div>
                      </div>
                    </span></div>
                </div>
              </div>
            </blockquote>
          </div>
        </div>
      </div>
    </blockquote>
    <br></div></div>
    I don't like depending on pass ordering like this... I mean, I can't
    really say without the original C testcase, but at first glance,
    EarlyCSE should be catching this, and you're just getting lucky.<span class="gmail-"><br>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> Or can I move all the types of shrinking to
              instcombine? Since I only do the shrinking to i8, i16,
              i32, I think almost all the general architectures support
              i8/i16/i32 load/store directly?</div>
          </div>
        </div>
      </div>
    </blockquote></span>
    It should be safe to assume every architecture has a reasonable
    lowering for i8, i16, and i32 load/store.<span class="gmail-HOEnZb"><font color="#888888"><br>
    <br>
    -Eli<br>
    <pre class="gmail-m_-3688459371949455432moz-signature" cols="72">-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>
  </font></span></div>

</blockquote></div><br></div></div>