<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 6, 2017 at 8:50 PM, Wei Mi <span dir="ltr"><<a href="mailto:wmi@google.com" target="_blank">wmi@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Mon, Mar 6, 2017 at 4:36 PM, Friedman, Eli <span dir="ltr"><<a href="mailto:efriedma@codeaurora.org" target="_blank">efriedma@codeaurora.org</a>></span> wrote:<br></span><span class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="m_-8464126786741311821gmail-">On 3/6/2017 11:06 AM, Wei Mi wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

After looking at the benchmark motivating the test, I find it is much harder for codegenprepare to catch all the shrinking opportunities compared with instcombine, because of two things:<br>

<br>

* loadpre may merge the original load in a shrinking pattern with other load in a very early position, and there maybe other stores in between. The safety check for the codegenprepare now simply scan mayWriteToMemory insns between load and store in the candidate shrinking pattern, and it will fail because of those stores in between.<br>

</blockquote>

<br></span>

It would be easy to make the check a lot more powerful using MemorySSA; I guess you don't have access to that in the passes where you're doing the transform?</blockquote><div><br></div></span><div>Yes, if MemorySSA + alias query can be used, it should solve the problem. I didn't find existing usage of MemorySSA in instcombine or CGP so I am not sure whether it can be used. </div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="m_-8464126786741311821gmail-"><br>

<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

* we may see the following pattern. a.f1 is a bitfield. Ideally, we should do the shrinking for all the three stores. But codegenprepare works in top-down order. After the first store is shrinked, it is hard for the second and the third to know %bf.set is the same as the value loaded from %0. If we do this in instcombine, it is much easier because before load/store elimination, %bf.set will be loaded from %0 respecitively in if.then9 and in if.end41.<br>

</blockquote>

<br></span>

I'm not following how it helps to do this in instcombine?  Normally, EarlyCSE runs first.  (A complete C testcase might be useful here.)</blockquote><div><br></div></span><div>It is not earlycse. It is also loadpre to cause the load to be replaced by %bf.set. </div><div><br></div><div><div style="font-size:12.8px">Before global value numbering:</div><span class=""><div style="font-size:12.8px"><br class="m_-8464126786741311821gmail-Apple-interchange-newline">entry:</div><div style="font-size:12.8px">  %0 = bitcast %class.B* %this to i64*</div><div style="font-size:12.8px">  %bf.load = load i64, i64* %0, align 8</div><div style="font-size:12.8px"><div>  %dec = add i64 %bf.load, 65535</div><div>  %bf.value = and i64 %dec, 65535</div><div>  %bf.clear5 = and i64 %bf.load, -65536</div><div>  %bf.set = or i64 %bf.value, %bf.clear5</div><div>  store i64 %bf.set, i64* %0, align 8</div></div><div style="font-size:12.8px">  ...</div><div style="font-size:12.8px">if.then9:</div></span><div style="font-size:12.8px">  <span style="font-size:12.8px">%bf.load1 = load i64, i64* %0, align 8                      // %bf.load1 is helpful for the patch to recognize the load-and-or-store pattern.</span></div><div style="font-size:12.8px"><div>  %inc79 = add i64 %bf.load1, 281474976710656       </div><span class=""><div>  %bf.shl = and i64 %inc79, 71776119061217280</div></span><div>  %bf.clear18 = and i64 %bf.load1, -71776119061217281</div><span class=""><div>  %bf.set19 = or i64 %bf.shl, %bf.clear18</div><div>  store i64 %bf.set19, i64* %0, align 8</div></span></div><div style="font-size:12.8px">  ...</div><div style="font-size:12.8px">if.end41:</div><div style="font-size:12.8px">  <span style="font-size:12.8px">%bf.load2 = load i64, i64* %0, align 8                      </span><span style="font-size:12.8px">// %bf.load2 is helpful for the patch to recognize the load-and-or-store pattern.</span></div><div style="font-size:12.8px"><div>  %inc4578 = add i64 %bf.load2, 65536                       </div><span class=""><div>  %bf.shl48 = and i64 %inc4578, 4294901760</div></span><div>  %bf.clear49 = and i64 %bf.load2, -4294901761</div><span class=""><div>  %bf.set50 = or i64 %bf.shl48, %bf.clear49</div><div>  store i64 %bf.set50, i64* %0, align 8</div></span></div></div><div> </div><div>After global value numbering:</div><span class=""><div><br></div><div><div style="font-size:12.8px">entry:</div><div style="font-size:12.8px">  %0 = bitcast %class.B* %this to i64*</div><div style="font-size:12.8px">  %bf.load = load i64, i64* %0, align 8</div><div style="font-size:12.8px"><div>  %dec = add i64 %bf.load, 65535</div><div>  %bf.value = and i64 %dec, 65535</div><div>  %bf.clear5 = and i64 %bf.load, -65536</div><div>  %bf.set = or i64 %bf.value, %bf.clear5</div><div>  store i64 %bf.set, i64* %0, align 8</div></div><div style="font-size:12.8px">  ...</div><div style="font-size:12.8px">if.then9:</div><div style="font-size:12.8px"><div>  %inc79 = add i64 %bf.set, 281474976710656       // we hope to know %bf.set is the same as load i64, i64* %0, align 8.</div><div>  %bf.shl = and i64 %inc79, 71776119061217280</div><div>  %bf.clear18 = and i64 %bf.set, -71776119061217281</div><div>  %bf.set19 = or i64 %bf.shl, %bf.clear18</div><div>  store i64 %bf.set19, i64* %0, align 8</div></div><div style="font-size:12.8px">  ...</div><div style="font-size:12.8px">if.end41:</div><div style="font-size:12.8px"><div>  %inc4578 = add i64 %bf.set, 65536                        // we hope to know %bf.set is the same as load i64, i64* %0, align 8.</div><div>  %bf.shl48 = and i64 %inc4578, 4294901760</div><div>  %bf.clear49 = and i64 %bf.set, -4294901761</div><div>  %bf.set50 = or i64 %bf.shl48, %bf.clear49</div><div>  store i64 %bf.set50, i64* %0, align 8</div><div><br></div></div></div></span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="m_-8464126786741311821gmail-HOEnZb"><font color="#888888"><br>

<br><span class="">

-Eli<br>

<br>

-- <br>

Employee of Qualcomm Innovation Center, Inc.<br>

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Projec<br></span></font></span></blockquote></div></div></div></blockquote><div><br></div><div> Or can I move all the types of shrinking to instcombine? Since I only do the shrinking to i8, i16, i32, I think almost all the general architectures support i8/i16/i32 load/store directly?</div><div> </div></div><br></div></div>