<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Sep 2, 2014 at 12:11 AM, Hao Liu <span dir="ltr"><<a href="mailto:Hao.Liu@arm.com" target="_blank">Hao.Liu@arm.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Hi James & Chandler,<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I have two small test cases to show James’ first concern. Test results show that the loop vectorizer generates quite poor code for the wide store. To see the results, run the following command lines:<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">opt -S -loop-vectorize < wide-store.ll<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">opt -S -loop-vectorize < narrow-stores.ll<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The wide-store.ll and narrow-stores.ll files are generated from the attached struct.cpp with or without my patch. This C++ case is simplified from a hot function in SPEC CPU 2006 473.astar. 
Currently, the poor code hurts performance.<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Hi Chandler,<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I also agree with your concern. On the other hand, if the input is zext/shl/or and a wide store, the patch in SROA cannot handle such a case. For example, if the input is wide-store.ll, only a separate pass or function written specifically to handle such a case can generate simpler code.<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">But there is a conflict: even if we add code in the backend, we still can’t solve the problem of the wide store hurting loop vectorization. Given this concern, I think we may prefer narrow stores to a wide store.</span></p></blockquote></div><br>Before I dig into trying to explain various ways it is or isn't possible to generate better code with the wide stores, I think it is really important to understand why you aren't concerned about the memory model implications here, which cause us to *lose information* in the IR when splitting stores. Once fundamental invariants of the program are lost, they simply cannot be recovered. This seems to me to be the overriding concern. 
The fact that we need to improve lots of other parts of LLVM -- well, yes, we need to do lots of improvements to LLVM.</div><div class="gmail_extra"><br></div><div class="gmail_extra">And none of these improvements seem bad. A user could just as easily have written this kind of wide store in their code, and we will fail to optimize it in all the ways you outline. No changes to SROA will fix this. We can only emit efficient code when the *user* provides a wide store by actually teaching the optimizer to analyze and emit efficient code for it. Once we do that, we have also solved the "problem" for SROA.</div></div>
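To make the two shapes under discussion concrete, here is a minimal C++ sketch of the kind of wide store a user could write by hand, next to the equivalent narrow stores. The `Pair` struct and both function names are hypothetical and are not taken from the attached struct.cpp; the sketch also assumes a little-endian target, so that the low half of the 64-bit value lands in the first field.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical two-field struct, used only for illustration.
struct Pair {
    std::uint32_t first;
    std::uint32_t second;
};
static_assert(sizeof(Pair) == 8, "sketch assumes no padding in Pair");

// Narrow stores: each 32-bit field is written on its own.
void store_narrow(Pair *p, std::uint32_t a, std::uint32_t b) {
    p->first = a;
    p->second = b;
}

// Wide store: widen both values (zext), shift one (shl), combine them (or),
// and write all 64 bits at once. Assumes a little-endian target.
void store_wide(Pair *p, std::uint32_t a, std::uint32_t b) {
    std::uint64_t lo = a;                  // zext i32 -> i64
    std::uint64_t hi = b;                  // zext i32 -> i64
    std::uint64_t wide = lo | (hi << 32);  // shl + or
    std::memcpy(p, &wide, sizeof wide);    // single 64-bit store
}
```

On a little-endian machine both functions leave the same bytes in memory, so whether later passes handle the second form well is purely a question of how the optimizer analyzes the zext/shl/or sequence feeding the store.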