<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 13, 2015 at 11:25 AM, Sanjay Patel <span dir="ltr">&lt;<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Filed as:<br><a href="https://llvm.org/bugs/show_bug.cgi?id=24447" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=24447</a><br><a href="https://llvm.org/bugs/show_bug.cgi?id=24448" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=24448</a><br><a href="https://llvm.org/bugs/show_bug.cgi?id=24449" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=24449</a><br><br></div>The last one looks like the easiest one to solve and probably offers the most upside given that you're seeing mostly zeros being stored.<br></div></blockquote><div><br></div><div>Thanks for filing those bugs and looking into this!</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Aug 13, 2015 at 9:21 AM, Sanjay Patel <span dir="ltr">&lt;<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Wed, Aug 12, 2015 at 6:33 PM, Sean Silva <span dir="ltr">&lt;<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br><div dir="ltr"><span></span>For reference, `mov [mem],imm` is decoded into 2 micro-ops (see "Table 1. 
Typical Instruction Mappings" in [SOG]) whereas `mov [mem],reg` is only 1 micro-op, so it is *preferable* to use a reg since it amortizes the cost of the `mov-imm` micro-op across the stores.</div></blockquote><div><br><br></div></span><div>Wow, I never noticed that line in the table. So whatever we do may have to be specialized further by micro-arch...<br><br></div><div>But the Intel Perf guide has this gem at Rule 39:<br>"Try to schedule μops that have no immediate immediately before or after μops with 32-bit immediates."</div><div><br></div><div> ...so maybe it's a no-brainer for everyone after all. :)<br></div></div><br></div></div>

</blockquote></div><br></div>

</div></div></blockquote></div><br></div></div>