<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 12, 2015 at 6:33 PM, Sean Silva <span dir="ltr">&lt;<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br><div dir="ltr">For reference, `mov [mem],imm` is decoded into 2 micro-ops (see "Table 1. Typical Instruction Mappings" in [SOG]) whereas `mov [mem],reg` is only 1 micro-op, so it is *preferable* to use a reg since it amortizes the cost of the `mov-imm` micro-op across the stores.</div></blockquote><div><br><br></div><div>Wow, I never noticed that line in the table. So whatever we do may have to be specialized further by micro-arch...<br><br></div><div>But the Intel Perf guide has this gem at Rule 39:<br>"Try to schedule μops that have no immediate immediately before or after μops with 32-bit immediates."</div><div><br></div><div>...so maybe it's a no-brainer for everyone after all. :)<br></div></div><br></div></div>
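<div dir="ltr">The amortization argument in the quoted message can be checked with a little arithmetic. The sketch below only counts micro-ops using the figures quoted from [SOG] (2 for `mov [mem],imm`, 1 for `mov [mem],reg`); the cost of 1 micro-op for the register-materializing `mov reg,imm` and the function names are assumptions made for illustration, and nothing here is measured on hardware.</div>

```python
# Micro-op counts as quoted in the thread from [SOG]; not measured.
UOPS_STORE_IMM = 2  # mov [mem], imm
UOPS_STORE_REG = 1  # mov [mem], reg
UOPS_MOV_IMM = 1    # mov reg, imm (assumed: one micro-op, paid once)


def uops_imm_stores(n):
    """Micro-ops for n stores that each encode the immediate."""
    return UOPS_STORE_IMM * n


def uops_reg_stores(n):
    """Micro-ops for one mov reg,imm followed by n stores from that register."""
    if n == 0:
        return 0
    return UOPS_MOV_IMM + UOPS_STORE_REG * n


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, uops_imm_stores(n), uops_reg_stores(n))
```

<div dir="ltr">Under these assumptions the two forms tie at one store (2 micro-ops each), and the register form is cheaper from two stores onward (3 vs. 4, then 4 vs. 6, and so on). The count ignores register pressure, code size, and scheduling effects, any of which could change the answer on a given micro-architecture.</div>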