<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Nov 25, 2013 at 4:48 PM, Jim Grosbach <span dir="ltr"><<a href="mailto:grosbach@apple.com" target="_blank">grosbach@apple.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><div>A few select examples I’m seeing are: 256.bzip2 improves by 7%. 401.bzip2 improves by 4.5%. 300.twolf improves by 3%. 186.crafty improves by 4%. The details vary, but this is true for both Ivy Bridge and Haswell in particular.</div>

<div><br></div></div></div></blockquote><div><br></div><div>Hmm... on second thought, do these programs use a lot of i16s? Agner Fog reports that on Ivy Bridge and Haswell there is no partial register access cost for the low i8 subregs. He doesn't seem to mention anything about 16-bit registers, so I assume the partial register stall is still there for the i16 subregs? Unfortunately, I don't have an Ivy Bridge or Haswell machine to test on :(</div>

<div><br></div><div>-- Sean Silva </div></div></div></div>