<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>The X86 cost model already knows that YMM loads take 2 cycles. :)</div><br><div><div>On Jan 18, 2013, at 4:46 PM, Renato Golin Linaro &lt;<a href="mailto:renato.golin@linaro.org">renato.golin@linaro.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr">On 18 January 2013 23:10, Nadav Rotem <span dir="ltr">&lt;<a href="mailto:nrotem@apple.com" target="_blank">nrotem@apple.com</a>&gt;</span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Author: nadav<br>

Date: Fri Jan 18 17:10:30 2013<br>

New Revision: 172868<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=172868&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=172868&view=rev</a><br>

Log:<br>

On Sandybridge loading unaligned 256bits using two XMM loads (vmovups and vinsertf128) is faster than using a single vmovups instruction.<br></blockquote><div><br></div><div style="">Would the cost model need to be updated?</div>

<div style=""><br></div><div style="">I'm just asking to understand better where the cost model fits in all things... ;)</div><div style=""><br></div><div style="">cheers,</div><div style="">--renato</div></div></div></div>

</blockquote></div><br></body></html>