<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div>It has a compound effect on throughput if you can only issue three instructions a cycle. So there will be a difference between four address computations and one. This is what I am trying to capture at a high level.</div><div><br>On Feb 6, 2013, at 11:55 PM, Nadav Rotem <<a href="mailto:nrotem@apple.com">nrotem@apple.com</a>> wrote:<br><br></div><blockquote type="cite"><div><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"><div>I am not sure that it's worth modeling this because it only affects the latency and not the throughput of the machine. </div><div><br></div><div><div><div>On Feb 5, 2013, at 3:33 PM, Arnold Schwaighofer <<a href="mailto:aschwaighofer@apple.com">aschwaighofer@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Updated patch. 
We now add the cost of address computation as part of the memory instruction cost.<div><br><div><br></div><div>Thanks</div><div><br></div><div></div></div></div><span><0001-ARM-cost-model-Address-computation-in-vector-mem-ops.patch></span><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Feb 1, 2013, at 1:39 PM, Renato Golin <<a href="mailto:renato.golin@linaro.org">renato.golin@linaro.org</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div dir="ltr">On 1 February 2013 18:07, Nadav Rotem <span dir="ltr"><<a href="mailto:nrotem@apple.com" target="_blank">nrotem@apple.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

The problem is that we decide on the kind of GEP to use only when we vectorize the loads/stores, which happens in vectorizeMemoryInstruction. I think that we need to fix this in the load/store switch cases of LoopVectorizationCostModel::getInstructionCost, where we already have code that checks whether the load/store is wide or scalarized.<br>
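<div>A minimal Python sketch of the trade-off discussed in this thread, assuming unit costs: a consecutive (wide) vector access needs a single address computation, while a scalarized access needs one address computation per lane. The helper names <code>memory_op_cost</code> and <code>issue_cycles</code>, and every cost value, are hypothetical illustrations and not LLVM's cost-model API. The second helper shows the issue-width point: with three issue slots per cycle, four address computations take two cycles just to issue, while one takes a single cycle.</div>
<pre>
```python
# Hypothetical cost sketch, NOT LLVM's TargetTransformInfo API.
# All constants below are assumed values chosen for illustration.
import math

ADDR_COST = 1    # assumed cost of one address (GEP) computation
ISSUE_WIDTH = 3  # assumed: at most three instructions issued per cycle


def memory_op_cost(vf, consecutive, scalar_mem_cost=1, wide_mem_cost=1,
                   scalarize_overhead=0):
    """Estimate the cost of one vectorized load/store (hypothetical model).

    consecutive=True: a single wide access, so only one address computation.
    consecutive=False: the access is scalarized into vf scalar accesses,
    each with its own address computation, plus any insert/extract overhead.
    """
    if consecutive:
        return wide_mem_cost + ADDR_COST
    return vf * (scalar_mem_cost + ADDR_COST) + scalarize_overhead


def issue_cycles(num_instrs, issue_width=ISSUE_WIDTH):
    """Cycles needed only to issue num_instrs at issue_width per cycle."""
    return math.ceil(num_instrs / issue_width)


print(memory_op_cost(4, consecutive=True))   # 2
print(memory_op_cost(4, consecutive=False))  # 8
print(issue_cycles(4), issue_cycles(1))      # 2 1
```
</pre>
<div>Charging the address computation inside the memory-op cost keeps the comparison local to the load/store case, so a scalarized access with VF = 4 is penalized for four address computations rather than one.</div>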

</blockquote><div><br></div><div style="">Good point! Shouldn't be too hard, though.</div><div style=""><br></div><div style="">cheers,<br></div><div style="">--renato</div></div></div></div>

</blockquote></div><br></div></blockquote></div><br></div></div></blockquote></body></html>