<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 19 December 2013 08:50, suyog sarda <span dir="ltr"><<a href="mailto:sardask01@gmail.com" target="_blank">sardask01@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div class="gmail_extra"><div class="gmail_quote"><div>It may seem that the total number of cycles is more or less the same for a single vmla and for vmul+vadd. However, when the vmul+vadd combination is used instead of vmla, intermediate results are generated which need to be stored in memory for future access. This leads to a lot of load/store ops being inserted, which degrades performance. Correct me if I am wrong on this, but my observations to date have shown this. <br>
</div></div></div></div></div></blockquote><div></div></div><br></div><div class="gmail_extra">VMLA.F can be either NEON or VFP on A-series cores, and the encoding determines which one is used. In assembly files, the difference comes down mainly to the data type combined with the registers used (for example, VMLA.F32 on S registers is VFP, while VMLA.F32 on D or Q registers is NEON).</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">The problem we were trying to avoid a long time ago was well researched by Evan Cheng, who showed that there is a pipeline stall between two sequential VMLAs (possibly because some registers need to be re-used), and that this made the code much slower than a sequence of VMLA+VMUL+VADD.</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">Also, please note that, as far as cycle counts can be trusted, according to the A9 manual one VFP VMLA takes almost as long as a VMUL+VADD pair to produce its result, so a sequence of VMUL+VADD might be faster, in some contexts or on some cores, than a sequence of half as many VMLAs.</div>
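<div class="gmail_extra"><br></div><div class="gmail_extra">To make the two shapes being compared concrete, here is a minimal C sketch. The function names are hypothetical, and nothing in the C source forces either instruction selection: whether the backend emits VMLA or VMUL+VADD for either loop depends on the compiler, the target core and the flags. It only shows the source patterns the discussion is about.</div>

```c
/* Hypothetical helper names, for illustration only. Plain C with no
   ARM intrinsics: the instruction choice is left to the compiler. */

/* Multiply-accumulate shape: every iteration reads and writes `acc`,
   the pattern a backend may select as a single VMLA. */
float dot_mla(const float *a, const float *b, int n) {
    float acc = 0.0f;
    for (int i = 0; i != n; ++i)
        acc += a[i] * b[i];
    return acc;
}

/* Split shape: the product is named separately before the add,
   mirroring a VMUL followed by a VADD. */
float dot_mul_add(const float *a, const float *b, int n) {
    float acc = 0.0f;
    for (int i = 0; i != n; ++i) {
        float prod = a[i] * b[i];  /* multiply step */
        acc = acc + prod;          /* add step */
    }
    return acc;
}
```

<div class="gmail_extra">Both functions compute the same dot product, so the only way to tell which form is faster on a given core is to compile both for that target, inspect the generated assembly and time them.</div>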
<div class="gmail_extra"><br></div><div class="gmail_extra">As Tim and David said and I agree, without hard data, anything we say might be used against us. ;)</div><div class="gmail_extra"><br></div><div class="gmail_extra">
cheers,</div><div class="gmail_extra">--renato</div></div>