<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 19 December 2013 13:30, suyog sarda <span dir="ltr"><<a href="mailto:sardask01@gmail.com" target="_blank">sardask01@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">

<div>I tested it on A15; I don't have access to an A8 right now, but I intend to test it on A8 as well. I compiled the code for A8 and, as it was working fine on A15 without any crash, I went ahead with the cortex-a8 option. I don't think I will get A8 hardware soon; can someone please check it on A8 hardware as well (sorry for the trouble)?  <br>

</div></div></div></div></blockquote><div><br></div><div>It's not surprising that the -mcpu=cortex-a15 option performs better on an A15 than -mcpu=cortex-a8. It's also not surprising that you don't see the VMLA hazard we're talking about, since that was (if I recall correctly) specific to the A8 (maybe the A9, too).</div>

<div><br></div><div>We can only talk about disabling the VMLX-fwd feature for A8 once substantial benchmarks have been run on a Cortex-A8, measuring performance rather than the number of instructions. Emitting more VMLAs doesn't mean the code will go faster; what we found in some cases was actually quite the opposite.</div>

<div><br></div><div>In the meantime, if you're using an A15, just use -mcpu=cortex-a15 and, hopefully, the generated code will be as fast as possible.</div><div><br></div><div>Having Clang detect automatically that you have an A15 is another topic that we could go into, but it has nothing to do with VMLA.</div>

<div><br></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">

<div class="gmail_extra"><div class="gmail_quote"><div>Ok. I couldn't find a reference for this. If the pipeline stall issue was fixed in cortex-a15, then the LLVM developers will definitely know about it (and hence we are emitting vmla for cortex-a15). I couldn't find any comment related to this in the code. Can someone please point it out? Also, I would be glad to know the place in the code where we start differentiating between cortex-a8 and cortex-a15 for code generation.<br>

</div></div></div></div></blockquote><div><br></div><div>The link below only shows some fragments of the thread (I hate gmane), but it does show Evan's benchmarks and assumptions.</div><div> </div><div><a href="http://comments.gmane.org/gmane.comp.compilers.llvm.devel/59458">http://comments.gmane.org/gmane.comp.compilers.llvm.devel/59458</a><br>

</div><div><br></div><div>cheers,</div><div>--renato</div></div></div></div>